Call for Papers: The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)

In memory of Father Roberto Busa (1913-2011)

12th December 2013, Sofia, Bulgaria

ACRH-3 is a co-event of The Twelfth Workshop on Treebanks and Linguistic Theories

 

Sponsors: Ministry of Education and Science; Ontotext AD

Submissions are invited for oral presentations and posters (with or without demonstrations) featuring high-quality, previously unpublished research on the topics described below. Contributions may report results from completed as well as ongoing research, with an emphasis on novel approaches, methods, ideas, and perspectives, whether descriptive, theoretical, formal or computational.

Proceedings will be published in time for the workshop. The full proceedings of the previous two editions of ACRH are respectively available at http://www.jlcl.org (ACRH-1) and at http://alfclul.clul.ul.pt/crpc/acrh2/ACRH-2_FINAL.pdf (ACRH-2).

The workshop will be co-located with the Twelfth International Workshop on „Treebanks and Linguistic Theories“ (TLT-12), which will be held on December 13-14, 2013 (http://www.bultreebank.org/TLT12/).

This edition of ACRH will be dedicated to the memory of Father Roberto Busa, to celebrate the 100th anniversary of his birth (November 28, 1913). ACRH-3 will devote one special session to Father Busa. This session will feature one introduction and one invited talk, which will be given by the recipient of the 2013 Busa Award, Prof. Willard McCarty (King’s College, London, UK).

Motivation and Aims

Research in the Humanities is predominantly text-based. For centuries scholars have studied documents such as historical manuscripts, literary works, legal contracts, diaries of important personalities, old tax records etc. Manual analysis of such documents is still the dominant research paradigm in the Humanities. However, with the advent of the digital age this is increasingly complemented by approaches that utilise digital resources. More and more corpora are made available in digital form (theatrical plays, contemporary novels, critical literature, literary reviews etc.). This has a potentially profound impact on how research is conducted in the Humanities. Digitised sources can be searched more easily than traditional, paper-based sources, allowing scholars to analyse texts more quickly and more systematically. Moreover, digital data can also be (semi-)automatically mined: important facts, trends and interdependencies can be detected, complex statistics can be calculated and the results can be visualised and presented to the scholars, who can then delve further into the data for verification and deeper analysis. Digitisation encourages empirical research, opening the road for completely new research paradigms that exploit ‘big data’ for humanities research. This has also given rise to Digital Humanities (or E-Humanities) as a new research area.

Digitisation is only a first step, however. In their raw form, electronic corpora are of limited use to humanities researchers. The true potential of such resources is only unlocked if corpora are enriched with different layers of linguistic annotation (ranging from morphology to semantics). While corpus annotation can build on a long tradition in (corpus) linguistics and computational linguistics, corpus and computational linguistics on the one side and the Humanities on the other side have grown apart over the past decades.

The ACRH workshop aims at building a tighter collaboration between people working in various areas of the Humanities (such as literature, philology, history etc.) and the research community involved in developing, using and making accessible annotated corpora. We believe that such a collaboration is now needed because, while annotating a corpus from scratch still remains a labor-intensive and time-consuming task, today this is simplified by intensively exploiting prior experience in the field. In practice, such an interplay is still quite far from being achieved, as a gap still exists between computational linguists (who sometimes do not involve humanists in developing and exploiting annotated corpora for the Humanities) and humanists (who are sometimes simply unaware that such corpora exist and that automatic methods and standards to build them are available today). Although many corpora that play a relevant role for research in the Humanities are today available in digital format, only a few of them are linguistically tagged, while most still lack any linguistic tagging. Over the past few years a number of historical annotated corpora have been started, among which are treebanks for Middle, Early Modern and Old English, Early New High German, Medieval Portuguese, Ugaritic, Latin, Ancient Greek and several translations of the New Testament into Indo-European languages. The experience of this ever-growing set of projects can provide many suggestions on the methodology as well as on the practice of interaction between literary studies, philology and corpus linguistics.

Topics

To overcome the above-mentioned issues, ACRH-3 aims at covering a wide range of topics related to the annotation of corpora for research in the Humanities.
The topics to be addressed in the workshop include (but are not limited to) the following:

  • specific issues related to the annotation of corpora for research in the Humanities
  • annotated corpora as a basis for research in the Humanities
  • diachronic, historical and literary annotated corpora
  • use of annotated corpora for stylometrics and authorship attribution
  • philological issues, like different readings, textual variants, apparatus, non-standard orthography and spelling variation
  • annotation principles and schemes of corpora for research in the Humanities
  • adaptation of NLP tools for older language varieties
  • integration of annotated corpora for the Humanities into language resources infrastructures
  • tools for building and accessing annotated corpora for the Humanities
  • examples of fruitful collaboration between Computational Linguistics and Humanities in building and exploiting annotated corpora

Invited Speaker

Willard McCarty (King’s College, London, UK)

Important Dates

Deadlines: always midnight, UTC (‘Coordinated Universal Time’), ignoring DST (‘Daylight Saving Time’):
– Deadline for paper submission: 22nd September 2013
– Notification of acceptance: 25th October 2013
– Final version of paper: 17th November 2013
– Workshop: 12th December 2013

Instructions for Submission

We invite the submission of full papers describing original, unpublished research related to the topics of the workshop. Papers should not exceed 12 pages. The language of the workshop is English. All papers must be submitted in well-checked English. Papers should be submitted in PDF format only. Submissions have to be made via the EasyChair page of the workshop at https://www.easychair.org/conferences/?conf=acrh3. If you do not have an EasyChair account, please register at EasyChair first. The style guidelines follow the specifications required by TLT.

Please note that, as reviewing will be double-blind, the papers should not include the authors’ names and affiliations or any references to websites, project names etc. revealing the authors’ identity. Furthermore, any self-reference should be avoided. For instance, instead of „We previously showed (Brown, 2001)…“, use citations such as „Brown previously showed (Brown, 2001)…“. Each submitted paper will be reviewed by three members of the program committee.

Submitted papers can be for oral or poster presentations (with or without demo). The different kinds of presentation are treated identically, both in the reviewing process and in publication in the proceedings (the limit of 12 pages holds for both oral and poster presentations).

Oral Presentation

The oral presentations at the workshop will be 30 minutes long (25 minutes for presentation and 5 minutes for questions and discussion).

Program Committee Chairs

  • Francesco Mambrini, Deutsches Archäologisches Institut, Berlin, Germany
  • Marco Passarotti, Università Cattolica del Sacro Cuore, Milan, Italy
  • Caroline Sporleder, University of Trier, Germany

Program Committee

  • Stefanie Dipper, Germany
  • Voula Giouli, Greece
  • Iris Hendrickx, Portugal
  • Erhard Hinrichs, Germany
  • Cerstin Mahlow, Switzerland
  • Alexander Mehler, Germany
  • Jiří Mírovský, Czech Republic
  • Christian-Emil Smith Ore, Norway
  • Michael Piotrowski, Germany
  • Paul Rayson, UK
  • Martin Reynaert, The Netherlands
  • Jeff Rydberg Cox, USA
  • Kiril Simov, Bulgaria
  • Stefan Sinclair, Canada
  • Mark Steedman, UK
  • Frank Van Eynde, Belgium
  • Martin Wynne, UK

Contact Information

For information on this workshop please contact marco.passarotti at unicatt.it

Local Organization Committee

  • Petya Osenova (Sofia University)
  • Kiril Simov (IICT-BAS)
  • Stanislava Kancheva (Sofia University)
  • Georgi Georgiev (Ontotext)
  • Borislav Popov (Ontotext)