Creating annotations for historical language data – special needs and opportunities of old manuscripts

Stefanie Dipper

Bochum University

Annotating historical language data poses many interesting challenges. To name a few: How many of the characteristics of the original manuscript should the transcription preserve? How can we determine sentence boundaries when there are no punctuation marks? How should we deal with spelling variation? Which (morpho-)syntactic annotations are suitable for diachronic data, in which words can be observed shifting from one part of speech to another? At the same time, these characteristics are a rich source for linguistic investigations into language use and evolution. In my talk, I would like to address such issues, reporting on work from different historical corpus projects.