22 June 2017

A short description of the Morphologically Annotated Part of BulTreeBank (BulTreeBank-Morph)

This distribution contains only the morphological information encoded in BulTreeBank, the HPSG-based Treebank of Bulgarian. It comprises about 214,000 tokens and was used to train the TreeTagger for Bulgarian.

The sentences come from Bulgarian grammar textbooks, newspapers, literature, and other text sources.

Full documentation of the Treebank (Style Book, Tagset description) can be found in the Publications menu.

Data Format

The morphological annotation is described in:


The tagset is described in:

Acquiring the Data

If you are interested in using BulTreeBank-Morph, please fill in the user agreement form, print it, scan it, and send it to Kiril Simov. If you cannot send it electronically, please send it by regular mail to:

Kiril Simov
BulTreeBank Project
Linguistic Modelling Laboratory, IPP,
Bulgarian Academy of Sciences
Acad. G.Bonchev St. 25A
1113 Sofia, Bulgaria

After receiving the completed form, we will send you the data.


BulTreeBank is developed under the BulTreeBank Project, a joint project of the Linguistic Modelling Laboratory (LML), Institute for Parallel Processing, Bulgarian Academy of Sciences, and the Seminar für Sprachwissenschaft (SfS), Eberhard-Karls-Universität, Tübingen, Germany. The project is funded by the Volkswagen Stiftung, Federal Republic of Germany, under the Programme „Cooperation with Natural and Engineering Scientists in Central and Eastern Europe“.

We would like to thank our colleagues from Tübingen!