BulTreeBank Group

The BulTreeBank Group works on projects in Computational Linguistics and the Semantic Web. Our main task is to create language resources and tools for Bulgarian. We have worked on the creation of a Bulgarian treebank, a POS tagger, a partial grammar, a text archive, domain ontologies, lexicons, and XML tools.

The BulTreeBank Group is part of the Linguistic Modelling Laboratory (LML), Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The group originates from the BulTreeBank Project, which was funded by the Volkswagen Stiftung, Federal Republic of Germany, under the Programme "Cooperation with Natural and Engineering Scientists in Central and Eastern Europe". The project was carried out mainly at LML in close cooperation with researchers at the Seminar für Sprachwissenschaft (SfS), Eberhard-Karls-Universität, Tübingen, Germany.

The core members of the BulTreeBank Group
WebCLaRK – Bulgarian Portal for Language Services on the web

Current Projects

We are involved in the following projects and initiatives:
Past Projects

We were involved in the following projects:
Books

Petya Osenova, Kiril Simov. Формална граматика на българския език (Formal Grammar of the Bulgarian Language). Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP, BAS). Sofia, 18 December 2007.

Here is a draft of Petya Osenova's habilitation thesis (in Bulgarian). Any comments are welcome.
Именните фрази в българския език (с оглед на Опорната фразова граматика) (Noun Phrases in Bulgarian, with Regard to Head-Driven Phrase Structure Grammar).
Bulgarian Noun Phrases in HPSG. - Summary in English

CLaRK system - XML-based system for corpora development

The core of CLaRK is an XML editor, which is the main interface to the system. Besides the XML language itself, we implemented the XPath language for navigation in documents and the XSLT language for transformation of XML documents. CLaRK uses a Unicode encoding for all information inside the system. The basic mechanism of CLaRK for linguistic processing of text corpora is the cascaded regular grammar processor. Several mechanisms for imposing constraints over XML documents are available; these constraints cannot be stated with standard XML technology.

Technical Reports

Available Language Resources

The dependency format of the treebank, a morphosyntactically annotated corpus, stopwords for Bulgarian, a frequency list, etc.

Linux courses.

Contacts

Kiril Simov