12th April 2017

CLaRK System

CLaRK – an XML Based System For Corpora Development

Unicode XML Editor, XPath Engine, XSLT Engine, XML Constraints, XML Cascaded Regular Grammar Engine.

CLaRK is an XML-based software system for corpora development, implemented in Java. The system is designed to minimize human intervention in the creation of language resources. It incorporates several technologies:

  • XML technology;
  • Unicode;
  • Cascaded Regular Grammars;
  • Constraints over XML Documents.


Bulgarian NLP pipeline in CLaRK System (BTB-Pipe)

Bulgarian National Reference Corpus BulTreeBank


CLaRK System version 3.0 is available for download.

Download CLaRK 3.0