CLaRK System Online Manual

Menu Options


Because the Unicode tables allow such a variety of graphical characters (letters), the user needs a means of entering them from the keyboard. Unfortunately, in most cases either there are not enough keys on the keyboard or the predefined keyboards are not suitable.

In these cases the CLaRK System offers the following solution: the user can define his/her own keyboard maps, i.e. attach a different character to each key on the keyboard. There are 94 keys available for mapping. Each key is identified by its ASCII character (the ASCII range coincides with the beginning of the Unicode table), which is the default for the specific machine architecture. A keyboard map is represented as a set of pairs, one pair per key. Each pair has two elements: the default character and the Unicode code of the newly attached character. When a user-defined keyboard is active and a key is pressed, its character is looked up in the set of char-code pairs. If a pair is found, its second element is used to retrieve a character from the Unicode table, and that character is displayed on the screen. If there is no such pair, the key's own character appears on the screen.
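The pair lookup described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not CLaRK's implementation: the class KeyboardMap and its methods are hypothetical, the map is keyed by the default character with the Unicode code as value, and the sample pair f to 1092 is an assumption added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a user-defined keyboard map: default key character -> Unicode code.
class KeyboardMap {
    private final Map<Character, Integer> pairs = new HashMap<>();

    // Attaches the character with the given Unicode code to a key (one pair per key).
    void attach(char key, int unicodeCode) {
        pairs.put(key, unicodeCode);
    }

    // Text shown when a key is pressed: the attached character if a pair exists,
    // otherwise the key's own default character.
    String translate(char key) {
        Integer code = pairs.get(key);
        if (code == null) {
            return String.valueOf(key);
        }
        return new String(Character.toChars(code));
    }

    public static void main(String[] args) {
        KeyboardMap aux = new KeyboardMap();
        aux.attach('d', 1076); // U+0434, Cyrillic small letter de
        aux.attach('f', 1092); // U+0444, Cyrillic small letter ef (assumed pair)
        System.out.println(aux.translate('d')); // attached character
        System.out.println(aux.translate('7')); // no pair: printed unchanged
    }
}
```

A HashMap gives a constant-time lookup per keystroke, and Character.toChars is used so that codes above U+FFFF would also yield a valid string.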

The system has two default keyboards: English (the hardware system keyboard) and Bulgarian Phonetic (auxiliary). Both are fixed and cannot be modified.

While the system is running, there are always two active keyboards, and the user can switch between them with the key combination <Ctrl>+<Left Shift>. An indicator on the toolbar shows which keyboard is currently in use: if it is red and labeled Aux, the auxiliary keyboard is active; otherwise it is green and labeled Lat. Clicking the indicator also switches the keyboards.

Keyboard Editor

When this item is selected, the Keyboard Manager window appears. The initial view of the manager is presented below:

The manager dialog window contains three parts: Keyboard Preview, Unicode Table Preview and Control Panel. The keyboard to edit can be selected from the Current Keyboard list at the top right of the dialog.

Keyboard Preview

This is the table on the left side of the window (with the white background). It shows the current state of the auxiliary keyboard, one row per pair of the keyboard map. The first column contains the characters of the hardware default keyboard and is not editable. The second column contains the codes of the newly attached characters. The third column is a character preview showing the new character that the current code assigns to the key. In the picture above, the selected row is the one for the character d. The code attached to it is 1076 (the Cyrillic letter д, U+0434), which means that when the user presses d, the screen shows not d but the character with this code.

The user can define a key by entering the code of the desired character. After entering a code, the user must press <Enter>.

Unicode Table Preview

To find out the code of the desired character, the user can consult the second component, the Unicode Table Preview. This is the table with the blue background in the picture. It contains the characters of the Unicode table available in the current font, which is the same as the font of the system's text area. If the desired character is not in the table, the font of the text area must be changed from Options/Fonts.

The first row and the first column contain numbers used to calculate the code of each character. The calculation is simple: find the character in the table, then add the number in the first column of its row to the number in the first row of its column. The sum is the character's code. The easiest way to assign a character to a key is to select a row in the Keyboard Preview table, find the right character in the Unicode Table Preview and double-click it. The character code is calculated and copied to the selected row. If no row is selected, nothing happens.

Example: How do we get the number 1076 for the character in the Char preview? First, we find the character in the Unicode Table Preview. In the picture above, it is in the last row and the eighth column. The number in the first column of that row is 1070, and the number in the first row of that column is 6, so the code is 1070 + 6 = 1076.
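The row-plus-column rule and its inverse can be written down directly. This sketch assumes, as the 1070 + 6 example suggests, that the table has ten columns headed 0 to 9 and that each row is labeled with the code of its first cell; the class and method names are hypothetical.

```java
// Hypothetical helper for the Unicode Table Preview arithmetic,
// assuming ten columns headed 0..9 and rows labeled with multiples of 10.
class UnicodeTableCell {
    static final int COLUMNS = 10;

    // Code of a cell = row label (first column) + column header (first row).
    static int codeOf(int rowLabel, int columnHeader) {
        return rowLabel + columnHeader;
    }

    // Inverse: the row label under which a code appears.
    static int rowLabelOf(int code) {
        return code - code % COLUMNS;
    }

    // Inverse: the column header under which a code appears.
    static int columnHeaderOf(int code) {
        return code % COLUMNS;
    }

    public static void main(String[] args) {
        int code = codeOf(1070, 6);
        System.out.println(code);                 // 1076
        System.out.println(rowLabelOf(code));     // 1070
        System.out.println(columnHeaderOf(code)); // 6
    }
}
```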

The Unicode table can be navigated with the two buttons Page Up and Page Down on the right side of the table. Alternatively, the user can enter a number in the Go to field below the table and go to the corresponding row (Code button) or to the corresponding page of the Unicode table (Code Page button). A code page contains 256 characters.
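Since a code page holds 256 characters, finding the page that contains a given code is a matter of integer division. This sketch assumes pages start at multiples of 256 (page 0 covers codes 0 to 255, page 1 covers 256 to 511, and so on); the names are hypothetical, and the system's own page numbering may differ.

```java
// Hypothetical helper, assuming code pages of 256 characters aligned at multiples of 256.
class CodePages {
    static final int PAGE_SIZE = 256;

    // Index of the page containing the code.
    static int pageOf(int code) {
        return code / PAGE_SIZE;
    }

    // First code shown on a page.
    static int firstCode(int page) {
        return page * PAGE_SIZE;
    }

    // Last code shown on a page.
    static int lastCode(int page) {
        return firstCode(page) + PAGE_SIZE - 1;
    }

    public static void main(String[] args) {
        int page = pageOf(1076);
        System.out.println(page);            // 4
        System.out.println(firstCode(page)); // 1024
        System.out.println(lastCode(page));  // 1279
    }
}
```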

Note that a small rectangle in a table cell means that the current font does not support that code.

Control Panel

The Control Panel is situated at the bottom of the dialog and contains five components:

  • Hot Key switch - determines which key combination switches between the Lat and Aux keyboard layouts. The available hot keys are Alt, Ctrl, Shift and combinations of them.

    If none of these combinations is convenient, the user can define a new shortcut from the Definitions/Shortcuts menu item, using the Keyboard Switch action, which is the last item in Action / Action Items / Menu Item.

  • New - when the button is pressed, the user is asked to enter a name for the new keyboard.

    If Use keyboard is selected, an already defined keyboard is used as the basis for the new one; otherwise the default Latin keyboard is used. The new keyboard can be selected for use in the system as the auxiliary keyboard. When the keyboard is saved, it is added at the end of the list of keyboards (the list can be seen by right-clicking the keyboard indicator). The keyboard currently used as Aux is selected in the list. Each keyboard in the list is preceded by a number, which can be used for quick selection: <Ctrl> + number selects the corresponding keyboard.
  • Remove - removes the current keyboard after a confirmation dialog.
  • OK - saves the changes to the current keyboard.
  • Cancel - closes the manager window without applying any changes to the current keyboard.


Fonts

This dialog window provides a tool for changing the fonts of several key components of the system. It affects only the graphical interface. The tool is needed because the CLaRK System uses Unicode character encoding, which allows a wide range of characters from different alphabets, but not every font supports the whole character table: fonts are generally designed for a specific use and support only two or three alphabets. The manager allows the fonts of the components to be changed independently. The components whose font can be changed are:

  • Text Window - the text area on the right side of the main system panel, where the text of the document appears.
  • Tree Window - the component on the left side of the main system panel, where the tree of the document structure appears.
  • Attribute Table - the table just below the Tree Window, which shows the attributes of the currently selected element.
  • Error Messages - the component at the bottom of the main system panel, where error messages appear.
  • Tables - this sets the font of all tables in the system (Grammar editor, Tokenizer editor, ...).
  • Fields - this sets the font of all text fields in the system.

The dialog window:

The dialog contains five sections:

  • Font Chooser - the panel on the left, listing all fonts available on the system. The font of the selected component is changed by choosing a new font from this list.
  • Component Chooser - situated in the upper right corner of the dialog window. Here the user chooses the component whose font is to be changed.
  • Font Style Modificator - changes the style of the font (Regular, Bold, Italics and Underlined).
  • Font Size Chooser - changes the size of the currently selected font. The font size can vary from 5 to 50. If the user enters a number outside this range, the value is automatically corrected to 5 or 50. If the input is not a number, the old value is restored. After entering a new font size, the user must press the Preview button to refresh the preview.
  • Font Previewer - makes a preview of the currently chosen font with a specified font style.

Note: if the text in the font preview does not change when a new style is chosen, it means that the font does not support this style.
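The input rules of the Font Size Chooser can be expressed as a small function. This is an illustrative sketch, not the system's code: the names are hypothetical, and whole-number sizes are assumed because the text does not say whether fractional sizes are accepted.

```java
// Hypothetical sketch of the Font Size Chooser input rules described above.
class FontSizeInput {
    static final int MIN_SIZE = 5;
    static final int MAX_SIZE = 50;

    // Returns the size to use after the user types `input`;
    // `previous` is the size shown before the edit.
    static int accept(String input, int previous) {
        int size;
        try {
            size = Integer.parseInt(input.trim());
        } catch (NumberFormatException e) {
            return previous; // not a number: restore the old value
        }
        // Out-of-range values are corrected to the nearest limit.
        return Math.max(MIN_SIZE, Math.min(MAX_SIZE, size));
    }

    public static void main(String[] args) {
        System.out.println(accept("12", 10));  // 12
        System.out.println(accept("72", 10));  // 50
        System.out.println(accept("2", 10));   // 5
        System.out.println(accept("big", 10)); // 10
    }
}
```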


Visuals

This option can be used to change the colors of the different components (tags, text, attributes, comments and background) in the text area(s) and the background of the tree area(s). The available colors are all the colors supported by the hardware and software environment in which the system is used. Colors are selected with a standard color chooser (dependent on the computer architecture).
Here is the dialog which appears after choosing the "Visuals" option:

The dialog window contains two sections:

  1. Colors Info

    This section sets the colors of the different components. The color of each button on the right side shows the color of the corresponding component. Pressing a button opens a color chooser. If a new color is chosen, the background of the button changes to the new color when the chooser is closed; otherwise it remains the same. The components whose colors can be changed are:

    • Tags (Tag Color)
    • Text (Text Color)
    • Attribute Values (Attribute Color)
    • Comments (Comment Color)
    • Text Panels Background (Text Background)
    • Tree Panels Background (Tree Background)

    Here is a preview of the color settings above:

  2. Control Buttons:
    • OK Button - Applies the new color settings.
    • Reset Button - Resets the color settings as follows:
      • tag color - pure blue;
      • text color - pure black;
      • attribute value color - pure green;
      • comment color - dark gray;
      • text background color - light gray;
      • tree background color - white.
    • Cancel Button - Discards the changes to the color settings.
    • Color Schemes Button - Opens the Color Scheme Editor dialog, described below.
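The Reset Button defaults can be written as java.awt.Color values. This is an illustrative sketch: the class name and map keys are hypothetical, "pure" is read as full-intensity RGB, and since the exact shades of dark gray and light gray are not stated in the text, Java's Color.DARK_GRAY and Color.LIGHT_GRAY are assumed.

```java
import java.awt.Color;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical table of the Reset Button defaults described above.
class ResetColors {
    static Map<String, Color> defaults() {
        Map<String, Color> colors = new LinkedHashMap<>();
        colors.put("tag", new Color(0, 0, 255));             // pure blue
        colors.put("text", new Color(0, 0, 0));              // pure black
        colors.put("attribute value", new Color(0, 255, 0)); // pure green
        colors.put("comment", Color.DARK_GRAY);              // assumed shade: RGB (64, 64, 64)
        colors.put("text background", Color.LIGHT_GRAY);     // assumed shade: RGB (192, 192, 192)
        colors.put("tree background", Color.WHITE);          // RGB (255, 255, 255)
        return colors;
    }

    public static void main(String[] args) {
        // Prints each component with its color as a hex RGB triple.
        defaults().forEach((component, c) ->
                System.out.printf("%-16s #%02X%02X%02X%n", component, c.getRed(), c.getGreen(), c.getBlue()));
    }
}
```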

Color Scheme Editor

This tool defines the colors in which specific elements of the text area (tags, comments, text) appear. It is a more advanced function because it sets the color of each element not according to its type but according to the results of evaluating arbitrary XPath expressions. This allows elements to be painted in different colors depending on the context in which they appear. When an element is displayed on the screen, a set of XPath expressions is evaluated with the element as the context node, and if one of the results is a non-empty node list, a positive number, a non-empty string or the boolean value true, the element is painted in the corresponding color.

Here is what the Color Scheme Editor looks like:

The basic unit defining the color layout is called a Color Scheme. Each Color Scheme is responsible for the display of one or more documents. A Color Scheme is identified by a name and contains a set of pairs, each specifying an XPath expression and a color. If the evaluation of the XPath expression gives a positive result, the corresponding context node is painted in the color given as the second component of the pair. If more than one pair defines a color for a node, the first one is used.
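The first-match rule can be sketched with the standard JAXP XPath API (javax.xml.xpath). This is not CLaRK's evaluator: the class and method names are hypothetical, colors are plain strings, and the "positive result" test is approximated by the XPath 1.0 boolean() conversion, which, unlike the description above, also treats negative numbers as true. A LinkedHashMap keeps the pairs in their defined order.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of a Color Scheme: an ordered list of XPath -> color pairs.
class ColorSchemeSketch {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Returns the color of the first pair whose XPath gives a positive result for the
    // context node, or defaultColor if none does. "Positive" is approximated by XPath 1.0
    // boolean(): non-empty node-set, non-zero number, non-empty string, or true.
    static String colorFor(Node context, Map<String, String> scheme, String defaultColor) {
        for (Map.Entry<String, String> pair : scheme.entrySet()) {
            try {
                Boolean positive = (Boolean) XPATH.evaluate(pair.getKey(), context, XPathConstants.BOOLEAN);
                if (positive) {
                    return pair.getValue(); // the first matching pair wins
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + pair.getKey(), e);
            }
        }
        return defaultColor;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> scheme = new LinkedHashMap<>();
        scheme.put("self::w[@lang='bg']", "red"); // more specific pair first
        scheme.put("self::w", "blue");
        Document doc = parse("<doc><w lang=\"bg\">dom</w><w>house</w></doc>");
        for (int i = 0; i < 2; i++) {
            Node w = doc.getElementsByTagName("w").item(i);
            System.out.println(w.getTextContent() + " -> " + colorFor(w, scheme, "black"));
        }
    }
}
```

The more specific pair is listed first; if the two pairs were swapped, both w elements would be painted blue, because self::w would match them before the lang test is reached.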

The structure of the editor window is the following:

  • Color Scheme Selector - situated at the top of the window, this component contains a list of all Color Schemes defined in the system.
  • Scheme Preview - contains a list of all entries (pairs) of the scheme selected in the Color Scheme Selector. Each entry of the list is an XPath expression painted in its color. The order of the entries determines the order in which the XPath expressions are evaluated; the first XPath expression that returns a positive result determines the color. The operations that can be applied to the XPath-color pairs are provided by the three buttons on the right side of the panel:
    • Add Line - adds a new entry to the end of the list. The user is asked to enter an XPath expression and to select a color. Each XPath expression is evaluated relative to each node in the corresponding document.
    • Edit Color - modifies an existing XPath-color pair.
    • Remove Line - removes an entry from the list.
    The last two operations work on the selected entry in the list. If no entry is selected, nothing happens.
  • Control Panel - a set of buttons used for Color Scheme management:
    • New button - creates a new Color Scheme. The user is asked to specify a scheme name.
    • Remove button - removes the currently selected Color Scheme. This removal is preceded by a warning message.
    • OK button - closes the editor window and updates all modified Color Schemes.
    • Cancel button - closes the editor window and discards any modifications of the Color Schemes.

A Color Scheme can be applied from the Edit DTD Layout or Edit Current Text Layout menu options, in the Color Scheme field.

Look & Feel

This option changes the style (Look & Feel) of the system's graphical user interface. It does not change the structure of the dialogs, only the way they are painted on the screen. Here is an example of how one dialog window looks in different styles:




The number of supported styles may vary between computers, depending on the computer architecture, the operating system and the Java Virtual Machine. The example above was taken on an Intel x86 machine running Windows with JDK 1.4.2. On other machines the picture might look slightly different: more or fewer available styles, different colors, different icons, etc. The main purpose of this option is to make the user environment more friendly and convenient.
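To see which styles a particular Java Virtual Machine offers, Swing's standard UIManager.getInstalledLookAndFeels() method can be queried. The sketch below only lists the installed styles; it is not taken from the CLaRK sources, the class name is hypothetical, and the output depends on the JVM and platform, as noted above.

```java
import javax.swing.UIManager;

// Lists the Look & Feel styles installed in the running Java Virtual Machine.
class ListLookAndFeels {
    static String[] installedNames() {
        UIManager.LookAndFeelInfo[] infos = UIManager.getInstalledLookAndFeels();
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            // Display name plus the class that implements the style.
            names[i] = infos[i].getName() + " (" + infos[i].getClassName() + ")";
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : installedNames()) {
            System.out.println(name);
        }
    }
}
```

In a Swing application, a listed style is typically applied with UIManager.setLookAndFeel(className), followed by SwingUtilities.updateComponentTreeUI on each open window.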

Encoding Correction

This option is relevant when the user works with files that rely on an 8-bit character encoding (referred to here as ASCII files, although strict ASCII is a 7-bit code). It ensures correct mapping between such 8-bit encodings and Unicode. Because an 8-bit encoding offers only a limited number of codes and many different symbols are needed, many character-sets use the same code ranges for different characters. The problem is how to determine which character-set should be used for a given ASCII file. Unfortunately, this information is very often unavailable, and the system can make a wrong decision when reading a file. For example, the user expects to read a file containing Hebrew text, but the system decides that it is Cyrillic text and maps it to the wrong Unicode characters. So the user must specify which character-set the system should use. This is where the Char Encoding Corrector can be used. Here is a screen-shot of the dialog window:

The choice list at the top of the window contains all the character-sets supported by the CLaRK System. Currently the system supports 34 standard character-sets:

  1. Arabic (Windows-1256)
  2. Baltic (Windows-1257)
  3. Cyrillic (Windows-1251)
  4. Greek (Windows-1253)
  5. Hebrew (Windows-1255)
  6. Latin 1 (Windows-1252)
  7. Latin 2 (Windows-1250)
  8. Latin 5 (Windows-1254)
  9. Thai (Windows-874)
  10. Viet Nam (Windows-1258)
  11. Arabic (ISO 8859-6)
  12. Baltic (ISO 8859-4)
  13. Cyrillic (ISO 8859-5)
  14. Greek (ISO 8859-7)
  15. Hebrew (ISO 8859-8)
  16. Latin 1 (ISO 8859-1)
  17. Latin 2 (ISO 8859-2)
  18. Latin 3 (ISO 8859-3)
  19. Latin 9 (ISO 8859-15)
  20. Turkish (ISO 8859-9)
  21. Arabic (OEM-720)
  22. Baltic (OEM-775)
  23. Cyrillic DOS (OEM-855)
  24. Greek (OEM-737)
  25. Hebrew (OEM-862)
  26. Latin 2 (OEM-852)
  27. Multilingual Latin 1 (OEM-850)
  28. Multilingual Latin 1 + euro (OEM-858)
  29. Russian (Cyrillic 2) (OEM-866)
  30. Turkish (OEM-857)
  31. US Codepage (OEM-437)
  32. Cyrillic Russian (KOI8-R)
  33. Cyrillic Ukrainian (KOI8-U)
  34. Cyrillic Ancient (KOI8-C)

The table in the center is a preview of the currently selected character-set, showing the symbols with codes from 128 to 255. Changing the selected character-set refreshes the table. If the user is not sure which character-set to use, s/he can choose the first option in the list, (System Default), which makes the system use the default character-set of the specific computer architecture and operating system.
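The ambiguity described above is easy to reproduce with Java's standard java.nio.charset API: the same byte value decodes to different Unicode characters under different character-sets. In this sketch the class and method names are hypothetical and the charset names are Java's names for sets in the list above; it illustrates the problem and is not the Char Encoding Corrector itself. Note that on Java 18 and later Charset.defaultCharset() returns UTF-8 unless configured otherwise, so it is not necessarily the platform's 8-bit code page.

```java
import java.nio.charset.Charset;

// Shows why an 8-bit file cannot be mapped to Unicode without knowing its character-set.
class CharsetPreview {
    // The Unicode text that a single byte value (0..255) represents in the named character-set.
    static String decode(int byteValue, String charsetName) {
        byte[] bytes = { (byte) byteValue };
        return new String(bytes, Charset.forName(charsetName));
    }

    // The contents of a preview table: byte values 128..255 decoded in one character-set.
    // Unmapped byte values come out as the replacement character U+FFFD.
    static String previewUpperHalf(String charsetName) {
        byte[] bytes = new byte[128];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) (128 + i);
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println("JVM default character-set: " + Charset.defaultCharset().name());
        for (String name : new String[] { "windows-1251", "windows-1255", "ISO-8859-1" }) {
            System.out.printf("byte 0xE0 in %s -> U+%04X%n", name, decode(0xE0, name).codePointAt(0));
        }
    }
}
```

For byte 0xE0 the loop prints U+0430 (Cyrillic а) for windows-1251, U+05D0 (Hebrew alef) for windows-1255 and U+00E0 (à) for ISO-8859-1.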

The newly selected character-set is applied with the Apply button or rejected with the Cancel button. Once applied, it is used each time an ASCII file is read or written, e.g. when importing or exporting documents, compiling DTDs, etc.

Add Default Attributes On Loading

For each element in an XML document, a set of default attributes can be defined in the DTD. These are attributes that are not explicitly present in the element but are assumed to be there with the default value set in the DTD. When this option is selected, each time a document is opened, every default attribute missing from an element is added to it explicitly with its default value.
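The same idea can be observed with a standard XML parser. The sketch below uses the JDK's JAXP DOM parser, not CLaRK's loader, with a made-up DTD whose w element has a lang attribute defaulting to bg: the first w, which has no lang attribute in the text, receives the default, while the explicit value on the second w is kept. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Shows a standard JAXP parser filling in an attribute default declared in the DTD.
class DefaultAttributes {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [\n"
                + "  <!ATTLIST w lang CDATA 'bg'>\n" // default value for the lang attribute of w
                + "]>\n"
                + "<doc><w>dom</w><w lang='en'>house</w></doc>";
        Document doc = parse(xml);
        for (int i = 0; i < 2; i++) {
            Element w = (Element) doc.getElementsByTagName("w").item(i);
            // getSpecified() reports whether the value was written in the document itself.
            System.out.println(w.getTextContent() + ": lang=" + w.getAttribute("lang")
                    + ", specified=" + w.getAttributeNode("lang").getSpecified());
        }
    }
}
```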

Simple Tags

An icon on the toolbar

Shows and hides the tags in the text area. When the tags are hidden, they are replaced by square brackets: [ for the opening tags and ] for the closing tags. If the Show Attributes In Area option is activated and the tags are hidden, the attributes are hidden as well.

Show Attributes In Area

An icon on the toolbar

Enables/disables the display of attributes in the text areas. When the attributes are shown in the area, they can be modified there but cannot be added or removed. Attributes are managed by right-clicking the table below the tree panel of the editor.

Disable Validation

Enables/disables validation of the document against the DTD and the active All Children Constraints (if any). If validation is enabled, all errors in the current document (if any) are shown in the Error Message Area.

Double-clicking an error message selects the node containing the corresponding error in the Tree Panel and in the Text Area.
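A comparable workflow can be sketched with JAXP's validating DOM parser, which checks a document against its DTD and reports each violation with a line and column number that a user interface could use to locate the offending node. This is a generic illustration: it does not cover CLaRK's All Children Constraints, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Collects DTD validity errors with their positions, the kind of list an error panel could show.
class DtdValidation {
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the document's DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }

                public void error(SAXParseException e) {
                    // Validity errors are reported here and parsing continues.
                    errors.add("line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                            + ": " + e.getMessage());
                }

                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // well-formedness errors stop parsing
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [\n"
                + "  <!ELEMENT doc (s+)>\n"
                + "  <!ELEMENT s (#PCDATA)>\n"
                + "]>\n"
                + "<doc><p>not declared</p></doc>";
        validate(xml).forEach(System.out::println);
    }
}
```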

Check DTDs at Start-up

When this option is selected, the system checks the database of compiled DTDs each time it is started. The system tries to load each compiled DTD and, in case of failure, removes the record for this DTD, i.e. the DTD is no longer known to the system. For every document that refers to a DTD removed in this way, the system asks the user to specify another DTD. In normal use of CLaRK this never happens. The DTD database can be damaged by external interference with the system data files, performed by the user or by another application. Another reason the system may be unable to read the DTD database is that it is used with data files produced not by it but by another (incompatible) version of the system. To prevent this, each new version of the system must be installed in a separate directory.

Deselecting this option may reduce the start-up time of the system. This might be useful when the system is running on a slow machine and its DTD database contains many large DTDs. In all other cases it is recommended to keep this option selected.
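The check-up procedure described above amounts to a loop over the DTD records. The sketch below is hypothetical throughout: the maps stand in for the DTD database and the document registry, the DTD and document names are made up, and the canLoad predicate stands in for an attempt to load a compiled DTD.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the start-up check of the compiled DTD database.
class DtdStartupCheck {
    // dtdRecords:   DTD name -> stored compiled data (a String stands in for it here)
    // documentDtds: document name -> name of the DTD the document refers to
    // canLoad:      stands in for an attempt to load one compiled DTD
    // Removes every record that fails to load and returns the documents that must
    // be assigned another DTD by the user.
    static List<String> check(Map<String, String> dtdRecords,
                              Map<String, String> documentDtds,
                              Predicate<String> canLoad) {
        Set<String> removed = new HashSet<>();
        Iterator<Map.Entry<String, String>> it = dtdRecords.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> record = it.next();
            String name = record.getKey();
            if (!canLoad.test(record.getValue())) {
                removed.add(name);
                it.remove(); // the DTD is no longer known to the system
            }
        }
        List<String> needNewDtd = new ArrayList<>();
        for (Map.Entry<String, String> doc : documentDtds.entrySet()) {
            if (removed.contains(doc.getValue())) {
                needNewDtd.add(doc.getKey());
            }
        }
        return needNewDtd;
    }

    public static void main(String[] args) {
        Map<String, String> dtds = new LinkedHashMap<>();
        dtds.put("corpus.dtd", "ok");
        dtds.put("old.dtd", "corrupt");
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.xml", "corpus.dtd");
        docs.put("b.xml", "old.dtd");
        List<String> ask = check(dtds, docs, data -> !data.equals("corrupt"));
        System.out.println("Remaining DTDs: " + dtds.keySet()); // [corpus.dtd]
        System.out.println("Ask user about: " + ask);          // [b.xml]
    }
}
```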

Compile System DTDs at Start-up

The system contains a set of system DTDs, mainly concerning tools that support XML Tools Queries. These DTDs define the valid XML structures that can serve as tool queries. Another group of system DTDs defines the structure of the XML representation of different tool definitions (the rules of a grammar, a tokenizer definition, constraint definitions, etc.). All these DTDs are placed in the system resources directory. When a tool needs a certain system DTD, the DTD is automatically compiled into the system database (if this has not been done before). Thus some DTDs are not compiled until they are needed in processing.

For convenience, when this option is selected, the system finds at start-up all system DTDs that are not yet compiled in the system database and compiles them one by one.
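The on-demand compilation and the start-up pre-compilation can be sketched as a cache. Everything here is hypothetical: SystemDtdCache, the DTD file names, the string stand-in for a compiled DTD and the compiler function are illustrative, not CLaRK's classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: system DTDs compiled on first use, with optional pre-compilation.
class SystemDtdCache {
    private final Map<String, String> compiled = new HashMap<>(); // DTD name -> compiled form (stand-in)
    private final List<String> systemDtds;                        // DTDs in the resources directory
    private final Function<String, String> compiler;              // stand-in for the DTD compiler

    SystemDtdCache(List<String> systemDtds, Function<String, String> compiler) {
        this.systemDtds = systemDtds;
        this.compiler = compiler;
    }

    // Called when a tool needs a system DTD: it is compiled only if that has not been done before.
    String get(String name) {
        return compiled.computeIfAbsent(name, compiler);
    }

    // Start-up option: compile, one by one, every system DTD not yet compiled.
    // Returns how many DTDs were compiled.
    int precompileAll() {
        int count = 0;
        for (String name : systemDtds) {
            if (!compiled.containsKey(name)) {
                compiled.put(name, compiler.apply(name));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SystemDtdCache cache = new SystemDtdCache(
                Arrays.asList("grammar.dtd", "tokenizer.dtd", "constraints.dtd"),
                name -> "compiled " + name);
        cache.get("grammar.dtd");                   // compiled on demand
        System.out.println(cache.precompileAll()); // 2: only the remaining DTDs
    }
}
```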