Tool Application Modes - Processing Current Document vs. Multiple Apply

The processing of XML data in the CLaRK System can be done in two ways. Each of them has its advantages and disadvantages.

As the system is designed to work with corpora, in most cases this involves working with large amounts of data. This can be critical for the processing time and for the system resources needed for a given task. Therefore CLaRK supports two techniques for processing XML documents:

  • processing Current Document. 'Current Document' here refers to an XML document which is open in the main editor and which is currently active (when more than one document is open). While working, the system interacts with the user through graphical dialogs for processing options, confirmations, warnings or error messages. In case of an error, invalid settings or a similar problem, the user can cancel the current operation.
  • processing Multiple Apply. The processing is applied to one or more documents which are already saved in the system. During processing, the documents are not opened in the editor; only the final results are reported. There is no interaction with the user during processing. The results from an operation (modified or new documents) are saved in the Internal Documents Database after a successful procedure.

The advantages of the first type of processing are that the user can see the data and can adjust the tool settings according to the specific task. The user can check which data a given tool will be applied to without actually applying it. One disadvantage is that visualizing large documents requires system resources, which can make the processing extremely slow. Another disadvantage is that the tool can be applied to only one document (the current one).

To solve these problems, the CLaRK System offers the second approach (Multiple Apply). Here the user can select one or more documents to which a given tool will be applied. The documents are processed in the order in which they were selected. During processing the selected input documents are not opened in the editor, which takes considerably fewer system resources and makes the procedure faster. After the application has been started, no user input is expected. During runtime, status messages are printed on the screen showing the current state of the process: the document currently being processed, the result message after application to a single document, the result document, etc.
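The Multiple Apply procedure described above can be pictured as a plain loop over the selected documents. The following Python sketch is only an illustration, not CLaRK's implementation: the names `multiple_apply`, `store` and `tool` and the `-tool-` suffix are hypothetical, the Internal Documents Database is replaced by a dictionary, and continuing with the next document after an error is an assumption of the sketch.

```python
from typing import Callable, Dict, List


def multiple_apply(
    store: Dict[str, str],
    selected: List[str],
    tool: Callable[[str], str],
    suffix: str = "-tool-",
) -> List[str]:
    """Apply `tool` to each selected document, in selection order.

    `store` is a hypothetical stand-in for the Internal Documents Database:
    it maps document names to XML text. No document is opened in an editor
    and no user input is requested; only status messages are collected.
    """
    log: List[str] = []
    for name in selected:  # the order of selection is preserved
        log.append(f"Processing: {name}")
        try:
            result = tool(store[name])
        except Exception as exc:
            # Assumption: report the error and continue with the next document.
            log.append(f"Error in {name}: {exc}")
            continue
        result_name = name + suffix
        store[result_name] = result  # stored only after a successful application
        log.append(f"Result stored in: {result_name}")
    return log


if __name__ == "__main__":
    docs = {"a": "<s>One</s>", "b": "<s>Two</s>"}
    for line in multiple_apply(docs, ["b", "a"], str.lower):
        print(line)
```

The status list plays the role of the messages printed during runtime: one entry when a document is picked up and one when its result is stored or an error is reported.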

All tools which support these two modes of application have similar graphical interface dialogs. The mode is controlled by a checkbox "Multiple Apply" (fig. 1) situated on the main tool dialog. If the checkbox is unselected, the tool will be applied to the current document. Otherwise, an auxiliary panel is shown under the checkbox (fig. 2).

Fig. 1 Tool application to the current document

Fig. 2 Tool application to multiple Internal Documents (Multiple Apply)

Multiple Apply Auxiliary Panel

Basically, the panel is a table listing the selected documents to which the tool will be applied (column INPUT). The table also contains the names of the documents in which the results of the application will be stored (column RESULTS). If one result document is produced for each input document, the input name and the result name appear on the same row. If only one result document is produced for all selected input documents (depending on the tool and/or the options), its name appears on the first row of column RESULTS. Unless the Overwrite option is set (see Options below), the second column of the table can be edited.

On the right side of the panel, the buttons for document selection and options are situated:

  • Add Documents - Opens a selection dialog with the internal documents arranged in groups or in a list. The result from the selection is appended to the table.

  • Remove Documents - Removes the selected rows (documents) from the table. The removal is NOT preceded by a confirmation message.

  • Clear All - Removes all entries from the table without a confirmation.

  • Options - Opens an options dialog with settings concerning the actual application and the way the result is formed. The dialog window looks as follows:

Multiple Apply Options

The options dialog is divided into two sections:

  1. Output section - determines the way the result is formed. The possibilities are: for each input document to create one result document (option Separate) and for all input documents to create one result document (option United). The second option is disabled for some of the tools, for which it is not appropriate. When the second option is active and selected, the user is expected to supply a result document name in the field Name.

  2. Mode section - determines the way the result document names are generated. The Overwrite option makes the newly created documents overwrite the old input documents. This is useful when the result documents are modified versions of the input documents. This option is disabled for some of the tools, for which it is not appropriate. The other option, Create New, stores the result in one or more new documents. The initial names of the result documents are formed by concatenating the input names and the suffix from the field Default Extension. The user can modify the suffix. Each tool has its own default suffix (its format is: -tool_name- ). Pressing the Reset button sets the Default Extension field back to its initial value. At the bottom of the dialog window there are two checkbox options:

  • Always save - by default, if a tool leaves an input document unmodified and that document serves as the result, it is not saved. If this option is selected, the result is always saved, whether or not it differs from the original input document.

  • Always overwrite - by default, when the name of an existing document is set as a result document name, the system shows a warning message and cancels further application. If this option is selected, no warning messages are shown and the existing internal documents (if any) are replaced with the new results.
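The interplay of the Output and Mode options can be summarized in a small Python sketch that builds the (INPUT, RESULTS) rows of the panel table. It is an illustration only: the function names, the parameter names and the `-tool-` placeholder suffix are hypothetical, and letting the United option take precedence over the Mode setting is an assumption of the sketch, not documented behaviour.

```python
from typing import List, Set, Tuple

Row = Tuple[str, str]  # (INPUT name, RESULTS name)


def plan_results(
    inputs: List[str],
    output: str = "separate",   # "separate" or "united"
    mode: str = "create_new",   # "create_new" or "overwrite"
    extension: str = "-tool-",  # placeholder for the Default Extension value
    united_name: str = "",      # the Name field, used with the United option
) -> List[Row]:
    """Return the table rows for the chosen options."""
    if output == "united":
        if not united_name:
            raise ValueError("a result name is required for the United option")
        # One result for all inputs: its name sits on the first row only.
        return [(n, united_name if i == 0 else "") for i, n in enumerate(inputs)]
    if mode == "overwrite":
        # Each result replaces its own input document.
        return [(n, n) for n in inputs]
    # Create New: result name = input name + Default Extension.
    return [(n, n + extension) for n in inputs]


def conflicts(rows: List[Row], existing: Set[str], always_overwrite: bool = False) -> List[str]:
    """Result names that would trigger the 'existing document' warning."""
    if always_overwrite:
        return []  # existing documents are silently replaced
    return [res for inp, res in rows if res and res != inp and res in existing]


if __name__ == "__main__":
    rows = plan_results(["doc1", "doc2"])
    print(rows)
    print(conflicts(rows, {"doc1-tool-"}))
```

A non-empty list from `conflicts` corresponds to the case in which the system warns and cancels further application; with Always overwrite selected the list is always empty.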

Result Folder

When a tool is applied in Multiple Apply mode, the user has to specify a location where the result(s) should be stored (except when the result overwrites the input data). A result location group can be specified at the bottom of the panel. By default, each tool has its own specific result group ( [Corpus_name] SYSTEM : Results : <tool-name>). The user can point to any group in the Internal Documents database, with one restriction: results cannot be stored in the system group [Corpus_name] SYSTEM : Queries or in any of its descendant groups. This restriction comes from the fact that these groups must contain XML documents of a special type (tool queries), which must be valid according to certain DTDs. These requirements cannot be checked in advance for the result. After pressing the Change button, the user gets the following result group (folder) chooser dialog:

A new result group can be set either by pointing to a group in the tree and pressing the Choose button, or by double-clicking the target group with the left mouse button. If the selected group is not appropriate as a result group, a warning message appears and control is returned to the group chooser dialog. Two additional operations are available here: adding a new group (button New Group) and removing an existing group (button Remove Group).
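The restriction on result groups amounts to a prefix test on the group path. The sketch below assumes, purely for illustration, that a group is represented as a sequence of names from the root of the Internal Documents database; the function name and this representation are hypothetical and not part of CLaRK.

```python
from typing import Sequence


def is_allowed_result_group(path: Sequence[str], corpus: str) -> bool:
    """Return False for the Queries system group and any group below it.

    `path` is a hypothetical representation of a group, for example
    ("[Demo] SYSTEM", "Results", "my-tool").
    """
    forbidden = (f"[{corpus}] SYSTEM", "Queries")
    # A group is rejected when its path starts with the forbidden prefix.
    return tuple(path[: len(forbidden)]) != forbidden


if __name__ == "__main__":
    print(is_allowed_result_group(("[Demo] SYSTEM", "Results", "my-tool"), "Demo"))
    print(is_allowed_result_group(("[Demo] SYSTEM", "Queries", "my-tool"), "Demo"))
```

A `False` result corresponds to the case in which the chooser dialog shows a warning and stays open.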

Result DTD

For some of the tools there is one more option available: specifying a DTD for the result document(s). This option is available only for tools which produce new result documents. The user can select any DTD compiled in the system or preserve the DTD of the input document(s) (option <Original DTD>).

While a tool is actually being applied in Multiple Apply mode, the user is shown an information dialog which indicates the overall status of the process. The system shows which document is currently being processed, the result messages after each single application and where the result is stored. In case of errors, corresponding messages are shown. An example status window is shown in the following picture:

During runtime the user can cancel the tool application by using the Stop button. The application is not interrupted immediately but only after the current operation has been completed (opening a document, applying a single operation or saving a result).
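The behaviour of the Stop button resembles cooperative cancellation: the request is only checked between operations, so an operation that has already started always finishes. The following Python sketch illustrates the idea under that reading; `run_until_stopped` and the step functions are hypothetical names, and `threading.Event` merely stands in for the button.

```python
import threading
from typing import Callable, Iterable, List


def run_until_stopped(
    steps: Iterable[Callable[[], str]],
    stop_requested: threading.Event,
) -> List[str]:
    """Run steps one by one; honour a stop request only between steps."""
    done: List[str] = []
    for step in steps:
        if stop_requested.is_set():
            break            # checked before the next operation starts
        done.append(step())  # a started operation always runs to completion
    return done


if __name__ == "__main__":
    stop = threading.Event()

    def open_document() -> str:
        return "opened"

    def apply_tool() -> str:
        stop.set()  # simulate pressing Stop while the tool is being applied
        return "applied"

    def save_result() -> str:
        return "saved"

    # The running step completes; the following step is never started.
    print(run_until_stopped([open_document, apply_tool, save_result], stop))
```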

XML Tools Queries

The user can save different configurations of the tools in order to execute them many times. In addition to the specific tool settings, the user can specify which input documents the tool will be applied to and how the result will be formed and saved. All these settings are represented as XML documents in the Internal Documents database. Furthermore, they can be processed with all the facilities of the system. XML documents of this specific kind are called XML Tool Queries or just queries. Each tool in the CLaRK System has its own specific type of queries with its own DTD. The queries are located in a special place in the Internal Documents database (system groups [Corpus_name] SYSTEM : Queries : <tool_name> and all their descendant sub-groups). Each XML query is valid according to its DTD.


Fig. 3 Queries Panel for XPath Remove Tool

A management panel similar to the one in fig. 3 appears (with very small variations depending on the specific tool). After pressing the Select button, the corresponding Query Manager is shown and the user can load a query into the current tool. If the Reset button is clicked, the tool settings are reset to their initial values. In this case the Update button changes to Save. The settings in the current tool dialog window can be saved by using the Save/Update button. If the user creates a new query, then after pressing the Save button s/he must supply a query name. If modifications to an existing query are to be saved, the Update button asks the user to confirm the overwriting. All queries which are saved or updated are stored in the Internal Documents database.