Concordance Tool Demo

The goal of this demo is to show how the concordance tool of the CLaRK system can be used to search for linguistic phenomena within XML documents.

The concordance tool is implemented on the basis of the XPath engine, the regular grammar engine and a sorting module. It is useful for searching for units of one kind within larger units, for instance a word within a sentence or a phrase within a paragraph.

The larger unit is called the context and the smaller unit is called the item. The context is defined by an XPath expression. The item can be defined by an XPath expression, a regular expression, a regular grammar or a grammar query. The context can be restricted further by a grammar query or by XPath expressions.

The units found are stored in a new document and presented to the user in a table. The user can also open this document as an ordinary XML document and process it further with any of the tools available in the system.
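
The relation between context and item can be illustrated outside CLaRK with a minimal Python sketch. It is not the CLaRK implementation: the function name concordance, the sample document and the width parameter are assumptions made for the illustration, and the context is selected with the limited XPath subset of Python's standard library rather than with a full XPath engine. The item is defined here by a regular expression.

```python
import re
import xml.etree.ElementTree as ET


def concordance(xml_text, context_path, item_pattern, width=20):
    """Return (item, left context, right context) rows for every match."""
    root = ET.fromstring(xml_text)
    rows = []
    for context in root.findall(context_path):
        # Flatten the context element to plain text, ignoring inner tags.
        text = "".join(context.itertext())
        for match in re.finditer(item_pattern, text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            rows.append((match.group(0), left, right))
    return rows


# Hypothetical sample document: each p element is one context.
SAMPLE = "<doc><p>The cat sat on the mat.</p><p>A dog ran past the cat.</p></doc>"

rows = concordance(SAMPLE, ".//p", r"\bcat\b", width=8)
for item, left, right in rows:
    print(f"{left!r:>12} [{item}] {right!r}")
```

Each row corresponds to one line of the result table: the item itself plus a limited number of characters to its left and right within the same context.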

Demo 1 : Concordance over pure text using grammars or grammar queries

The goal here is to find all uses of verbal tenses in the text. Verbal tenses are recognized by the grammar tense, which is described in Grammar Tool Demo.

The document used for this demo is Standart20030524.tag.

The tokenizer that segments the text into words is the MixedWord tokenizer, which is defined within the system. To see the token types defined in this tokenizer, select the Tokenizers item from the Definitions menu.

In order to run the demo you have to perform the following steps:

  1. Check whether the document Standart20030524.tag is loaded in the system. If it is not, load it from the demo directory as described in Import XML Tool Demo. If it is, open it.
  2. Check whether the grammar tense is among the system grammars: open the Grammar Manager dialog by clicking the Select button. If the grammar is not there, load it by clicking the File I/O button and choosing Load grammar from the file. The grammar file is tense.gram in the demo directory.
  3. Create the grammar query as described in Grammar Tool Demo, or load it from the demo directory. Before loading the query, make sure that there is a no-blank filter in the system. Then import the grammar query tense.gram.que into the Root : SYSTEM : Queries : Grammar directory of the system.
  4. Open the query dialog of the Concordance tool from the Tools menu.
  5. In the Define Context text box, enter the XPath expression //text/descendant::p |//head, which selects all paragraphs and headings within the document(s).
  6. Select the Grammar Panel. There are three ways to search: with a regular expression in Simplified Usage Mode, with a grammar in Normal Usage Mode, or with a grammar query in Query Usage Mode.
  7. The uses of verbal tenses in the text can be found using either a grammar or a grammar query.
  8. To make the search with a grammar query:
  9. Select the Text only option if some of the words in the text are marked up. When this option is selected, the engine skips the tags (if any) and takes their text content. It is not selected in this demo.
  10. Select the Add number attribute option to number each item that is found.
  11. Select the Add source attribute option to add to the item an attribute with the name of the source document.
  12. Select the Add path attribute option to add to the item an attribute with the XPath value taken from the source document.
  13. The dialog of the tool should then look as follows: [screenshot of the dialog for the grammar query search, not reproduced]
  14. To make the search with a grammar:
  15. The dialog of the tool should then look as follows: [screenshot of the dialog for the grammar search, not reproduced]
  16. Then run the queries. Both queries give the same result, because they use the same grammar to select verbal tenses and the same XPath expression to restrict the search. The result is shown in a table with four columns: Item, Left Context, Right Context and Comment.
  17. The result can be sorted as described in Sort Tool Demo. To sort the result, press the Sort Table button.
  18. A new part of the dialog appears, where the user can specify the sorting criteria via keys. The user can sort by Item, by Left Context or by Right Context. Let us sort the result by Item.
  19. In the Prefix column of the sort table select I, which indicates that the result will be sorted by Item.
  20. In the Expression column enter the XPath expression text(), which selects the text of the items found.
  21. Specify the order: Asc stands for ascending.
  22. Select the Trim option so that spaces around the text (if any) are ignored in sorting, and the Normalized option so that different graphical variants of the same item are sorted as equal.
  23. The dialog of the result should then look as follows: [screenshot of the result dialog with the sort table, not reproduced]
  24. Then sort the items by pressing the Sort button.
  25. To have the items sorted in the result document as well, press the Update button.
  26. The number of characters shown in the Context columns can be increased from Settings.

When the dialog is closed, the user is asked to name the result document and is offered the option to view it.

The query using a grammar is saved in document tenseN.conc.que, and the one using a grammar query is saved in the document tenseQ.conc.que in the demo directory.
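
The effect of the Trim and normalization options on sorting by Item can be sketched in Python. This is only an illustration under stated assumptions: the helper sort_key and the sample rows are hypothetical, and normalization is approximated here by case folding, which may differ from what the CLaRK sorting module actually does.

```python
def sort_key(item, trim=True, normalize=True):
    """Build the comparison key for one item of the result table."""
    key = item
    if trim:
        # Ignore spaces around the text when comparing.
        key = key.strip()
    if normalize:
        # Assumption: normalization is approximated by case folding.
        key = key.casefold()
    return key


# Hypothetical result rows: (Item, Left Context, Right Context).
rows = [
    ("  Will go ", "He ", " home."),
    ("has gone", "She ", " already."),
    ("will go", "They ", " soon."),
]

# sorted() is stable, so items with equal keys keep their original order.
sorted_rows = sorted(rows, key=lambda row: sort_key(row[0]))
for item, left, right in sorted_rows:
    print(repr(item), repr(left), repr(right))
```

Without trimming, the item with leading spaces would sort before all others, because a space compares lower than any letter.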

Demo 2 : Concordance over pure text using regular expressions

The goal here is to find number expressions, so that precise grammars can then be written for each kind of number expression.

The document used for this demo is Standart20030524.tag.

The tokenizer that segments the text into words is the MixedWord tokenizer, which is defined within the system. To see the token types defined in this tokenizer, select the Tokenizers item from the Definitions menu.

In order to run the demo you have to perform the following steps:

  1. Check whether the document Standart20030524.tag is loaded in the system. If it is not, load it from the demo directory as described in Import XML Tool Demo. If it is, open it.
  2. Open the query dialog of the Concordance tool from the Tools menu.
  3. In the Define Context text box, enter the XPath expression //text/descendant::p |//head, which selects all paragraphs and headings within the document(s).
  4. Select the Grammar Panel. There are three ways to search: with a regular expression in Simplified Usage Mode, with a grammar in Normal Usage Mode, or with a grammar query in Query Usage Mode.
  5. Select the Simplified radio button for Usage Mode.
  6. In Query String, enter $NUMBER+,$#, which selects an expression that starts with one or more numbers followed by one token. That token can be of any category recognized by the tokenizer: a Latin word, a punctuation mark, a brace, etc.
  7. From the Tokenizer list choose MixedWord, which separates the text into words.
  8. A filter is needed because there are spaces between the words; the no-blank filter is used. It is described in Filter Tool Demo.
  9. The Normalize check box can be selected, because the text may contain words with capitalized letters.
  10. Select the Add number attribute option to number each item that is found.
  11. Select the Add source attribute option to add to the item an attribute with the name of the source document.
  12. Select the Add path attribute option to add to the item an attribute with the XPath value taken from the source document.
  13. The dialog of the tool should then look as follows: [screenshot of the dialog for the regular expression search, not reproduced]
  14. Then run the query. The result is shown in a table with four columns: Item, Left Context, Right Context and Comment.
  15. The result can be sorted as described in Sort Tool Demo. To sort the result, press the Sort Table button.
  16. In the Prefix column of the sort table select I, which indicates that the result will be sorted by Item.
  17. In the Expression column enter the XPath expression text(), which selects the text of the items found.
  18. Specify the order: Asc stands for ascending.
  19. The dialog of the result should then look as follows: [screenshot of the result dialog with the sort table, not reproduced]
  20. Then sort the items by pressing the Sort button.
  21. To have the items sorted in the result document as well, press the Update button.
  22. The number of characters shown in the Context columns can be increased from Settings.

When the dialog is closed, the user is asked to name the result document and is offered the option to view it.

The query is saved in the document numEX.conc.que in the demo directory.
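
To make the query $NUMBER+,$# more concrete, the following Python sketch imitates it over a simplified token stream. It is an approximation under stated assumptions, not the MixedWord tokenizer or the CLaRK matching engine: the category names WORD, SPACE and SYMBOL, the one-letter codes and all function names are hypothetical, and the query is read as "one or more NUMBER tokens followed by exactly one token of any category", following the description in the steps above.

```python
import re

# Simplified tokenizer: every match belongs to exactly one named category.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)|(?P<WORD>[^\W\d_]+)|(?P<SPACE>\s+)|(?P<SYMBOL>.)",
    re.DOTALL,
)

# One-letter code per category, so a token sequence becomes a plain string.
CODES = {"NUMBER": "N", "WORD": "W", "SPACE": "B", "SYMBOL": "S"}


def tokenize(text):
    """Split text into (category, text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def no_blank(tokens):
    """Drop whitespace tokens, in the spirit of the no-blank filter."""
    return [token for token in tokens if token[0] != "SPACE"]


def find_number_expressions(tokens):
    """Find runs of NUMBER tokens followed by one token of any category."""
    codes = "".join(CODES[category] for category, _ in tokens)
    # Positions in the code string are also positions in the token list.
    return [
        [text for _, text in tokens[m.start():m.end()]]
        for m in re.finditer(r"N+.", codes)
    ]


tokens = no_blank(tokenize("On 24 May 2003 the index rose 3 %."))
hits = find_number_expressions(tokens)
print(hits)
```

Encoding each token category as a single character lets an ordinary regular expression over characters stand in for a regular expression over tokens.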

Demo 3 : Concordance over marked documents

The goal here is to find all tok elements in the text that contain numbers. Such elements have an attribute type with the value num.

The document used for this demo is Standart20030525concord.tag.

In order to run the demo, you have to perform the following steps:

  1. Check whether the document Standart20030525concord.tag is loaded in the system. If it is not, load it from the demo directory as described in Import XML Tool Demo. If it is, open it.
  2. Open the query dialog of the Concordance tool from the Tools menu.
  3. In the Define Context text box, enter the XPath expression //p, which selects all paragraphs.
  4. Since the search will be made with an XPath expression, the XPath panel is used. In the Search Elements text box, enter the XPath expression tok[@type="num"], which selects all tok elements containing numbers within the document(s).
  5. Left Context and Right Context are used when the user wants to restrict the context.
  6. Select the Add number attribute option to number each item that is found.
  7. Select the Add source attribute option to add to the item an attribute with the name of the source document.
  8. Select the Add path attribute option to add to the item an attribute with the XPath value taken from the source document.
  9. The dialog of the tool should then look as follows: [screenshot of the dialog for the XPath search, not reproduced]
  10. Then run the query. The result is shown in a table with four columns: Item, Left Context, Right Context and Comment.
  11. The result can be sorted as described in Sort Tool Demo. To sort the result, press the Sort Table button.
  12. A new part of the dialog appears, where the user can specify the sorting criteria via keys. The user can sort by Item, by Left Context or by Right Context. Let us sort the result by Item.
  13. In the Prefix column of the sort table select I, which indicates that the result will be sorted by Item.
  14. In the Expression column enter the XPath expression tok/text(), which selects the text of the items found.
  15. Specify the order: Asc stands for ascending.
  16. Select the Trim option so that spaces around the text (if any) are ignored in sorting, and the number option so that the elements are compared as numbers, not as strings.
  17. The dialog of the result should then look as follows: [screenshot of the result dialog with the sort table, not reproduced]
  18. Then sort the items by pressing the Sort button.
  19. To have the items sorted in the result document as well, press the Update button.
  20. The number of characters shown in the Context columns can be increased from Settings.

When the dialog is closed, the user is asked to name the result document and is offered the option to view it.

The above query is saved in the document numbers.conc.que in the demo directory.
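
The selection and the numeric sort of this demo can be reproduced in a small Python sketch. It is an illustration only: the sample document is invented, and Python's standard library supports just a subset of XPath, so the context is written as .//p instead of //p.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up sample in the spirit of the demo document.
SAMPLE = """<text>
<p><tok type="word">Page</tok> <tok type="num">12</tok> <tok type="word">of</tok> <tok type="num">9</tok></p>
<p><tok type="word">Year</tok> <tok type="num">2003</tok></p>
</text>"""

root = ET.fromstring(SAMPLE)

# Context: every p element. Item: every child tok whose type is num.
items = [
    tok.text.strip()
    for p in root.findall(".//p")
    for tok in p.findall("tok[@type='num']")
]

as_strings = sorted(items)             # character-by-character comparison
as_numbers = sorted(items, key=float)  # comparison by numeric value

print(items)
print(as_strings)
print(as_numbers)
```

Sorted as strings, 2003 comes before 9 because the comparison stops at the first differing character; sorted as numbers, the items appear in order of magnitude.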