Node Info Tool Demo

The goal of this demo is to show how the Node Info tool of the CLaRK system can be used to store meta-information about an XML document inside the document itself.

The information is stored in the format recommended by the Text Encoding Initiative (TEI) standard.

The Node Info tool records how many times tokens of each type and/or elements occur within the document. A token's type is its Token Category, as assigned by the selected tokenizer. The elements to be counted are specified by an XPath expression.

If the XML document is valid against the TEI.2 DTD, the result is stored in its header element (teiHeader); otherwise, the result is inserted as the first child of the root element.
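The placement rule above can be sketched with Python's standard xml.etree.ElementTree. This is not CLaRK's implementation: real DTD validation is replaced by a simple check that the root is <TEI.2> with a <teiHeader> child, and the result is simply appended to the header (the demos below place it in specific header elements). The name attach_info and the <info> element are hypothetical.

```python
import xml.etree.ElementTree as ET


def attach_info(root: ET.Element, info: ET.Element) -> str:
    """Place an info element following the Node Info placement rule.

    Assumption: "valid against the TEI.2 DTD" is approximated by checking
    that the root is <TEI.2> and has a <teiHeader> child; no actual DTD
    validation is performed here.
    """
    header = root.find("teiHeader")
    if root.tag == "TEI.2" and header is not None:
        # TEI document: keep the result inside the header.
        header.append(info)
        return "teiHeader"
    # Any other document: insert the result as the first child of the root.
    root.insert(0, info)
    return "root"
```

For example, attach_info(ET.fromstring("<doc><p>x</p></doc>"), ET.Element("info")) puts <info> before <p>, because the document is not a TEI.2 document.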

Demo 1: Word Info

Our goal here is to add information about the tokens in the text nodes of the document: how many tokens of each Token Category the document contains.

The document used for this demo is Standart20030524.tag.

The tokenizer is MixedWord, which is predefined in the system. To see the token types defined by this tokenizer, select the Tokenizers item from the Definitions menu.
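To illustrate what "assigning a Token Category" means, here is a toy regex tokenizer in Python. The categories NUMBER, WORD and PUNCT are invented for this sketch; the real MixedWord tokenizer defines its own categories, which you can inspect under Definitions > Tokenizers.

```python
import re
from typing import Iterator, Tuple

# Hypothetical categories for illustration only; they are NOT the
# categories defined by CLaRK's MixedWord tokenizer.
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:[.,]\d+)*)"  # digits, optionally with . or , groups
    r"|(?P<WORD>[^\W\d_]+)"         # runs of letters (any script)
    r"|(?P<PUNCT>[^\w\s])"          # any single non-word, non-space char
)


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (category, token) pairs; whitespace is skipped."""
    for match in TOKEN_PATTERN.finditer(text):
        yield match.lastgroup, match.group()
```

For instance, tokenize("Price: 12,50 lv.") yields WORD "Price", PUNCT ":", NUMBER "12,50", WORD "lv" and PUNCT ".". Because WORD uses Unicode letter classes, Cyrillic text is categorized the same way.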

To run the demo, perform the following steps:

  1. Check whether the document Standart20030524.tag is saved in the system. If it is not, load it as described in the Import XML Tool Demo and save it in the system.
  2. Open the query dialog of the Node Info tool from the Tools menu.
  3. In the text box labeled "Enter XPath for node:", enter the XPath expression //text/descendant-or-self::text(), which selects all text nodes inside the <text> elements of the selected document(s).
  4. Select the Word Info option.
  5. From the Choose Tokenizer list, select MixedWord.
  6. Select the documents from the corresponding document group with the Add Documents button.
  7. The tool's dialog should now show the settings above: the XPath expression //text/descendant-or-self::text(), the Word Info option, the MixedWord tokenizer, and the selected documents.
  8. Run the query.
  9. The result gives the number of occurrences of each Token Category.

    It can be added to the <extent> element in the teiHeader of the document and/or saved in a separate document.
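The Word Info steps above can be sketched in Python with the standard library. ElementTree has no text() node test, so itertext() on each <text> element stands in for //text/descendant-or-self::text(); nested <text> elements are skipped so each text node is counted once, as in an XPath node-set. The toy tokenizer, the function names word_info and extent_element, and the "CATEGORY: count" layout inside <extent> are all assumptions, not CLaRK's actual output format.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for the MixedWord tokenizer (categories are invented).
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:[.,]\d+)*)|(?P<WORD>[^\W\d_]+)|(?P<PUNCT>[^\w\s])"
)


def word_info(root: ET.Element) -> Counter:
    """Count tokens per category in the text nodes under //text."""
    all_text = list(root.iter("text"))
    # <text> elements nested inside another <text> (e.g. via TEI <group>)
    # are already covered by their outer <text>; skip them to avoid
    # counting the same text node twice.
    nested = {
        id(inner)
        for outer in all_text
        for inner in outer.iter("text")
        if inner is not outer
    }
    counts = Counter()
    for text_el in all_text:
        if id(text_el) in nested:
            continue
        # itertext() yields every text node inside this element, in order.
        for chunk in text_el.itertext():
            for match in TOKEN_PATTERN.finditer(chunk):
                counts[match.lastgroup] += 1
    return counts


def extent_element(counts: Counter) -> ET.Element:
    """Render the counts as an <extent> element (assumed layout)."""
    extent = ET.Element("extent")
    extent.text = "; ".join(f"{cat}: {n}" for cat, n in sorted(counts.items()))
    return extent
```

Each text node is tokenized separately, so markup such as <hi> splits tokens at element boundaries; whether CLaRK behaves the same way is not stated in this demo.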

Demo 2: Tag Info

Our goal here is to add information about the elements in the document: how many times each element occurs in it.

The document used for this demo is Standart20030524.tag.

To run the demo, perform the following steps:

  1. Check whether the document Standart20030524.tag is saved in the system. If it is not, load it as described in the Import XML Tool Demo and save it in the system.
  2. Open the query dialog of the Node Info tool from the Tools menu.
  3. In the text box labeled "Enter XPath for node:", enter the XPath expression //body. All descendant nodes of the selected node will be counted.
  4. Select the Tag Info option.
  5. Select the documents from the corresponding document group with the Add Documents button.
  6. The tool's dialog should now show the settings above: the XPath expression //body, the Tag Info option, and the selected documents.
  7. Run the query.
  8. The result gives the number of occurrences of each element.

    It can be added to the <encodingDesc> element in the teiHeader of the document and/or saved in a separate document.
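The Tag Info count can be sketched the same way: for every node matched by //body, count the element names of its descendants. The output layout is an assumption borrowed from the TEI P4 convention of listing element usage as <tagUsage gi="..." occurs="..."/> entries inside a <tagsDecl> in <encodingDesc>; CLaRK's actual layout may differ, and the names tag_info and tags_decl are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_info(root: ET.Element, tag: str = "body") -> Counter:
    """Count element names among the descendants of each <tag> node."""
    counts = Counter()
    for node in root.iter(tag):
        # iter() includes the node itself, so skip it: only descendants count.
        for desc in node.iter():
            if desc is not node:
                counts[desc.tag] += 1
    return counts


def tags_decl(counts: Counter) -> ET.Element:
    """Render counts as <tagsDecl> with one <tagUsage> per element (assumed)."""
    decl = ET.Element("tagsDecl")
    for gi, n in sorted(counts.items()):
        ET.SubElement(decl, "tagUsage", gi=gi, occurs=str(n))
    return decl
```

For a body containing one <div> with two <p> children, tag_info returns div: 1 and p: 2, and tags_decl turns that into two <tagUsage> entries sorted by element name.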