Extract Tool Demo

The goal of this demo is to show how the Extract tool can be used to extract information from XML documents.

The Extract tool selects data from one or more documents and saves the selected data in one or more XML documents.

Demo: Extraction of sentences with at most 3 words

Our goal here is to construct a list of all the sentences in some set of documents that have at most 3 words. The expectation is that some of these sentences may have been identified incorrectly.

The document used for this demo is Standart20030524extr.tag.

In order to run the demo you have to perform the following steps:

  1. Check whether the document Standart20030524extr.tag is loaded in the system. If it is not, load it as described in the Import XML Tool Demo.
  2. Open the query dialog of the Extract tool from the menu item Tools.
  3. In the text box Search, write the following XPath expression: //s[count(child::tok)<=3]. It selects all the sentences within the document(s) that have at most three words (tok elements).
  4. Select the Include Subtree option.
  5. To add an attribute with the source document name to the auxiliary tag, select Create source attribute. The name of the attribute is taken from the corresponding text field.
  6. To add an attribute with the XPath of each selected node in the source document, select Create path attribute. The name of the attribute is taken from the corresponding text field.
  7. To add an attribute with the extract result number to the auxiliary tag, select Create number attribute. The name of the attribute is taken from the corresponding text field.
  8. Check that the dialog of the tool reflects the settings from the previous steps. Then you can run the query.

  9. The result will be saved in a document with a name given by the user, in the group Root : SYSTEM : Results : Extract, which is the standard group for results from this tool. If necessary, you can change this group.

The above query is saved in the document sLD3.extr.que in the demo directory.
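
For readers who want to see what the query does outside the system, the steps above can be approximated with a short Python sketch. This is only an illustration and not the tool's implementation: the sample document, the result element names ("extract", "item") and the attribute names ("source", "path", "number") are assumptions, since in the tool the attribute names come from the dialog's text fields. Python's standard xml.etree.ElementTree does not support the XPath count() function, so the sketch counts the tok children directly.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample; the real demo document is not reproduced here.
SAMPLE = """<text>
  <p>
    <s><tok>Yes</tok><tok>.</tok></s>
    <s><tok>This</tok><tok>one</tok><tok>is</tok><tok>longer</tok><tok>.</tok></s>
  </p>
  <p>
    <s><tok>Not</tok><tok>now</tok><tok>.</tok></s>
  </p>
</text>"""


def node_path(elem, parents):
    """Build a positional path such as /text[1]/p[2]/s[1] for elem."""
    steps = []
    while elem is not None:
        parent = parents.get(elem)
        if parent is None:
            index = 1  # the root element has no siblings
        else:
            same_tag = [child for child in parent if child.tag == elem.tag]
            index = same_tag.index(elem) + 1
        steps.append(f"{elem.tag}[{index}]")
        elem = parent
    return "/" + "/".join(reversed(steps))


def extract_short_sentences(root, source_name, max_tokens=3):
    """Mimic //s[count(child::tok)<=3] with a subtree copy and three attributes."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    result = ET.Element("extract")  # hypothetical result root
    number = 0
    for s in root.iter("s"):
        if len(s.findall("tok")) <= max_tokens:
            number += 1
            item = ET.SubElement(result, "item", {  # hypothetical auxiliary tag
                "source": source_name,
                "path": node_path(s, parents),
                "number": str(number),
            })
            item.append(copy.deepcopy(s))  # corresponds to Include Subtree
    return result


result = extract_short_sentences(ET.fromstring(SAMPLE), "Standart20030524extr.tag")
for item in result:
    print(item.get("number"), item.get("source"), item.get("path"))
```

Each selected sentence is wrapped in an auxiliary element that records where it came from, which makes it easier to go back to the source document and inspect the suspicious sentences.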