The goal of this demo is to show how the Extract tool can be used to extract information from XML documents.
The Extract tool selects data from one or more documents and saves the selected data in one or more XML documents.
Our goal here is to construct a list of all the sentences in a set of documents that have at most three words. The expectation is that some of these sentences may have been determined incorrectly.
The document used for this demo is Standart20030524extr.tag.
In order to run the demo you have to perform the following steps:
1. Make sure that Standart20030524extr.tag is loaded in the system. If it is not, load it as described in Import XML Tool Demo.
2. Open the Extract tool from the Tools menu.
3. In the Search field, write the following XPath expression:
   //s[count(child::tok)<=3]
   It selects all the sentences in the document(s) that have at most three words (tok elements).
4. Select the Include Subtree option.
5. Select the Create source attribute option. The name of the attribute is taken from the corresponding text field.
6. Select the Create path attribute option. The name of the attribute is taken from the corresponding text field.
7. Select the Create number attribute option. The name of the attribute is taken from the corresponding text field.
At this point the dialog of the tool should show the settings listed above.
Then you can run the query.
The results are saved in Root : SYSTEM : Results : Extract, the standard group for results from this tool. If necessary, you can change it.
The above query is saved in the document sLD3.extr.que in the demo directory.
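To see what the XPath expression from the steps above selects, the same selection can be reproduced outside the tool. The following Python fragment is only a rough sketch and not part of the system: the sample markup, the function names short_sentences and build_result, and the result root element "extract" are assumptions made for illustration, and the real Standart20030524extr.tag may be structured differently. It uses only the Python standard library, whose XPath subset does not include count(), so the condition is checked in plain Python.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real demo document may look different.
SAMPLE = """<text>
  <s><tok>Yes</tok><tok>.</tok></s>
  <s><tok>This</tok><tok>sentence</tok><tok>is</tok><tok>longer</tok><tok>.</tok></s>
  <p><s><tok>Why</tok><tok>not</tok><tok>?</tok></s></p>
</text>"""


def short_sentences(root, max_tokens=3):
    """Return every <s> element with at most max_tokens direct <tok> children.

    root.iter("s") visits s elements at any depth, like the leading // in
    the XPath; findall("tok") looks at direct children only, like child::tok.
    """
    return [s for s in root.iter("s") if len(s.findall("tok")) <= max_tokens]


def build_result(sentences, root_tag="extract"):
    """Copy the selected elements, with their subtrees, into a new document.

    The root tag name is an assumption; the tool's own result format
    is not described here.
    """
    result = ET.Element(root_tag)
    for s in sentences:
        result.append(copy.deepcopy(s))  # deep copy keeps the whole subtree
    return result


root = ET.fromstring(SAMPLE)
matches = short_sentences(root)
result = build_result(matches)
print(ET.tostring(result, encoding="unicode"))
```

Copying each match together with its children corresponds roughly to the idea of the Include Subtree option. The source, path and number attributes are not reproduced in this sketch.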