The goal of this demo is to show how the MultiQuery tool of the CLaRK system can be used to implement a task step by step.
This tool is designed to call other tools; it does not work directly on the
XML data. It takes a list of XML Tool Queries and executes them one by one
in the order in which they appear. A query in the list may itself be a multi-query,
and there is no limit on the depth of such inclusion. If the system detects a
cyclic inclusion (a multi-query includes itself, or one of its sub-queries
includes it), it reports an error. The result of each tool application is the
input for the next one, and the result of the last tool application is the
result of the MultiQuery Tool.
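The chaining and the cyclic-inclusion check described above can be sketched in Python. This is only a minimal model under stated assumptions, not CLaRK's implementation: simple queries are stood in for by plain functions on strings, a multi-query is a list of query names, and the names SIMPLE_QUERIES, MULTI_QUERIES, find_cycle and run are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins: a simple query maps a document to a new document.
SIMPLE_QUERIES: Dict[str, Callable[[str], str]] = {
    "strip": lambda doc: doc.strip(),
    "upper": lambda doc: doc.upper(),
    "exclaim": lambda doc: doc + "!",
}

# A multi-query is an ordered list of names of simple queries or multi-queries.
MULTI_QUERIES: Dict[str, List[str]] = {
    "inner": ["strip", "upper"],
    "outer": ["inner", "exclaim"],
}


def find_cycle(name: str, multi: Dict[str, List[str]],
               path: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Return the inclusion chain that loops back on itself, or None."""
    if name in path:
        return list(path[path.index(name):]) + [name]
    for child in multi.get(name, []):
        cycle = find_cycle(child, multi, path + (name,))
        if cycle:
            return cycle
    return None


def _apply(name: str, doc: str, multi, simple) -> str:
    """Apply one query; the output of each step is the input of the next."""
    if name in multi:
        for child in multi[name]:
            doc = _apply(child, doc, multi, simple)
        return doc
    return simple[name](doc)


def run(name: str, doc: str, multi=MULTI_QUERIES, simple=SIMPLE_QUERIES) -> str:
    """Reject cyclic inclusions first, then run the queries in order."""
    cycle = find_cycle(name, multi)
    if cycle:
        raise ValueError("cyclic inclusion: " + " -> ".join(cycle))
    return _apply(name, doc, multi, simple)
```

Checking for cycles before anything is executed means a faulty multi-query never modifies the document; whether CLaRK checks up front or during execution is not stated in the text.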
The user can change the order in which the queries are applied and make the
application of certain operations conditional. This is done with
Conditional Control Operators (Controls for short), which are specific to this tool.
By default, application starts with the first query and proceeds one by one
to the last. With Controls, some queries are applied only if certain conditions
hold. A condition can be: the true or false value of the result of an XPath
evaluation; whether the preceding tool application has or has not modified the
working document; or unconditional (always succeeding). When the condition of a
Control is true, the entry to be applied next (a query or another Control) is
the one named in the Control itself. Otherwise, application proceeds with the
next entry in the table (a query or another Control). Controls address their
targets (the queries or Controls to be applied on success) by label. Each entry
(row) in the table of the MultiQuery Tool can have a label (a unique identifier)
that Controls can refer to. It is an error for a Control to use a target label
that does not exist.
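The label rules in the paragraph above can be illustrated with a small validation sketch. The Entry type, its field names and check_labels are hypothetical, chosen only for this illustration; the duplicate check is an assumption that follows from labels being described as unique identifiers.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One row of the table: a query or a Control (hypothetical model)."""
    label: Optional[str]           # optional unique identifier of the row
    kind: str                      # "QUERY" or a Control type such as "GOTO"
    target: Optional[str] = None   # label a Control jumps to on success


def check_labels(entries: List[Entry]) -> List[str]:
    """Return a list of problems: duplicate labels and unknown targets."""
    problems = []
    seen = set()
    for row, entry in enumerate(entries, start=1):
        if entry.label is not None:
            if entry.label in seen:
                problems.append(f"row {row}: duplicate label {entry.label!r}")
            seen.add(entry.label)
    for row, entry in enumerate(entries, start=1):
        # Only Controls carry a target; it must name an existing label.
        if entry.kind != "QUERY" and entry.target not in seen:
            problems.append(f"row {row}: unknown target label {entry.target!r}")
    return problems
```

Labels are collected in a first pass so that a Control may refer to a row that appears later in the table.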
Each Control operator may consist of three parts:
Type - determines the type of the Control, i.e. the condition to be checked. There are several types of Controls:
  IF (XPath) - the condition is an XPath expression. If the result of its evaluation on the current working document is a non-empty list, a non-empty string, a non-negative number or a true boolean value, the Control succeeds.
  IF NOT (XPath) - the condition is an XPath expression. In contrast to the previous type, the Control succeeds if the result of the XPath evaluation on the current working document is an empty list, an empty string, a negative number or a false boolean value.
  IF CHANGED - the condition is the result of the previous tool application. If the current working document has been modified by the previously executed operation (not necessarily the one preceding the Control in the table), the Control succeeds.
  IF NOT CHANGED - the condition is the result of the previous tool application. If the current working document has NOT been modified by the previously executed operation (not necessarily the one preceding the Control in the table), the Control succeeds.
  GOTO - the condition is always satisfied. This is an unconditional jump to the target of the Control.
XPath (dependent on the type) - an XPath expression to be evaluated, the result of which determines whether the Control succeeds. This part is used in Controls of type IF (XPath) and IF NOT (XPath).
Target - a label reference that points to the location where execution continues if the condition of the Control is satisfied.
By using the Controls-and-labels technique, the user can model the familiar IF-THEN-ELSE and WHILE-CONDITION-DO structures in order to make the processing more flexible. Composing different Controls allows the user to create varied 'programs' or 'scripts' capable of doing specific jobs. It is up to the user to create efficient and reliable processing procedures.
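As an illustration of how Controls and labels give a WHILE-style loop, here is a minimal interpreter sketch in Python. It is a model under stated assumptions, not CLaRK's code: entries are plain dictionaries, XPath evaluation is replaced by an ordinary Python function under the "xpath" key, and the names control_succeeds, run_multiquery, EXAMPLE and the max_steps safeguard are all hypothetical.

```python
def control_succeeds(value) -> bool:
    """Success rule for IF (XPath) as stated in the text above."""
    if isinstance(value, bool):          # test bool first: bool is a kind of int
        return value
    if isinstance(value, (int, float)):
        return value >= 0                # the text says "non-negative number"
    if isinstance(value, (str, list)):
        return len(value) > 0
    raise TypeError(f"unsupported result type: {type(value).__name__}")


def run_multiquery(entries, doc, max_steps=1000):
    """Execute queries and Controls; max_steps guards against endless loops."""
    labels = {e["label"]: i for i, e in enumerate(entries) if e.get("label")}
    changed = False   # did the most recently executed query modify the document?
    pos = 0
    steps = 0
    while pos < len(entries):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded (possible endless loop)")
        entry = entries[pos]
        kind = entry["type"]
        if kind == "QUERY":
            new_doc = entry["run"](doc)
            changed = new_doc != doc
            doc = new_doc
            pos += 1
            continue
        if kind == "IF":
            ok = control_succeeds(entry["xpath"](doc))
        elif kind == "IF NOT":
            ok = not control_succeeds(entry["xpath"](doc))
        elif kind == "IF CHANGED":
            ok = changed
        elif kind == "IF NOT CHANGED":
            ok = not changed
        elif kind == "GOTO":
            ok = True
        else:
            raise ValueError(f"unknown entry type: {kind}")
        # On success jump to the target label, otherwise fall through.
        pos = labels[entry["target"]] if ok else pos + 1
    return doc


# WHILE-style loop: collapse one double space per pass until nothing changes,
# then trim the ends.
EXAMPLE = [
    {"label": "loop", "type": "QUERY", "run": lambda d: d.replace("  ", " ", 1)},
    {"type": "IF CHANGED", "target": "loop"},
    {"type": "QUERY", "run": str.strip},
]
```

Here the IF CHANGED Control plays the role of the loop condition: it sends execution back to the labelled query as long as that query keeps modifying the document.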
Our goal here is to disambiguate the text. To this end, the text is tagged and additional information is added for each word, namely all of its possible morphological analyses.
Grammars are used to mark the tokens and to add the morphological information. For more information about grammars and their queries, see Grammar Tool Demo.
Some unnecessary data is removed from the file using remove queries. For more information about these queries, see Remove Tool Demo.
A disambiguation constraint, disambiguate attribute, is applied over the tokens. For more information about the constraint and its query, see Constraints Tool Demo.
The document used for this demo is Standart20030524.tag.
In order to run the demo you have to perform the following steps:
1. Make sure that Standart20030524.tag is loaded in the system. If it is not, load it as described in Import XML Tool Demo. If it is, open it.
2. Make sure that tokenize.gram.que and Eng_Dict_Attribute.que are saved in the system. If they are not, load and compile the grammars tokenize.gram and dict_attr.grm with the Grammars/ Grammar Manager - File I/O/ Load grammar from file button. Then import the queries into Root : SYSTEM : Queries : Grammar according to grammarquery.dtd, as described in Multi Import Tool Demo.
3. Make sure that whitespace.rem.que and tok.rem.que are saved in the system. If they are not, import the queries into Root : SYSTEM : Queries : Remove according to removequery.dtd, as described in Multi Import Tool Demo.
4. Make sure that attribute.const.que is saved in the system. If it is not, load the constraint disambiguateAt.cnst with the Constraints/ Value Constraints/ Edit Value Constraints - Load From File button. Then import the query into Root : SYSTEM : Queries : Constraints according to constraintsquery.dtd, as described in Multi Import Tool Demo.
5. Open the MultiQuery Tool from the Tools menu.
6. Add the tokenize.gram.que and Eng_Dict_Attribute.que queries to the table with queries. To do this, press the Add Query button, set Root : SYSTEM : Queries : Grammar in the Current Group combo box and select the two queries from the list of all grammar queries. Then press OK. The order of the grammars is important because Eng_Dict_Attribute.que works on the result of tokenize.gram.que. Rows can be reordered by simply dragging them up or down in the table.
7. Add the whitespace.rem.que and tok.rem.que queries to the table with queries. To do this, press the Add Query button, set Root : SYSTEM : Queries : Remove in the Current Group combo box and select the two queries from the list of all remove queries. Then press OK.
8. Add the attribute.const.que query to the table with queries. To do this, press the Add Query button, set Root : SYSTEM : Queries : Constraints in the Current Group combo box and select the constraint query from the list of all constraint queries. Then press OK.
9. Apply the queries to the document. In the result, each recognised token is marked with a w element that holds all possible morphological information for that token in its ana attribute, unrecognised tokens are marked with tok elements and punctuation is marked with pt elements. A dialog then gives the user the possibility to resolve the ambiguity manually.
The above query is saved in the document xmlEnc.multy.que in the demo directory.