The goal of this demo is to show how the constraints tool of the CLaRK system can be used for:
The general syntax of the constraints in the CLaRK system is the following:
The following types of constraints are implemented in CLaRK:
In this kind of constraint, the nodes to which the constraint is applied are selected by XPath expressions. The content of each selected node must match a description given as a regular expression in the constraint. Constraints of this kind work in validation mode: during application the selected nodes are split into two sets, one with the nodes that match the regular expression and one with the nodes that do not. The user can then navigate through either of these sets.
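The validation behaviour described above can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration, not CLaRK's implementation. ElementTree understands only a subset of XPath, so node selection uses a simple path. The sample document, the element name num and the pattern are all hypothetical. The sketch also assumes that the regular expression must match the whole text content of a node.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: every <num> element should contain only digits.
SAMPLE = "<doc><num>42</num><num>4x2</num><num>7</num></doc>"


def apply_regex_constraint(root, path, pattern):
    """Split the nodes selected by `path` into those whose whole text content
    matches `pattern` and those that do not (validation only, nothing changes)."""
    regex = re.compile(pattern)
    matching, failing = [], []
    for node in root.findall(path):
        content = "".join(node.itertext())
        if regex.fullmatch(content):
            matching.append(node)
        else:
            failing.append(node)
    return matching, failing


root = ET.fromstring(SAMPLE)
ok, bad = apply_regex_constraint(root, ".//num", r"[0-9]+")
print([n.text for n in ok], [n.text for n in bad])  # ['42', '7'] ['4x2']
```

Keeping the two result lists separate mirrors the two sets the user can navigate through.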
These constraints can be used to simulate XML Schema constraints over text nodes. Besides checking the content, they can also determine, via the XPath expression, the context of the elements to which they are applied. In this way they can impose regular constraints in addition to those in the DTD, making them more specific on the basis of the surrounding context.
In this kind of constraint, the nodes are selected by an XPath expression. Another XPath expression is evaluated separately on each selected node, and each result is converted to a number using the rules defined in the XPath specification. The constraint is satisfied for a node if the corresponding numeric result lies in a range given by two numbers, MIN and MAX. The MIN and MAX values can be determined dynamically for each node by two other XPath expressions which return numbers. Constraints of this kind are useful, for example, for checking that nodes of different types occur equally often within a given context.
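A minimal Python sketch of this check follows. It assumes an inclusive range and uses plain Python functions in place of the three XPath expressions, whereas CLaRK evaluates real XPath. The function names, the sample document and its row, key and val elements are hypothetical. For each row, the value is its number of key children, and both MIN and MAX are its number of val children. The constraint therefore holds only when the two counts are equal.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical document: each <row> should have as many <key> as <val> children.
SAMPLE = """<table>
  <row><key/><val/></row>
  <row><key/><key/><val/></row>
</table>"""


def to_number(value):
    """Rough approximation of XPath number(): NaN when conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def apply_number_constraint(nodes, value_of, min_of, max_of):
    """Split nodes into those whose value lies in [MIN, MAX] and the rest.

    value_of, min_of and max_of stand in for the three XPath expressions
    evaluated separately on each selected node.
    """
    satisfied, violated = [], []
    for node in nodes:
        value = to_number(value_of(node))
        low, high = to_number(min_of(node)), to_number(max_of(node))
        # Comparisons with NaN are false, so a non-numeric result fails.
        if low <= value <= high:
            satisfied.append(node)
        else:
            violated.append(node)
    return satisfied, violated


def count_keys(row):
    """Plays the role of the XPath expression count(key)."""
    return len(row.findall("key"))


def count_vals(row):
    """Plays the role of count(val); used for both MIN and MAX."""
    return len(row.findall("val"))


root = ET.fromstring(SAMPLE)
rows = root.findall("row")
ok, bad = apply_number_constraint(rows, count_keys, count_vals, count_vals)
print(len(ok), len(bad))  # 1 1
```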
These constraints determine the possible children, attributes or parent of an element in a document. They apply when the user enters a new child or a new parent of an element. In both cases the DTD determines a list of possible children or parents, but depending on the context in the document this list can be reduced further. When the only possible child of an element is text, or when an attribute is entered, these constraints determine the possible text values for the element (or attribute).
This type of value constraint limits the possible parents of a node. It can be applied in two ways: by changing the parent of a node (local) or by explicitly running the constraint engine (global).
The first possibility changes the parent of a node (or of a set of nodes at one level). The list of all relevant parent nodes can be restricted further by applying other constraints: the final list is the intersection of the source of the constraints and the former content of the list. When the operation of changing the parent of a set of nodes is performed, all compatible (parent) constraints are applied.
The second possibility is running the Constraint Engine, which works as follows. First, the targets are selected by their tag names and the XPath restriction. Then the source is compiled. If there is more than one choice, the user is asked to select one option from a list. If there is exactly one choice, it can be inserted automatically as a parent of the target. The action of a constraint depends on the Application Mode set for it.
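The parent-list logic of the two paragraphs above can be sketched as follows. This is an illustration under assumptions, not CLaRK code. The candidate lists are plain lists of element names, and the names np, vp and pp are hypothetical. The ask_user parameter is a hypothetical callback standing in for the selection dialog.

```python
def restrict_parents(dtd_parents, constraint_sources):
    """Intersect the parent names allowed by the DTD with the source list of
    every applicable constraint, keeping the DTD order."""
    candidates = list(dtd_parents)
    for source in constraint_sources:
        allowed = set(source)
        candidates = [name for name in candidates if name in allowed]
    return candidates


def choose_parent(candidates, ask_user):
    """No candidate: nothing to insert. One candidate: insert it
    automatically. Several: let the user pick one from the list."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return ask_user(candidates)


# Hypothetical element names; first_pick simulates a user choosing the first option.
dtd_parents = ["np", "vp", "pp"]


def first_pick(options):
    return options[0]


print(choose_parent(restrict_parents(dtd_parents, [["np", "pp"]]), first_pick))  # np (user asked)
print(choose_parent(restrict_parents(dtd_parents, [["pp", "vp"], ["pp"]]), first_pick))  # pp (automatic)
```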
This kind of constraint deals with the content of some elements. It determines whether certain values exist within the content of these elements. A value can be a token, an attribute or an XML mark-up, and the actual value for an element can be determined by the context. Such a constraint works as follows.

First, it determines the elements in the document to which it is applicable. The conditions over the context of the nodes are expressed by an XPath expression. Then, for each such element in turn, it determines which values are allowed; usually these are also pointed to by an XPath expression. It then checks whether some of these values are present in the content of the element as a token or an XML mark-up. If there is such a value, the constraint moves on to the next element. If there is no such value, the constraint offers the user the possibility to choose one of the allowed values for the element (or attribute), and the selected value is added to the content.
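The loop described above can be sketched in Python with xml.etree.ElementTree. It is a simplified illustration with several assumptions:

- Tokens are whitespace-separated words; CLaRK uses its own tokenizers.
- Mark-up values are compared by tag name only.
- A chosen value is always added as a token.
- allowed_of and choose are hypothetical callbacks standing in for the source XPath and the user dialog.

```python
import xml.etree.ElementTree as ET


def content_values(element):
    """Values present in an element: whitespace-separated tokens of its text
    (a simplification of CLaRK tokenization) and the tag names of its children."""
    tokens = "".join(element.itertext()).split()
    return set(tokens) | {child.tag for child in element}


def apply_some_children(targets, allowed_of, choose, separator=" "):
    """If a target already contains an allowed value, move on; otherwise ask
    `choose` for one of the allowed values and add it to the text content."""
    for element in targets:
        allowed = allowed_of(element)
        if content_values(element) & set(allowed):
            continue  # constraint satisfied: go to the next element
        value = choose(allowed)
        if value is None:
            continue  # the user declined to choose
        if element.text and element.text.strip():
            element.text = element.text + separator + value
        else:
            element.text = value


# Hypothetical data: each <pos> must contain "noun" or "verb".
root = ET.fromstring("<doc><pos>noun</pos><pos/></doc>")
targets = root.findall("pos")
apply_some_children(targets, lambda e: ["noun", "verb"], lambda options: options[-1])
print([e.text for e in targets])  # ['noun', 'verb']
```

The separator parameter plays the role of the Separator field used when the element already has text.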
This type of value constraint limits the names of a node's children and the content of its text children. All children that are elements must have names coinciding with the name of some node in the source list. Then all the data in the text children is tokenized, forming a list A of tokens. After that, all the data in the text nodes of the source list is tokenized, forming a list B of tokens. For every token in A there must be a token in B with the same value (categories are not compared). This type of value constraint is applied automatically during the validation of a document.
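The two conditions can be sketched in Python as follows. The sketch makes these assumptions:

- Tokenization is by whitespace.
- The source list is represented by the tag names of its element nodes plus the data of its text nodes.
- The function names and sample data are hypothetical.
- CLaRK tokens also carry categories; these are ignored here, matching the value-only comparison described above.

```python
import xml.etree.ElementTree as ET


def own_text(node):
    """Text held directly in the node's text children (not in descendants)."""
    return (node.text or "") + "".join(child.tail or "" for child in node)


def all_children_ok(node, source_names, source_text, tokenize=str.split):
    """Check both conditions for one node against a source list given as
    the tag names of its element nodes plus the data of its text nodes."""
    # Every element child must be named like some element in the source list.
    if any(child.tag not in source_names for child in node):
        return False
    # Every token of list A must equal (by value) some token of list B.
    list_a = tokenize(own_text(node))
    list_b = set(tokenize(source_text))
    return all(token in list_b for token in list_a)


source_names = {"a", "b"}  # hypothetical source list
source_text = "red green blue"
for snippet in ["<c>red <a/> green</c>", "<c>red <x/></c>", "<c>pink</c>"]:
    print(snippet, all_children_ok(ET.fromstring(snippet), source_names, source_text))
```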
The goal here is to manually disambiguate a morpho-syntactically annotated text. The text is first tokenized, then possible morpho-syntactic tags are added to each wordform. Finally, the text is manually disambiguated with the help of a Value constraint of type Some Children.
The text in the document is segmented into tokens: Latin words, numbers, punctuation, etc. The following annotation is used in the text:

- the <pt> tag for punctuation;
- the <w> tag for Latin words;
- the <tok> tag for other tokens, such as numbers.

For each word, the wordform is encoded in the <ph> tag. The appropriate morpho-syntactic information from the dictionary is encoded as two elements:

- the <aa> element, which contains a list of morpho-syntactic tags for the wordform separated by semicolons;
- the <ta> element, which has to contain the actual morpho-syntactic tag for this use of the wordform in the text.

The value of the <ta> element has to be among the values in the list in the <aa> element for the same wordform.
Note that when the context determines only one possible value, it is added automatically to the content of the <ta> element, and thus the constraint becomes a rule.
If there is more than one value, the constraint offers the user the possibility to choose one of the allowed values. While browsing the choices, the user can get brief information about the meaning of each one. This information must be stored in an internal document, the Help Document. Its structure is described by a DTD in the file helpFile.dtd. The information about a given choice appears in the status bar of the editor when the mouse pointer is over the choice.
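The behaviour of this demo can be approximated in Python as follows. This is only a sketch under assumptions. A regular-expression split stands in for the disambiguate tokenizer combined with the NO_SC filter. The tags N, V and D are invented. The choose parameter is a hypothetical callback that simulates the user's selection in the dialog.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment with the structure described above (tags invented).
SAMPLE = """<s>
  <w><ph>run</ph><aa>N;V</aa><ta/></w>
  <w><ph>the</ph><aa>D</aa><ta/></w>
</s>"""


def tokenize(text):
    """Stand-in for the 'disambiguate' tokenizer followed by the NO_SC filter:
    split the <aa> list on semicolons/whitespace and drop the separators."""
    return [token for token in re.split(r"[;\s]+", text) if token]


def disambiguate(root, choose):
    for w in root.iter("w"):
        ta = w.find("ta")
        # Only empty <ta> elements are targets (compare //ta[not(child::*)]).
        if ta is None or len(ta) or (ta.text or "").strip():
            continue
        options = tokenize(w.findtext("aa", default=""))
        if len(options) == 1:
            ta.text = options[0]  # a single possible value is added automatically
        elif options:
            ta.text = choose(w.findtext("ph"), options)  # the user picks one


root = ET.fromstring(SAMPLE)
# Simulated user who always picks the first offered tag.
disambiguate(root, lambda word, options: options[0])
print([ta.text for ta in root.iter("ta")])  # ['N', 'D']
```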
The document used for this demo is Standart20030524constr.tag. The tagset description is stored in the help file tag.ttt.
The tokenizer used is disambiguate. It is in the demo directory and can be loaded within the system. To see the token types defined in this tokenizer, select the Tokenizers item from the Definitions menu.
In order to run the demo, you have to perform the following steps:
1. Check that the document Standart20030524constr.tag and the help file tag.ttt are loaded and saved in the system. If they are not, load them as described in the Multi Import Tool Demo. Once the documents are loaded in the system, open Standart20030524constr.tag.
2. Check that the tokenizer disambiguate is in the tokenizers list. If it is not, you can load it with the Load Tokenizer button of the Tokenizers item in the Definitions menu.
3. Check that the filter NO_SC, which skips semicolons in the text, is in the filters list. If it is not, you can create it as described in the Filter Tool Demo.
4. Select the menu item Tools/Constraints/Value Constraints/Edit Value Constraint.
5. In the General panel, enter disambiguate elements as the Constraint Name and select Some Children as the Type of Constraint. This means that the constraint will work over the children of the selected element.
6. In the Options panel, select the Insertation Mode radio button as the Application Mode. This means that a child node will be inserted into the selected element, or a token will be added to the text content of the element.
   - The position of the inserted child can be set in the Position text field, where children are counted starting from 1.
   - If the content of the element to which a token is added is non-empty, the token is separated from the rest of the text by the string given in the Separator text field.
7. Select Show Status Before in order to see the number of nodes to which the constraint will be applied.
8. Select Show Status After in order to see how many nodes the constraint was applied to.

Target Specification
The target specification determines the nodes to which the constraint will be applied. In this demo the target is all <ta> elements which have no content.
9. In the Target XPath field of the Target panel, write //ta[not(child::*)]. Here ta is the name of the element to which the constraint will be applied, and not(child::*) selects as targets only those ta elements which have no children.

Source Specification
The source specification determines where the possible values are taken from. There are three possibilities:
- local document: the value is pointed to by an XPath expression;
- external document: the value is in another document, and the XPath expression is evaluated within that document;
- XML mark-up (including text): the user states an explicit XML fragment or text.

If the selected source is text, it is tokenized. In this demo the possible values are stored as text in the <aa> element of each word.
10. Select the Local Document radio button in the Restriction panel.
11. In the XPath/XML field, write the following XPath expression: previous::aa/text(). This XPath specifies the source list for the constraint: all possible values for the target node are in the <aa> element, which holds all analyses, and are separated by semicolons.
12. In the Advanced panel, select Set a Tokenizer and choose the disambiguate tokenizer from the list of tokenizers. This tokenizer is used for segmenting the tagset.
13. Select Set a Filter and choose NO_SC; it filters out the semicolons.
14. Select Set Help Document and choose tag.ttt from the list.
15. Mark the constraint in the Active column. Now the constraint is ready to be applied.

The user can apply constraints in two ways: by pressing the Apply Constraint button at the bottom of the dialog, or through the Apply Value Constraint dialog. If the user applies the second method, he/she must:
1. Press the Done button in order to save all changes to the constraints.
2. Select the menu item Tools/Constraints/Value Constraints/Apply Value Constraint.
3. Add the disambiguate elements constraint with the Add Constraint button.
4. Press the Apply button to actually apply the constraint.

The dialog of the tool in this case has to be:
The above query is saved in the document main.cnst.que in the demo directory.
When the constraint is applied to the document, a small dialog appears for each element which has more than one possible value.
While the user goes through the values, their descriptions are shown in the status bar. This is very helpful when the paradigm is rather complex.
This demo is very similar to the previous one, except that the morpho-syntactic information is represented as attribute values instead of element content.
The goal is the same: to manually disambiguate a morpho-syntactically annotated text. The text is first tokenized, then possible morpho-syntactic tags are added to each wordform. Finally, the text is manually disambiguated with the help of a Value constraint of type Some Attributes.
The text in the document is segmented into tokens: Latin words, numbers, punctuation, etc. The following annotation is used in the text:

- the <pt> tag for punctuation;
- the <w> tag for Latin words;
- the <tok> tag for other tokens, such as numbers.

For each wordform, the appropriate morpho-syntactic information is encoded as two attributes:

- the aa attribute, which contains a list of possible morpho-syntactic tags for the wordform separated by semicolons;
- the ana attribute, which contains the actual morpho-syntactic tag for this use of the wordform.

The value of the ana attribute has to be among the values in the list in the aa attribute for the same wordform.
When the context determines only one possible value, it is added automatically as the value of the ana attribute, and thus the constraint becomes a rule.
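The attribute variant can also be sketched in Python. Here the targets are w elements without an ana attribute, and the choices come from the aa attribute. This is an illustrative approximation, not CLaRK's implementation. The tags N, V, D and P are invented, and choose is a hypothetical callback simulating the user's selection.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: the analyses are attribute values this time.
SAMPLE = '<s><w aa="N;V">run</w><w aa="D" ana="D">the</w><w aa="P">in</w></s>'


def tokenize(text):
    """Split a semicolon-separated list, dropping the separators (cf. NO_SC)."""
    return [token for token in re.split(r"[;\s]+", text) if token]


def disambiguate_attributes(root, choose):
    for w in root.iter("w"):
        if "ana" in w.attrib:  # targets are //w[not(@ana)]
            continue
        options = tokenize(w.get("aa", ""))  # the source list is @aa
        if len(options) == 1:
            w.set("ana", options[0])  # single value: added automatically
        elif options:
            w.set("ana", choose(w.text, options))  # the user picks one


root = ET.fromstring(SAMPLE)
# Simulated user who always picks the last offered tag.
disambiguate_attributes(root, lambda word, options: options[-1])
print([w.get("ana") for w in root.iter("w")])  # ['V', 'D', 'P']
```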
The document used for this demo is Standart20030525constr.tag.
The tokenizer is the disambiguate tokenizer, which is in the demo directory and can be loaded within the system. To see the token types defined in this tokenizer, select the Tokenizers item from the Definitions menu.
The constraint is almost the same as the one described above, so only the differences are described here.
In order to run the demo, you have to perform the following steps:
1. Check that the document Standart20030525constr.tag is loaded in the system. If it is not, load it as described in the Import XML Tool Demo. Once the document is loaded in the system, open it.
2. Check that the tokenizer disambiguate is in the tokenizers list. If it is not, you can load it with the Load Tokenizer button of the Tokenizers item in the Definitions menu.
3. Check that the filter NO_SC is in the filters list. If it is not, you can create it as described in the Filter Tool Demo.
4. Select the menu item Tools/Constraints/Value Constraints/Edit Value Constraint.
5. In the General panel, enter disambiguate attributes as the Constraint Name and select Some Attributes as the Type of Constraint. This means that the constraint will work over the attributes of the selected element.
6. The Options are the same as in the constraint described above.

Target Specification
7. In the Target XPath field of the Target panel, write //w[not(@ana)]. Here w is the name of the elements to which the constraint will be applied. The element name is needed because different elements can have attributes with the same name. The predicate not(@ana) selects all w elements that have no ana attribute.
8. In the Target Attribute field, write ana. This is the name of the attribute to which the constraint will be applied.

Source Specification
9. In the XPath/XML field of the Restriction panel, write the following XPath expression: @aa. This XPath specifies the source list for the constraint: all possible values for the target attribute, which are stored in the value of the attribute aa, separated by semicolons.

The application of the constraint is the same as above. The query is saved in the document attribute.cnst.que in the demo directory.
The goal here is to create a very simple dictionary.
The document used for this demo is const_demo.xml. It contains only very simple word-entry structures without any other information. The file also contains a header element with a list of all part-of-speech category names allowed in the dictionary.
The set of constraints in this demo builds partial sub-structures automatically when the user supplies certain information. When the user must decide what information to supply, the constraints restrict the choices as much as possible.
In order to run the demo, you have to perform the following steps:
1. Check that the document const_demo.xml is loaded and opened in the system. If it is not, the user must compile the DTD from the file const_demo.dtd and then load the document as described in the Import XML Tool Demo. Once the document is loaded in the system, open it.
2. Check that the tokenizer disambiguate is in the tokenizers list. If it is not, you can load it with the Load Tokenizer button of the Tokenizers item in the Definitions menu.
3. Check that the filter NO_SC, which skips semicolons in the text, is in the filters list. If it is not, you can create it as described in the Filter Tool Demo.
4. Load the constraints from the file const_demo.scr. To do so, choose the menu item Tools/Constraints/Value Constraints/Edit Value Constraint. Then press the Load From File button and point to the file const_demo.src.

The constraints are loaded into the dictionary group.

The user can apply constraint groups in two ways: from the Value Constraints Manager or from the Apply Constraints dialog.

If the user applies the first method, he/she must select the dictionary group and click the Apply Constraints button.

If the user applies the second method, he/she must:
1. Close the Value Constraints Manager with the Done button in order to save all changes to the constraints.
2. Open the Apply Constraints dialog from the menu item Tools/Constraints/Value Constraints/Apply Value Constraint.
3. Add the dictionary group with the Add Group button.
The above query is saved in the document group.cnst.que in the demo directory.
Once the above steps have been completed successfully, the Value Constraints Engine starts working. It processes the word entries one by one. For each word, the engine automatically adds a pos tag and gives the user a list of all part-of-speech categories (defined in the header of the document). Depending on the user's choice, the engine adds other tags, asks for further information, and so on.
In the end the result is a simple dictionary with relevant information for each word entry.