Home
Description
Publications

Available Resources
Text Acknowledgements
Related links


Events


CLaRK System

CLaRK System Online Manual


Bulgarian dialects'
electronic archive




eXTReMe Tracker

 

 

 

 

 

 

 


Menu Trees

Intersect Structures

This tool represents a means for comparing tree structures. It calculates the maximal tree structure which is presented in each of the input structures. The result structure (the intersection) is opened in the system editor. The differences and the places where they appear are also preserved in internal structures. On a later stage this information is used to support the user to find the correct target structure by expanding manually the intersection. When an intersection document is opened, the system switches to a special performance mode and some of the tree restructuring functions are disabled. Also some new tool specific functions appear. The structure difference data turns into constraints and the user expands the intersection by choosing pop-up menu items instead of manual typing. Every time the user takes a decision by choosing an item, this information is spread on the whole result document the intersection is extended.

In order to compare (intersect) tree structures, the user has to point out the source structures. They can be located in one document or in more than one. The topmost node of each structure to be processed within a document is pointed by an XPath expression. If each document contains a single structure the XPath expression selects the document's root (i.e. /self::*).

The initial source selection dialog has the following appearance:

In case the source document(s) is/are not internal one(s), the dialog is switched to External Source mode:

The XPath expression in field Subtrees (XPath) which is presented in the two source modes is evaluated on each of the selected documents. Each single result node determines one tree structure for comparing.

Having the structures compared, the result intersection document is show in the editor. The next step is this structure to be extended manually (on the basis of constraints) until it covers a single structure, or a certain subset of the initial structures (depending on the task). The nodes in the intersection tree where there are differences in the different structures (i.e. the choice points) are marked with a star '*' in the end of the name/value. Performing a right mouse-click on such a node, will activate the constraints for this node and a choice dialog window appears (instead of the standard tree menu).

Here there are five possible types of differences among the input structures for a selected node from the intersection:

  • attribute - attributes which either do not appear or they appear with different values in all corresponding nodes;
  • content - nodes (elements, text or comment nodes) which do not appear in all corresponding nodes' content;
  • parent - nodes (elements) with different names represented as parent nodes of the corresponding nodes;
  • preceding sibling - nodes (elements, text or comment nodes) which do not appear as preceding siblings (direct or indirect) for all corresponding nodes within the same parents;
  • following sibling - nodes (elements, text or comment nodes) which do not appear as following siblings (direct or indirect) for all corresponding nodes within the same parents.

Here follows the layout of two example dialog windows for selecting an item:

Basically, the dialog represents a list of items, possible for the specific selection in the tree. The list content is predetermined by the mode in which the dialog is set. The mode corresponds to the type of differences which are currently under consideration (attributes, content, parents, etc.). If for a certain node there is more than one type of differences, the user can change the dialog mode with the arrowed buttons panel in the bottom left corner.

The modes are as follows:

  • - the dialog shows a list of all attributes (and their values) which can be set to the current node. The attribute names are shown at the top of the window in a list. Depending on the selected item in this list the content of the main list is populated with all the possible values for it.
  • - the dialog's list shows all the possible nodes which can be inserted as children of the current node. They can be elements (wrapped in '<' and '>'), text or comment nodes. Each list entry ends with a number in brackets. This number indicates in how many of the input structures this item exists.
  • - the list shows all the possible nodes which can be inserted as a parent of the current node. They can only be elements. Each list entry ends with a number in brackets. This number indicates in how many of the input structures this item exists.
  • - the list shows all the possible nodes which can be inserted as children of the parent of the current node in any position before the current one. They can be elements, text or comment nodes.
  • - the list shows all the possible nodes which can be inserted as children of the parent of the current node in any position after the current one. They can be elements, text or comment nodes.
Whenever a mode is not appropriate for the selected node, the corresponding button is disabled. The rest of the components in this dialog window are:
  • Match: n of m - an indicator of the current selection in the list. It shows how many input structures will still cover the intersection if the selected item is included. Here n is the number of the structures which contain the selection and m is the number of all structures which cover the current intersection.
  • Approve - confirms that the currently selected item will be inserted in the intersection and thus reducing the number of structures which cover it. This leads to extension of the result structure.
  • Exclude - rejects the currently selected item and thus excluding all structures which contain it. This leads to reducing the structures which are covered and extension of the intersection.
  • Cancel - closes the dialog without any changes.

A run-time information about the current state of the intersection can be received by performing a right mouse- click on a node which is not a choice point (not marked with a star '*'). Then the following information dialog appears

The dialog shows information about how many input structures still cover the intersection (field 'Active trees'). Also there is a list(table) containing information about all choice points (i.e. nodes in which the intersection can be expanded by user decisions) in the current intersection.

The choice points in the table are sorted in document order (the order the nodes appear in the intersection document). For each point the information shown is: the position in the list (Doc Par); the node name for elements and the value for the rest (Node / Value); the number of choices for attributes ('Attr Diff'); the number of choices for children (Child Diff) and the number of choices for parents (Parent Diff).

The functionality here is:

  • Resolve - takes the selected choice point from the table and selects the corresponding node in the tree. Then, the dialog window for selecting an item is shown. The information dialog window is closed.
  • Close - closes the information dialog window without any other actions.
  • Mouse double-click on a certain table row - performs a Resolve action for the selected row.

During the process of expanding the intersection if the number of covering structures becomes 1, the system shows an information message and switches the intersection document to a normal mode of processing.

Tree Comparing Algorithm

The construction of an intersection is incremental. It starts from the root of each structure and tries to find as many common nodes as possible, moving down to all input structures. When a common node for all structures is found, it is inserted in the result structure. The algorithm combines a top-down with a bottom-up matching strategy, i.e. the construction of the intersection starts from the root to the leaves and when it is no longer possible to find nodes in common which are directly assigned to the structure the system tries to find common nodes on lower levels discarding some nodes on the paths to the roots. When such common nodes are detected they are attached to the first common grand parents from the result structure (at worst, to the root).

There are different criteria for finding common nodes. The main factor in matching is the node names for elements; the string content for text and comment nodes and names and values for attributes. The contexts of the nodes in the structures (parents, children, siblings) are also very important. The nodes matching module is aware of the Element Features defined in the corresponding DTDs. This is crucial when within a structure there are element nodes with equal names, children of the same parents. Then a distinction is needed to prevent incorrect structures matching. With the help of well defined Element Values the processor is enabled to 'see ahead' and to make the correct matching.


Suspend Intersection

This item can be used when the current document in the system editor is in a process of intersection construction. This function suspends this process and makes the intersection structure an ordinary XML document. All the data related with the intersection (constraints based on the structure differences) is removed. The document structure is preserved without any changes as it was before the selection of this item. All editor's functions which were disabled during the intersection are available now. This function is useful when the current intersection already contains the information the user is interested in and no further expansion is needed.

Complete Structure

This item can be used when the current document in the system editor is in a process of intersection construction. This function discontinues the process by reducing the covering structures to one and makes the intersection structure an ordinary XML document. Having done this, the intersection is expanded to the only structure which remains and thus it completes the process. The structure to which the intersection will be expanded is selected randomly from the structures which still cover the intersection. This function is useful when the current intersection already contains the information the user is interested in and the rest of the data is added just for definiteness.


Save Current Structure Set

This item can be used when the current document in the system editor is in a process of intersection construction. This item does not interrupt the process. It saves all structures which cover the current intersection in a separate document. The structure of the new document includes a special root node (named intersectionSet) to which all active structures are attached as children. This function can be used in cases when no further expansion of the intersection can be done (for any reason) and the current work has to be saved. Having saved the structures in a document, if later the process of intersection searching has to be recovered, this document can be used as a source where each child node of the root represents a root node for one structure. In this way the former intersection which was constructed when the structures were saved, will be recovered automatically.

The usage of this function does not stop the intersection finding process and the user can proceed with expanding the result structure after this item is used. Thus, this can be used as a backing-up mechanism.