27th June 2017

XPath Implementation Package

XPath (Version 1.0) Implementation Engine

The package is designed and implemented by:
Alexander Simov
Milen Kouylekov
CLaRK Development Team
BulTreeBank Project
Download the package here:
xpath.zip (139 215 bytes)

Important: This package is free for research use. For commercial use, please contact us: CLaRK Development Team.
Copyright: CLaRK Development Team.

This package is covered by the licence of the CLaRK System.

This implementation was originally designed for use in the CLaRK System, although it has proved usable as an independent module. It is based on the XPath W3C Recommendation, available at http://www.w3.org/TR/1999/REC-xpath-19991116. The implementation covers the specification completely except for one function, id(), which requires a DTD for the target XML document. Several additional features are implemented: variable bindings and their usage, and additional node tests for text node selection. The implementation is intended to work with tree structures conforming to the Document Object Model (DOM) Level 1 Specification, Version 1.0 (W3C Recommendation).

When the XPath engine is used within the CLaRK System, many more features are available, such as querying external documents and using regular expressions for text searching.

The syntax of XPath expressions which use variable bindings can be described in the following way:

   VariableXPathExpression   ::=   VariableBinding* XPathExpression

   VariableBinding  ::=   '{' VarName ':=' XPathExpression '}'

Here VarName is a string that cannot contain whitespace or the sequence ':='. This string is the name of the variable. A variable can receive a value only once. The value of each variable can be used in every XPathExpression that follows the variable's binding. It is referenced as stated in the recommendation: the variable name preceded by the dollar sign ('$'). The XPathExpression is a valid XPath expression according to the recommendation and must NOT itself contain variable bindings. The context node used when the value of a variable is evaluated is the same one used for the whole XPath expression. For example:


            {myVar := /contents/item[5]/text() } //article[ head/text() = $myVar ]/body/text()

Here the variable myVar receives as its value the text of the fifth item child element of the contents element. The result of evaluating the whole expression is the body text of any article element whose head child has text equal to the variable's value.
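The binding syntax can be illustrated with a small sketch built on the JDK's standard javax.xml.xpath API rather than on the clark.xpath package itself. The class VariableBindingSketch and its methods parse(), evaluate() and parseXml() are hypothetical names introduced only for this illustration. The sketch assumes that no '}' character appears inside a binding's expression and that every variable value is taken as a string, which is sufficient for the example above but is not necessarily how the engine treats variable values.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class VariableBindingSketch {
    /** Leading bindings in declaration order, plus the trailing main expression. */
    static final class Parsed {
        final Map<String, String> bindings = new LinkedHashMap<>();
        String expression;
    }

    /** Splits "{name := expr}* expr". Assumption: no '}' inside a binding's expression. */
    static Parsed parse(String input) {
        Parsed parsed = new Parsed();
        String rest = input.trim();
        while (rest.startsWith("{")) {
            int assign = rest.indexOf(":=");
            int close = rest.indexOf('}');
            if (assign < 0 || close < assign) {
                throw new IllegalArgumentException("Malformed variable binding: " + rest);
            }
            String name = rest.substring(1, assign).trim();
            String expr = rest.substring(assign + 2, close).trim();
            if (name.isEmpty() || name.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("Invalid variable name: '" + name + "'");
            }
            // A variable can receive a value only once.
            if (parsed.bindings.putIfAbsent(name, expr) != null) {
                throw new IllegalArgumentException("Variable bound twice: " + name);
            }
            rest = rest.substring(close + 1).trim();
        }
        parsed.expression = rest;
        return parsed;
    }

    /** Evaluates the bindings in order, then the main expression, all on the same context. */
    static String evaluate(String input, Document doc) throws Exception {
        Parsed parsed = parse(input);
        Map<String, Object> values = new HashMap<>();
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve $name references from the values computed so far.
        xpath.setXPathVariableResolver(qname -> values.get(qname.getLocalPart()));
        for (Map.Entry<String, String> binding : parsed.bindings.entrySet()) {
            // Assumption of this sketch: each variable value is taken as a string.
            Object value = xpath.evaluate(binding.getValue(), doc, XPathConstants.STRING);
            values.put(binding.getKey(), value);
        }
        return xpath.evaluate(parsed.expression, doc);
    }

    /** Parses an XML string into an in-memory DOM tree. */
    static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml("<contents><item>One</item><item>Two</item><item>Three</item>"
                + "<item>Four</item><item>Five</item>"
                + "<article><head>Five</head><body>Hello</body></article>"
                + "<article><head>Other</head><body>Nope</body></article></contents>");
        System.out.println(evaluate(
                "{myVar := /contents/item[5]/text() } //article[ head/text() = $myVar ]/body/text()",
                doc));
    }
}
```

Because each binding is evaluated before the expressions that follow it, a later binding in this sketch may refer to an earlier variable, which matches the rule that a value can be used in every expression after its binding.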

This distribution package contains the automatically generated Java API (Application Program Interface) documentation of the clark.xpath package, which contains the implementation. For the XPath engine to work properly, we recommend extracting the zip archive into a separate directory without changing its internal sub-directory structure. The only directory that is not required is doc/, which contains the API documentation of the implementation. It is provided only as support for users (developers).

Here we give a short description of how this XPath implementation can be used. The main class intended for use from external classes is MainXPath. It offers several important methods, which are described below. The XPath engine works on a DOM tree that must be stored completely in memory. The evaluation of an XPath expression passes through two independent stages: parsing the XPath expression, which produces an internal structure, and evaluating that structure on a context node, which produces the final result. The intermediate structure is accessible from external packages and can be evaluated repeatedly. This is a useful speed optimization when the same XPath expression is applied many times.
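The same two-stage pattern (parse once, evaluate many times) exists in the JDK's standard javax.xml.xpath API, so it can be shown in runnable form without the clark.xpath package. The following is only an analogy under that assumption: XPathExpression plays the role of the intermediate structure, and the class CompileOnceSketch with its methods is a hypothetical name chosen for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CompileOnceSketch {
    /** Parses an XML string into an in-memory DOM tree. */
    static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Stage 1 (parsing the expression) runs once; stage 2 runs once per context. */
    static List<Double> countItems(List<Document> contexts) throws Exception {
        XPathExpression compiled =
                XPathFactory.newInstance().newXPath().compile("count(//item)");
        List<Double> counts = new ArrayList<>();
        for (Document context : contexts) {
            // Re-evaluate the already compiled expression on a new context node.
            counts.add((Double) compiled.evaluate(context, XPathConstants.NUMBER));
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        List<Document> docs = Arrays.asList(
                parseXml("<a><item/><item/></a>"),
                parseXml("<b><item/></b>"));
        System.out.println(countItems(docs));
    }
}
```

With MainXPath, the corresponding steps would be one call to compile() followed by repeated calls to execute() or executeCanonical(), as described below.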


static clark.xpath.GeneralNode compile(java.lang.String expr)

          This method parses the XPath query and produces a special tree structure (GeneralNode) corresponding to the query. This structure can be evaluated many times without parsing the original XPath query again. A GeneralNode tree can be evaluated with the method execute() or executeCanonical(). If you need to apply the XPath query only once, it is better to use processXPath() or processXPathCanonical().

static java.util.LinkedList execute(clark.xpath.GeneralNode tree, w3c.dom.Node context)

          This method evaluates a tree structure of type GeneralNode (generated by the method compile()) against a context DOM tree node. The result MUST be a node-set (LinkedList); otherwise the method throws a ParseException with an appropriate message. If the result may be any of the standard XPath types, use the method executeCanonical() instead.

static clark.xpath.AtomicObject executeCanonical(clark.xpath.GeneralNode tree, w3c.dom.Node context)

          This method evaluates a tree structure of type GeneralNode (generated by the method compile()) against a context DOM tree node. The result can be any of the standard XPath types: node-set (LinkedList), number (Double), literal (String) or boolean (Boolean). The returned value is of type AtomicObject, which has two important methods: getType(), which returns the type of the result value, and getTarget(), which returns the actual value produced by the evaluation. Instead of calling getType(), the Java operator 'instanceof' can be used to determine the type of the getTarget() result.
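          The 'instanceof' approach can be sketched as follows. The sketch does not use AtomicObject itself: it works on a plain Object, standing in for the value that getTarget() is described as returning, and the class ResultTypeSketch and the method describe() are hypothetical names introduced for the illustration.

```java
import java.util.LinkedList;

public class ResultTypeSketch {
    /** Dispatches on the four Java types listed for the standard XPath result types. */
    static String describe(Object target) {
        if (target instanceof LinkedList) {
            return "node-set of size " + ((LinkedList<?>) target).size();
        }
        if (target instanceof Double) {
            return "number " + target;
        }
        if (target instanceof String) {
            return "literal \"" + target + "\"";
        }
        if (target instanceof Boolean) {
            return "boolean " + target;
        }
        throw new IllegalArgumentException("Not a standard XPath result type: " + target);
    }

    public static void main(String[] args) {
        // Stand-in values; in real use the argument would come from getTarget().
        System.out.println(describe(new LinkedList<Object>()));
        System.out.println(describe(Double.valueOf(3.0)));
        System.out.println(describe("abc"));
        System.out.println(describe(Boolean.TRUE));
    }
}
```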

static java.util.LinkedList processXPath(java.lang.String expr, w3c.dom.Node context)

          This method is a composition of the methods compile() and execute(), provided for the convenience of the user (developer). The method compile() prepares the input for execute(), and the result of this method is the result returned by execute().

static clark.xpath.AtomicObject processXPathCanonical(java.lang.String expr, w3c.dom.Node context)

          This method is a composition of the methods compile() and executeCanonical(), provided for the convenience of the user (developer). The method compile() prepares the input for executeCanonical(), and the result of this method is the result returned by executeCanonical().