XPath and shorthand

The XPath specification created by the W3C is actually a retrieval language. It defines a set of check functions that must all match for a node to be selected. In most cases the check will be an element name (i.e. select-from) with logical operations on attributes (where-attrib==value). The XPath documents, however, also show an abbreviated syntax which is a lot easier to implement. Some implementations call these "basic XPaths". Here is a little introduction to this scheme.

First of all, no functions exist by themselves, only the abbreviations exist. A later processor may stack on top of the basic syntax by handling functions put into [...] parts. The basic xpaths of xmlg never contain [...] parts other than numeric ones, i.e. [number].

A ... matches all direct subnodes of the current node with a name of "A".
* ... matches all children of the current node.
A/B ... matches all nodes with name "B" that are children of nodes with name "A" under the current node.
*/B ... matches the nodes named "B" at the second level under the current node.
//B ... matches all nodes named "B" at any depth under the current node.
//A//B ... matches any nodes "B" at any depth inside "A" nodes, which may themselves be found anywhere under the current node (and its tree).
B[2] ... if there are multiple "B" nodes then only the second one will match. Omitting a [number] specification on a single-node lookup will return the [1]st node, of course.
B@href ... from multiple nodes "B", only those that have an attribute "href" are selected.
//*@href ... matches any node in the tree, at any depth, that contains an attribute "href". Note that the uppermost node containing such an attribute will be found even if lower nodes have such an href too.

All basic xpaths are non-greedy.

strstr matching

The basic xpath model is expanded in xmlg by matching the names of nodes with an expression syntax.
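As a rough illustration of the name matching used in this section, here is a minimal Python sketch. It assumes the rule stated in this document: a namespec is compared exactly (strcmp-style) unless it starts with a single "*", in which case the rest of the spec is searched for inside the node name (strstr-style). The function name `name_matches` is hypothetical and not part of xmlg. Python's `==` and `in` stand in for strcmp() and strstr().

```python
def name_matches(spec: str, name: str) -> bool:
    """Hypothetical helper: does a node name match a namespec?

    A leading "*" selects substring (strstr-style) matching on the
    remainder of the spec; otherwise the names must be equal
    (strcmp-style).
    """
    if spec.startswith("*"):
        # An empty remainder is contained in every name, so a bare
        # "*" matches any node.
        return spec[1:] in name
    return spec == name


# The node names below are taken from the <BLAFF><HOO><BLUBB> example.
for spec, name in [("A", "A"), ("A", "BLAFF"), ("*A", "BLAFF"),
                   ("*U", "BLUBB"), ("*U", "HOO"), ("*", "HOO")]:
    print(f"{spec!r} against {name!r}: {name_matches(spec, name)}")
```

Only the per-name test is sketched here. Walking the tree for "/" and "//" steps is left out.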
The most basic xmlglib has a simple extension: it searches with a "strstr()" function when the first character of a namespec is a single "*". That is, a select of //*A//*U will also match <BLAFF><HOO><BLUBB>... nodes. Of course this fits well within the xpath abbreviations, where a "*" star matches any name. With an empty strstr-spec, strstr-matching likewise matches all nodes, since every node name contains a zero-length subportion.

The same style of strstr-matching is used throughout the library. In the xpath syntax it also applies to the matching of attribute names. The exact-matching style (i.e. strcmp-style) is the default, and strstr-style is selected with a leading "*" in the match-expression part.

xpath comparison

The most common usage of xpath expressions takes the abbreviated syntax, which is largely similar to a filepath specification. In xpath, each of the selects can be followed by an assertion check. None of these assertion checks are implemented for the selection processing here. However, we can compare the formats that are provided.

For example, the form "//A" is equivalent to an xpath selection expressed as "//*[name()='A']". In words, it says to select all nodes with a name equal to 'A'. Likewise, the attribute assertion shown above actually has to be put into such an assertion check, i.e. "//*@id" is equivalent to an xpath selection of "//*[@id]". The strstr matching above has an equivalent in xpath selection expressions as well: a query of "//*AA" is equivalent to "//*[contains(name(),'AA')]". On the other hand, I am currently not aware of any xpath selection being equivalent to "//*@*AA", but you get the point.

xpath extensions

It would be nice to implement a number of functions for the xpath postfix bracket syntax. Something like "//H1[not(@id)]" would be nice to select all nodes of name H1 that do not yet have any "id" attribute. Others could be useful as well, such as count(), or at least logical operators to follow up some functions, e.g. "//*[count() < 5]".

On the other hand, I do not intend to implement the prefix axis specifications at any point. The "/" is said to be equivalent to an axis specification of "/child::" and "//" is said to be short for "/descendant-or-self::node()/". The others, like "following-sibling", can not be expressed. Hopefully most of it can be expressed in terms of forward-assertions on a postfix bracket syntax.

pcre matching

Almost all xpath routines have a cousin in the xmlpcre part of this project. There, full perl-style regular expressions are applied, including a handy name1|name2 syntax. By default the pcre must match the complete name (i.e. the default is in fact "^(?:name1|name2)$"), which can again be reduced to match any subportion by giving a leading "*". The style of name-matching is defined more in-depth in a later document. Of course, the PCRE machine will not see a leading "*" on such a pcre-name match, it just gets different modifier flags. Running `make check` shows many examples of what xpaths may look like in xmlglib and its xmlpcre part.

A warning here: the "|" syntax of PCRE does somewhat interfere with the xpath selection syntax. In xpath it separates entire paths, i.e. "//A/B|//A/C" in xpath vs. "//A/B|C" in pcre matching. In the xmlpcre matching part we always make a regular expression match on the name part, not on the path as a whole.

not yet implemented

It would be nice to add the "=" syntax throughout the xpath selectors, i.e. name=text matches nodes with "name" having a text-content of "text", and more importantly *@attrib=value would match any node having an attribute of the given name that furthermore has the given value. It would of course then be possible to use PCRE expressions for the selection match, not only strcmp-style matching.
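To illustrate the name matching described in the "pcre matching" section, here is a minimal Python sketch. It assumes only what this document states: by default the expression must match the complete name, as if wrapped in "^(?:...)$", and a leading "*" relaxes this to a match on any subportion. The function name `pcre_name_matches` is hypothetical, and Python's `re` module stands in for the PCRE machine, which is an approximation that is adequate for simple alternations like name1|name2.

```python
import re


def pcre_name_matches(spec: str, name: str) -> bool:
    """Hypothetical helper: regex match of one name step, not a whole path."""
    if spec.startswith("*"):
        # The leading "*" is stripped before the regex engine sees it;
        # the remainder may match anywhere inside the name.
        return re.search(spec[1:], name) is not None
    # Default: anchor the expression so it must cover the complete name.
    return re.search("^(?:" + spec + ")$", name) is not None


# The last step of "//A/B|C" in pcre matching is the name spec "B|C".
for spec, name in [("B|C", "B"), ("B|C", "C"), ("B|C", "BC"), ("*B|C", "ABX")]:
    print(f"{spec!r} against {name!r}: {pcre_name_matches(spec, name)}")
```

Because the match applies to a single name step, the "|" here chooses between node names within one step, whereas in xpath it would separate two whole paths.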