What is it?

The XMT/TA concept splits an XML file into its two components: the pure text (TA = text array) and the markup and XML-ish comments about it (XMT = xml markup tree). It allows you to add semantic markup to existing low-level text: you can select a subportion of the text array using a basic xpath expression, which returns a pointer/length pair into a pure char array. This char array buffer can be handed to a libpcre expression for further examination, and the results can be recorded in the xml markup tree without any need to modify the text array itself. This scheme can be applied recursively, as you would when highlighting source code for representation: first run a lex algorithm, then interpret the tokens, assign different markups to symbols in structs or functions, and possibly crosslink them with attributes in the XMT elements.

I gained my experience with this scheme while working on the XEE project at http://dbis.informatik.hu-berlin.de . The XEE project, however, stores the XMT tree and the TA text on secondary media in order to be a full-fledged XML database; see http://www.xmldb.org about it. XMT/TA instead uses just main memory, and it is implemented in plain C so that it can be reused as a fast-running component in other projects doing data mining in text files. Currently the perl language is used for most file inspections, but it does not fit well into the XML world and it is harder to turn into a native-code binary. For XMT/TA, however, XML is the natural input/output format, which makes it easy to chain into other tools that can handle XML files. And there are a lot of them available.

The operations

The text array is a simple string, so inserts and removals should be kept as rare operations. However, a single XMT/TA handle may have multiple XMT and TA parts: many XMT may work on the same TA, and many TA may exist.

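
The separation of text and markup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual XMT/TA API: the names text_array, markup_node, markup_tree, markup_init, markup_add and markup_digit_runs are all hypothetical, a fixed-size node array replaces a real tree, and a hand-written digit scanner stands in for the libpcre step. What it tries to show is that markup is stored as offset/length pairs, that several markup sets can refer to one text array, and that the text array itself is never modified.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not the real XMT/TA structures. */
typedef struct {
    const char *text;   /* the pure text (TA), never modified here */
    size_t      len;
} text_array;

typedef struct {
    const char *name;   /* element name, e.g. "number" */
    size_t      off;    /* start offset into the text array */
    size_t      len;    /* length of the marked span */
} markup_node;

#define MAX_NODES 16

typedef struct {
    const text_array *ta;          /* several markup sets may share one TA */
    markup_node       nodes[MAX_NODES];
    size_t            count;
} markup_tree;

static text_array text_array_from(const char *s)
{
    text_array ta = { s, strlen(s) };
    return ta;
}

static void markup_init(markup_tree *xmt, const text_array *ta)
{
    xmt->ta = ta;
    xmt->count = 0;
}

/* Record a span as markup. Returns 0 on success, -1 if the set is full. */
static int markup_add(markup_tree *xmt, const char *name,
                      size_t off, size_t len)
{
    assert(off + len <= xmt->ta->len);
    if (xmt->count >= MAX_NODES)
        return -1;
    xmt->nodes[xmt->count].name = name;
    xmt->nodes[xmt->count].off  = off;
    xmt->nodes[xmt->count].len  = len;
    xmt->count++;
    return 0;
}

/* Stand-in for the regex step: mark every run of decimal digits inside
 * the span [off, off + len) as a "number" node. Returns the number of
 * nodes added. */
static size_t markup_digit_runs(markup_tree *xmt, size_t off, size_t len)
{
    const char *s = xmt->ta->text;
    size_t end = off + len, i = off, added = 0;

    assert(end <= xmt->ta->len);
    while (i < end) {
        if (isdigit((unsigned char)s[i])) {
            size_t start = i;
            while (i < end && isdigit((unsigned char)s[i]))
                i++;
            if (markup_add(xmt, "number", start, i - start) == 0)
                added++;
        } else {
            i++;
        }
    }
    return added;
}
```

With a text array built from "port 8080 and port 443", scanning the whole span would record two "number" nodes, and a second markup_tree initialised on the same text array could carry unrelated markup without touching the first one or the text.
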
To examine an input text and create a report from it, just create two XMT/TA parts and append text parts to the output TA as you would otherwise print to an output stream, adding the corresponding XMT parts to the output XML tree as you go. When you are done, you can examine the output XMT/TA in an extra pass (for example to handle cross-references), or even push the output XMT/TA into another procedure that you defined earlier to transform an XMT/TA into something else.

The implementation

It is currently glib2 based, but it could be anything else as the dependency is low: glib2 is mostly used for simplicity of development and, to a lesser degree, to attract developers, who will find the XMT/TA sources easier to read. To ease source integration of the project with other libraries, the common prefix for all symbols is "xml_". Many functions and members carry names stemming from libxml2 or glib2.

Guido Draheim, Berlin DE sol-iii, 9 Oct 2002 AD (at xmlg 5.x)