XML/TA library for glib2 and pcre

xmlg 0.7.18
unfinished documents:
NODEMODEL
XPATHDEF
CHECKING
HYPERCSS
TEXTMINING

 What is it?

The XMT/TA concept splits an XML file into its two components:
the pure text (TA = text array) and the markup with XML-ish
comments about it (XMT = xml markup tree). This allows you to add
semantic markup to existing low-level text: you can select
a subportion of the text array using a basic xpath expression,
which returns a pointer/length pair into a plain char array.
This char buffer can be handed to a libpcre expression for
further examination, and the results can be recorded in the xml
markup tree without any need to modify the text array itself. The
scheme can be applied recursively, as you would when highlighting
source code for presentation: first run a lexer, then interpret
the tokens, assign different markups to the symbols in structs
or functions, and possibly crosslink them through attributes on
the XMT elements.
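
As a rough illustration of the split described above, here is a
minimal sketch in plain C. It is not the xmlg API: the demo_*
names are hypothetical, the markup tree is reduced to a flat
array of offset/length records, and a hand-written digit scanner
stands in for a libpcre match. The point is only that markups
refer to spans of the text array and the text itself is never
rewritten.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; not the xmlg API. */
typedef struct {
    const char *name;  /* markup name, e.g. "number" */
    size_t      off;   /* offset of the span in the text array */
    size_t      len;   /* length of the span */
} demo_mark;

typedef struct {
    const char *text;       /* the text array: never modified */
    demo_mark   marks[16];  /* flat stand-in for the markup tree */
    size_t      n_marks;
} demo_doc;

/* Record a markup over text[off, off+len) without touching the text.
 * Returns 0 on success, -1 if the mark table is full. */
static int demo_add_mark(demo_doc *doc, const char *name,
                         size_t off, size_t len)
{
    assert(off + len <= strlen(doc->text));
    if (doc->n_marks == sizeof doc->marks / sizeof doc->marks[0])
        return -1;
    doc->marks[doc->n_marks++] = (demo_mark){ name, off, len };
    return 0;
}

/* Stand-in for a regex pass: mark every run of digits found inside
 * the given span as "number". Returns the number of marks added. */
static int demo_mark_numbers(demo_doc *doc, size_t off, size_t len)
{
    int    added = 0;
    size_t i = off;
    size_t end = off + len;

    while (i < end) {
        if (isdigit((unsigned char)doc->text[i])) {
            size_t start = i;
            while (i < end && isdigit((unsigned char)doc->text[i]))
                i++;
            if (demo_add_mark(doc, "number", start, i - start) != 0)
                break;
            added++;
        } else {
            i++;
        }
    }
    return added;
}
```

A caller would first mark a subportion (here by hand, where the
library would use an xpath selection), then run the second pass
over just that span, which mirrors the recursive use described
above.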

 Any experience

I learned about this scheme while working on the XEE project
at http://dbis.informatik.hu-berlin.de - the XEE project,
however, stores the XMT tree and TA text on secondary media in
order to be a full-fledged xml database, see http://www.xmldb.org
about it. The XMT/TA instead uses just main memory, and it is
implemented in plain C so that it can be reused as a fast-running
component in other projects doing data mining on text files.
Currently the perl language is used for most file inspections,
but it does not quite fit into the xml world and it is harder to
turn into a native-code binary. For the XMT/TA, however, xml is
the natural input/output format, which makes it easy to chain
into other tools that can handle xml files. And there are a lot
of them available.

 The operations

The text array is a simple string, so inserts/removals should
be kept rare. However, a single XMT/TA handle may
have multiple XMT and TA parts - many XMT may work on the same TA
and many TA may exist. So if you want to examine an input text
and create a report from it, just create two XMT/TA parts.
Append text parts to the output TA as you would otherwise
print to an output stream, but add the matching XMT parts to
the output XML tree as you go. When you are done, you can
examine the output XMT/TA in an extra pass (for example to handle
cross-references) or even push the output XMT/TA into another
procedure that you defined earlier to transform an XMT/TA into
something else.
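
The report-building pattern above can be sketched roughly as
follows. Again this is plain C under stated assumptions and not
the xmlg API: the rep_* names are hypothetical, the output TA is
a fixed-size buffer, and the output XMT is a flat array of marks.
Text is only ever appended at the end, so offsets recorded
earlier stay valid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; not the xmlg API. */
typedef struct {
    const char *name;  /* markup name */
    size_t      off;   /* offset into the output text array */
    size_t      len;   /* length of the marked span */
} rep_mark;

typedef struct {
    char     text[256];  /* output text array, append-only here */
    size_t   used;       /* bytes used, excluding the trailing NUL */
    rep_mark marks[16];  /* flat stand-in for the output markup tree */
    size_t   n_marks;
} rep_out;

/* Append raw text, much like printing to a stream.
 * Returns the start offset, or (size_t)-1 if it does not fit. */
static size_t rep_append(rep_out *out, const char *s, size_t len)
{
    size_t start;

    assert(s != NULL);
    if (len >= sizeof out->text - out->used)
        return (size_t)-1;
    start = out->used;
    memcpy(out->text + start, s, len);
    out->used += len;
    out->text[out->used] = '\0';
    return start;
}

/* Append text and record a markup over exactly the appended span.
 * Returns 0 on success, -1 if either table is full. */
static int rep_append_marked(rep_out *out, const char *name,
                             const char *s, size_t len)
{
    size_t start;

    assert(name != NULL);
    if (out->n_marks == sizeof out->marks / sizeof out->marks[0])
        return -1;
    start = rep_append(out, s, len);
    if (start == (size_t)-1)
        return -1;
    out->marks[out->n_marks++] = (rep_mark){ name, start, len };
    return 0;
}
```

Because the marks are kept apart from the text, a later pass can
walk the recorded spans (for cross-references, say) without
re-parsing the report.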

 The implementation

It is currently glib2 based, but it could be built on something
else as the dependency is light - glib2 is mostly used for
simplicity of development and, to a lesser degree, for attracting
developers who will find the XMT/TA sources easier to read. To
ease source-level integration of the project with other
libraries, the common prefix for all symbols is "xml_". Many
functions and members carry names stemming from libxml2 or
glib2.

Guido Draheim, Berlin DE sol-iii, 9 Oct 2002 AD (at xmlg 5.x)