XML/TA library for glib2 and pcre

xmlg 0.7.18
unfinished documents:
NODEMODEL
XPATHDEF
CHECKING
HYPERCSS
TEXTMINING
 What is Hyper CSS (or HCSS for short)

The CSS (Cascading Style Sheets) standard specifies a class-selector
syntax like "markup .select" to be equivalent to the attribute
selector "markup[class~=select]" - however, it specifically
reserves this syntax for HTML stylesheets only. We will use this
syntax to build CSS rules that allow us to translate an XML input
file with the help of an HCSS into an XHTML file - but the output
could also be yet another XML file (probably one in "presentation"
format).

The name "HCSS" obviously stems from its primary use for
converting XML input files into HTML output files - HTML being
the abbreviation for "HyperText Markup Language". And so the term
HCSS can be spelled as "Hyper CSS" as well, or in full length
as "HyperCascaded StyleSheet", which is a fitting description in
itself since HCSS transformations can be stacked.

 Basic Transformation of HCSS

First a look at a simple HCSS file:

H1 .greeting { color : red }
H1 .appendix { color : blue }

Using this HCSS on an XML file will make the HCSS processor
look for XML elements named "greeting" and "appendix". It
will ignore elements named "H1" by the way.

<greeting> Welcome! </greeting>       <!-- shall be H1 -->
<appendix> Already Done </appendix>   <!-- shall be H1 -->
<H1> some definition </H1>            <!-- yet unknown -->

The HCSS processor will recognize the elements it knows from
its HCSS file and transfer them into their target markup. All
others are ignored and transferred untouched, leaving some
later viewer to interpret those markup nodes. The output
file body will therefore look like this:

<H1 class="greeting"> Welcome! </H1>
<H1 class="appendix"> Already Done </H1>
<H1> some definition </H1>

When you attach the HCSS file as an HTML stylesheet then you will
instantly observe that a CSS-enabled HTML browser will
print the "greeting" header in red and the "appendix" header
in blue while printing the definition text in the
default header color that it knows for H1 header markups.
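
The basic transformation can be sketched in a few lines of
Python. This is only a minimal illustration and not the xmlg
implementation: the names parse_hcss and transform are
hypothetical, the rule parser handles just the
"markup .class { ... }" form shown above, and the input lines
are assumed to be wrapped in a <doc> root element so that
they form a well-formed XML document.

```python
import re
import xml.etree.ElementTree as ET

# Matches one rule of the form "markup .class { properties }".
RULE = re.compile(r"(\w+)\s+\.(\w+)\s*\{([^}]*)\}")

def parse_hcss(text):
    """Map each input element name (the class) to its output markup."""
    table = {}
    for markup, cls, _body in RULE.findall(text):
        table[cls] = markup
    return table

def transform(root, table):
    """Rename known elements in place; unknown ones pass through untouched."""
    for elem in root.iter():
        if elem.tag in table:
            elem.set("class", elem.tag)   # remember the input element name
            elem.tag = table[elem.tag]    # switch to the target markup
    return root

hcss = "H1 .greeting { color : red }\nH1 .appendix { color : blue }"
source = ("<doc><greeting> Welcome! </greeting>"
          "<appendix> Already Done </appendix>"
          "<H1> some definition </H1></doc>")
root = transform(ET.fromstring(source), parse_hcss(hcss))
print(ET.tostring(root, encoding="unicode"))
```

The input <H1> element is not a key of the table, so it is
copied through without a class attribute.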

 The SPAN/CSS warning

The CSS standard specifically warns about the abuse of
class attributes to infer a completely new
presentation for HTML text. This abuse is possible because
one can write a CSS file that omits the output markup supertype:

.greeting { color : red ; font-size : big }
.appendix { color : blue ; font-size : big }

This will still allow us to transform the XML input text
into valid HTML output that can still be viewed with a
CSS-enabled browser - and the text is still printed in
a header-like fashion in a bigger font size. However,
most third-party tools are now completely doomed since
the slightest remnants of "semantic markup" (i.e. a
"header line") are removed and only presentation hints
have survived:

<html><head><style>
  .greeting { color : red ; font-size : big }
  .appendix { color : blue ; font-size : big }
</style></head><body>
  <span class="greeting"> Welcome! </span>
  <span class="appendix"> Already Done </span>
  <H1> some definition </H1>
</body></html>

Because of this complete removal of semantic markup the CSS
standard has included that warning in its standardization
document. With HCSS however you are still able to
impose an "output markup" that has a semantic
interpretation based on the semantics expressible in the
target XML format - which is usually XHTML with the
semantic markups that it knows about, like "<em>" or
"<address>" or "<blockquote>".

An HCSS writer should be aware that those output markup
elements have default presentation information
attached that one will only add to or override with
additional properties in the CSS lines. For example, an
H1 is already shown in a "big" font size and we merely add
a color to it. The CSS would however even allow us to
override the default - like writing:

H1 .fineprintsection { color : grey ; font-size : xx-small; }

 Renaming feature

Most transformations of XML into XHTML will want to replace
some specific XML element with a presentational HTML markup,
while some markups of the input XML might already be
HTML markup - which the user wants to leave untouched. An
HCSS can still attach additional properties to XHTML
markup that shall be transferred through.

   H1 { font-style : italic }

While this syntax is quite intuitive from an XHTML view, in
the HCSS world it is actually equivalent to writing a
proper line with an input XML element and an output HTML markup
like this:

  H1 .H1 { font-style : italic }

It simply means that any input element "H1" shall be
converted into an output markup "H1" and that the output
CSS subset table shall be expanded with the properties given
(more on CSS subset generation later). This internal
handling even allows an HCSS writer to rename valid
XHTML-registered XML elements into some other XHTML
markup element - like transforming <em>phasis into
<b>old, or just turning an H1 header into a lower
ranking H4 header, as in:

<html><head><style>
  .greeting { color : red ; font-size : big }
  .appendix { color : blue ; font-size : big }
  H4 .H1    { } /* renamed ! */
</style></head><body>
  <span class="greeting"> Welcome! </span>
  <span class="appendix"> Already Done </span>
  <H4 class="H1"> some definition </H4>
</body></html>
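
The equivalence of "H1" and "H1 .H1" suggests that every
selector can be normalized into a pair of output markup and
input element before any further processing. The following
is a sketch under that assumption only; the function name
normalize_selector is hypothetical and the real processor may
organize this differently.

```python
def normalize_selector(selector):
    """Return (output_markup, input_element) for one HCSS selector."""
    parts = selector.split()
    if len(parts) == 2 and parts[1].startswith("."):
        # "H4 .H1": input element H1 is renamed to output markup H4
        return parts[0], parts[1][1:]
    if len(parts) == 1 and parts[0].startswith("."):
        # ".greeting": no markup leader, a default has to be chosen later
        return None, parts[0][1:]
    if len(parts) == 1:
        # "H1" is shorthand for "H1 .H1": pass the element through
        return parts[0], parts[0]
    raise ValueError(f"unsupported selector: {selector!r}")

for selector in ("H1", "H1 .H1", "H4 .H1", ".greeting"):
    print(selector, "->", normalize_selector(selector))
```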

 XHTML default markup

When the HCSS does not show an output markup type
for an xml element then the HCSS processor has to
guess one - for XHTML the general default is "span".
This default markup corresponds to the presentation
default of a CSS line containing no "display"
attribution - a missing "display" attribution makes
it "display : inline" actually.

Conversely, an HCSS processor will have a look at
the "display" property to make a better guess
at the output markup. For XHTML a predefined
mapping table can be used that will usually choose
between "<span>" for "display : inline" and "<div>"
for "display : block" - these being the most common
display values in a CSS text for markups.

Other guesses would be to transform a "list-item"
into <li> markups and the complete set of CSS2
"display : table-*" values into a well-chosen set
of output XHTML markups. The "run-in" and "compact"
values however are hard to guess as they can be
either one of <span> or <div> depending on context.
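
Such a mapping table might look like the following sketch.
Only the entries for "inline", "block" and "list-item" are
taken from the text above; the "table-*" entries and the
fallback to "span" for values like "run-in" and "compact" are
assumptions made for illustration, as are the names
DISPLAY_TO_MARKUP and guess_markup.

```python
# Hypothetical mapping from a CSS "display" value to an XHTML markup.
DISPLAY_TO_MARKUP = {
    "inline": "span",
    "block": "div",
    "list-item": "li",
    # Assumed choices for part of the CSS2 "table-*" family:
    "table": "table",
    "table-row": "tr",
    "table-cell": "td",
}

def guess_markup(properties):
    """Guess the output markup for a rule that has no markup leader."""
    # A missing "display" property means "display : inline".
    display = properties.get("display", "inline")
    # Values that are hard to guess fall back to the general default.
    return DISPLAY_TO_MARKUP.get(display, "span")

print(guess_markup({"color": "red"}))
print(guess_markup({"display": "block"}))
```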

Note that internally the HCSS file is parsed into
a single table indexed by the "class" selector name.
The markup leader is transformed into a property
item named "html" (for html output) or "docbook"
(for the docbook target type). You can override
this with the help of an explicit property of that name.

 H4 .appendix { color : blue ; docbook : appendix }

Note that the HCSS processor will actually emit markups
whose class and markup name are the same in a shorthand
form (by default - it can be overridden). The above
line would override the CSS spec for a "docbook"
target as if it was written as:

 appendix .appendix { color : blue }

which itself is the same as "appendix { color : blue }"
and which will therefore make an input text containing
an "<appendix>" XML element pass that one through
to the output docbook target. No superfluous "class"
would be implanted!

in "docbook" output:
  <appendix> output text </appendix>

in "html" output:
  <H4 class="appendix"> output text </H4>
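
The table entry and the shorthand output can be pictured
with the following sketch. It assumes that the markup leader
is stored under the name of the active target unless the
rule body already carries an explicit property of that name.
The helper names parse_properties, build_entry and open_tag
are hypothetical.

```python
def parse_properties(body):
    """Split "name : value ; name : value" into a dictionary."""
    props = {}
    for item in body.split(";"):
        if ":" in item:
            name, value = item.split(":", 1)
            props[name.strip()] = value.strip()
    return props

def build_entry(markup, body, target):
    """Build the table entry for one rule and one output target."""
    props = parse_properties(body)
    # The markup leader becomes the target property unless the
    # rule already has an explicit property of that name.
    props.setdefault(target, markup)
    return props

def open_tag(cls, entry, target):
    """Emit the opening tag, using shorthand when both names match."""
    markup = entry[target]
    if markup == cls:
        return f"<{markup}>"
    return f'<{markup} class="{cls}">'

body = "color : blue ; docbook : appendix"
for target in ("docbook", "html"):
    entry = build_entry("H4", body, target)
    print(target, open_tag("appendix", entry, target))
```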

 Generation specials

Some HTML browsers do not understand all CSS properties
for all HTML markups - for instance it is not that
easy to assign a background-color to a <pre>
text. However, one can often fake a correct result by
processing an input element into two output markups
even without having an explicit :before or :after or
something like that. That's a generation special, so
a CSS like

 .sourcetext { background-color : light-blue ; html : pre }

will make for an output text being something quite but
not entirely unlike

 <pre class="sourcetext"><span class="sourcetext">
   #define A b
 </span></pre>

 the before/after specials

  { add documentation here }

 CSS subset generation

While the hCSS transformation specification can actually be
used directly as an external LINK reference for a css-enabled 
browser, it is often more convenient to place the CSS style
lines directly into the output file. It has the advantage
that the output file has a compact single-file format that
will always be presented in the same form even when the file
is copied and mailed and the receiver does not have the
original CSS file at hand - as would be a problem for a
thing like this being read in offline-mode:

<html>
 <LINK href="http://my.site/style/text.css" type="text/css">
 <body>
    <span class="sourcetext"> #define A B </span>
 </body>
</html>

Instead we want to embed the stylesheet properties directly
into the output file, and that is actually the default for
XHTML output. It was shown in the examples above already,
as we used <style> markups there to show the CSS lines in
the target text.

As an extra feature however the HCSS processor will
memorize all class names being used in the input file,
which therefore need to appear in the output's
style section. Every HCSS property line that is not
used in the current input XML file will not be copied
into the output file. Take an HCSS file with three lines:

  H3  .greeting { color : red ; font-size : big }
  H3  .appendix { color : blue ; font-size : big }
  pre .sourcetext  { background-color : light-blue }

Only one line of this HCSS file makes it into the
resulting output file when the output looks like this:

<html><head><style>
  .greeting { color : red ; font-size : big }
</style></head><body>
  <H3 class="greeting"> Welcome! </H3>
  <H1> some definition </H1>
</body></html>

Likewise you could create a second output file that
contains only the CSS entries being used in the
currently transformed document. This allows us to invent
a huge default HCSS specification while the resulting
presentational CSS might be quite short.
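
The subset generation can be sketched as a filter over the
HCSS rules. This is an illustration under assumptions, not
the processor's own code: the function name css_subset is
hypothetical, the "used" class names are simply taken to be
the element names found in the input document, and the
markup leader is dropped from the emitted lines as in the
output shown above.

```python
import re
import xml.etree.ElementTree as ET

# Optional markup leader, then ".class { properties }".
RULE = re.compile(r"(?:\w+\s+)?\.(\w+)\s*\{([^}]*)\}")

def css_subset(hcss_text, input_root):
    """Return only the style lines whose class occurs in the input."""
    used = {elem.tag for elem in input_root.iter()}
    lines = []
    for cls, body in RULE.findall(hcss_text):
        if cls in used:
            lines.append(f".{cls} {{{body}}}")
    return lines

hcss = """
  H3  .greeting { color : red ; font-size : big }
  H3  .appendix { color : blue ; font-size : big }
  pre .sourcetext  { background-color : light-blue }
"""
doc = ET.fromstring(
    "<doc><greeting> Welcome! </greeting><H1> some definition </H1></doc>")
for line in css_subset(hcss, doc):
    print(line)
```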

 Final Note

A non-expert user of XML often has problems
writing an XSL transformation script. Most of the
time one only wants to transform an input XML
text into an output XML-like text like XHTML or
docbook for final presentation. Making up an XSL
script seems overdone for this to most people.

The HCSS allows a user to write a transformation
specification based on CSS syntax. This syntax
is much more intuitive to many people and it is
already known to those who have used HTML in
the past only to make up their www documents.

The HCSS processor allows one to easily get a
foothold in the XML world. It will suffice for
a lot of tasks where one writes a personal
text in XML with only semantic markups and
transforms it into a presentation XML-type
format with (mostly) presentational markup
with the help of something as simple as a CSS spec.