Implement, Gustaf! - The Kino XML/CSS processor

Eckhart Köppen (Eckhart.Koeppen@uni-essen.de), Gustaf Neumann (Gustaf.Neumann@uni-essen.de)
Information Systems and Software Techniques
University of Essen
Universitätsstraße 9, 45141 Essen, Germany

Published in: Poster proceedings of the 8th WWW Conference in Toronto, May 11-14, 1999

1. Introduction

This paper describes the architecture and implementation of the Kino XML processor, which parses XML [Bray et al. (1998)] documents and displays them according to their associated CSS1 [Lie and Bos (1996)] stylesheets. Kino also supports a large set of HTML [Raggett et al. (1998)] tags in order to handle legacy HTML documents. Further requirements are the ability to handle the networking aspects of hypertext documents (i.e. the linking of documents) and the ability to handle client-side scripting (see [Koeppen and Neumann (1998)] for details). Kino is implemented in the C programming language and consists mainly of a parser library, a layouter library and additional code that makes it usable as a widget for the X Window System and the X Toolkit.

2. Kino XML Parser

The parser is responsible for tokenizing the XML source text and building a parse tree. It has two separate layers: an event-based tokenizer and the event handlers attached to it.

The tokenizer recognizes all XML constructs defined in the XML 1.0 recommendation [Bray et al. (1998)]. To handle occurrences of markup in the source document, callback functions can be registered with the parser. These tag callback functions receive the information associated with the encountered markup, including the attribute list and the element identifier. In the current sample applications (for an example, see [Koeppen et al. (1997)]), the tag callback is used to handle HTML form elements and to insert the appropriate sub-widgets.
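
The registration of a tag callback can be pictured roughly as follows. This is only a minimal sketch: all names (SketchParser, sketch_set_tag_callback, sketch_emit_tag, sketch_count_inputs) are hypothetical and chosen for illustration, they are not the interface of the Kino library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only, not the Kino API. */
typedef struct {
    const char *name;
    const char *value;
} SketchAttr;

/* A tag callback receives the element identifier and its attribute list. */
typedef void (*SketchTagCallback)(const char *element,
                                  const SketchAttr *attrs, size_t n_attrs,
                                  void *user_data);

typedef struct {
    SketchTagCallback tag_cb;   /* NULL while no callback is registered */
    void *user_data;            /* passed back unchanged to the callback */
} SketchParser;

/* Register the function to be invoked for every recognized start-tag. */
void sketch_set_tag_callback(SketchParser *p, SketchTagCallback cb,
                             void *user_data)
{
    assert(p != NULL);
    p->tag_cb = cb;
    p->user_data = user_data;
}

/* What a tokenizer would call after it has recognized a start-tag. */
void sketch_emit_tag(const SketchParser *p, const char *element,
                     const SketchAttr *attrs, size_t n_attrs)
{
    assert(p != NULL && element != NULL);
    if (p->tag_cb != NULL)
        p->tag_cb(element, attrs, n_attrs, p->user_data);
}

/* Example handler: counts INPUT elements. An application would create
 * the matching form sub-widget at this point instead of counting. */
void sketch_count_inputs(const char *element, const SketchAttr *attrs,
                         size_t n_attrs, void *user_data)
{
    (void)attrs;
    (void)n_attrs;
    if (strcmp(element, "INPUT") == 0)
        ++*(int *)user_data;
}
```

An application would call sketch_set_tag_callback once before parsing starts. The user_data pointer lets the handler reach application state (here a counter) without global variables.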

The tokenizer also handles markup and entity declarations and adds the resulting information to the root element of the parse tree. Entity references are expanded during parsing by inserting the entity value at the point of reference. Here, another important callback is used: the link callback is invoked whenever the parser has to dereference an external entity or document (e.g. a document referenced by an XLink [Maler and DeRose (1998)] that has to be activated automatically by the XML processor).
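
Expansion at the point of reference can be illustrated with a small string-based sketch. It assumes a flat table of internal entities and does not re-scan the inserted values for nested references. External entities, which Kino resolves through the link callback, are left out. The names SketchEntity and sketch_expand_entities are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entity table entry, for illustration only. */
typedef struct {
    const char *name;
    const char *value;
} SketchEntity;

/* Find the value of the entity whose name is the first len bytes of name. */
static const char *sketch_lookup(const SketchEntity *tab, size_t n,
                                 const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strlen(tab[i].name) == len && strncmp(tab[i].name, name, len) == 0)
            return tab[i].value;
    return NULL;
}

/* Copy src to dst and replace every known "&name;" by the entity value.
 * Unknown references are copied unchanged.
 * Returns 0 on success and -1 if dst is too small. */
int sketch_expand_entities(const char *src, const SketchEntity *tab, size_t n,
                           char *dst, size_t dst_size)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *piece = src;   /* bytes to copy in this step */
        size_t piece_len = 1;
        size_t consumed = 1;       /* bytes of src used in this step */
        if (*src == '&') {
            const char *semi = strchr(src, ';');
            if (semi != NULL) {
                const char *val =
                    sketch_lookup(tab, n, src + 1, (size_t)(semi - src - 1));
                if (val != NULL) {
                    piece = val;
                    piece_len = strlen(val);
                    consumed = (size_t)(semi - src) + 1;
                }
            }
        }
        if (out + piece_len >= dst_size)   /* keep room for the final NUL */
            return -1;
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        src += consumed;
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

With a table that maps "co" to "University of Essen", the input "At &co;!" expands to "At University of Essen!". A full XML processor would additionally expand references inside entity values and detect recursive definitions.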

The construction of the parse tree is handled by the event handling layer of the parser. This layer is connected to the tokenizer via the tag callback. An important feature of the Kino processor is its ability to handle legacy documents that do not follow the strict XML rules for well-formedness. Typical examples are HTML documents in which certain end-tags are omitted or in which empty elements appear without the XML empty-element syntax (e.g. HR or BR). The Kino processor handles such elements by consulting the CSS layout model: when a block-level box is encountered in the source document, any previous block-level box is closed. A very frequent case is the HTML P element, whose end-tag may be omitted.
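
The implied closing of elements can be sketched with a stack of open elements. The sketch below makes several simplifying assumptions that are not taken from Kino: element classes come from small fixed tables instead of the stylesheet, only P is treated as having an optional end-tag, and only the innermost open elements are examined. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_MAX_DEPTH = 32 };

/* Hypothetical tree builder state: a stack of currently open elements. */
typedef struct {
    const char *open[SKETCH_MAX_DEPTH];
    size_t depth;
    int implied_closes;   /* number of end-tags supplied by the builder */
} SketchTreeBuilder;

/* Assumed element classes. A real processor would derive "block-level"
 * from the CSS display property of the element. */
static const char *const SKETCH_BLOCK[] = { "P", "DIV", "H1", "UL", "HR" };
static const char *const SKETCH_END_OPTIONAL[] = { "P" };
static const char *const SKETCH_EMPTY[] = { "HR", "BR" };

#define SKETCH_COUNT(a) (sizeof (a) / sizeof (a)[0])

static int sketch_name_in(const char *name, const char *const *set, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(name, set[i]) == 0)
            return 1;
    return 0;
}

/* Process a start-tag. Returns 0 on success, -1 if the stack is full. */
int sketch_start_tag(SketchTreeBuilder *b, const char *name)
{
    assert(b != NULL && name != NULL);
    if (sketch_name_in(name, SKETCH_BLOCK, SKETCH_COUNT(SKETCH_BLOCK))) {
        /* A new block-level box closes an open box whose end-tag
         * was omitted in the source. */
        while (b->depth > 0 &&
               sketch_name_in(b->open[b->depth - 1], SKETCH_END_OPTIONAL,
                              SKETCH_COUNT(SKETCH_END_OPTIONAL))) {
            b->depth--;
            b->implied_closes++;
        }
    }
    /* Empty elements such as HR or BR never stay open. */
    if (sketch_name_in(name, SKETCH_EMPTY, SKETCH_COUNT(SKETCH_EMPTY)))
        return 0;
    if (b->depth == SKETCH_MAX_DEPTH)
        return -1;
    b->open[b->depth++] = name;
    return 0;
}
```

For the tag sequence DIV, P, P, HR the builder supplies two implied end-tags for P and ends with only DIV open. A BR inside a paragraph leaves the paragraph open, because BR is empty but not block-level.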

3. Kino Layouter

The layouter component of the Kino processor arranges the elements of the parse tree according to the CSS rules, using the style properties set during the parse process. The layout is based on the CSS box model, so the main task of the layouter consists of two repeating and possibly recursive steps: positioning the elements and filling them with text and nested elements.
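
The two steps can be sketched as a recursive function over a tree of boxes. The sketch is strongly simplified and rests on assumptions of its own: every box is block-level, padding is the same on all four sides, margins and borders are ignored, and the height of a box's own text is given instead of being computed by line breaking. SketchBox and sketch_layout are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box node, for illustration only. */
typedef struct SketchBox {
    int padding;                  /* same on all four sides */
    int content_height;           /* height of the box's own text, 0 if none */
    struct SketchBox **children;  /* nested block-level boxes */
    size_t n_children;
    int x, y, width, height;      /* results of the layout */
} SketchBox;

/* Lay out box at (x, y) with the given width; returns its height. */
int sketch_layout(SketchBox *box, int x, int y, int width)
{
    int inner_width, cursor;
    size_t i;
    assert(box != NULL);

    /* Step 1: position the box. The width comes from the surrounding box. */
    box->x = x;
    box->y = y;
    box->width = width;
    inner_width = width - 2 * box->padding;
    if (inner_width < 0)
        inner_width = 0;

    /* Step 2: fill the box with its text, then stack the nested boxes
     * vertically below it. */
    cursor = y + box->padding + box->content_height;
    for (i = 0; i < box->n_children; i++)
        cursor += sketch_layout(box->children[i], x + box->padding,
                                cursor, inner_width);

    /* The height follows from the content. */
    box->height = cursor + box->padding - y;
    return box->height;
}
```

A root box with padding 10 and width 200 that contains one child with 20 units of text and a second child with padding 5 and 30 units of text ends up 80 units high. Both children receive a width of 180.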

The most difficult part of the layout process is the calculation of the extents and positions of the elements. According to the CSS layout model, three different situations exist:

Vertically stacked boxes (block-level boxes) take up a fixed width, which is calculated from the width of the surrounding box. Their height is dynamic and depends on the contained elements.
Floating boxes have either a fixed width (defined through the CSS width property) or their width has to be calculated using heuristics such as the minimum and maximum possible width. Their height depends on the contained elements.
Boxes which form the cells of a table have extents which are calculated using the constraints of the associated table. Again, heuristics are needed, especially for nested and underspecified tables. The perfect layout of nested tables would require an optimization process, where the available width is distributed among the columns of the tables.
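
The width calculation for the three situations can be sketched as follows. The functions show one simple heuristic per case and are not the algorithm used in Kino. In particular, the column distribution is the common scheme of interpolating between the minimum and maximum column widths, and it does not address nested tables. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Block-level box: the width follows from the surrounding box.
 * horizontal_extras stands for the sum of the box's own horizontal
 * margins, borders and padding. */
int sketch_block_width(int containing_width, int horizontal_extras)
{
    int w = containing_width - horizontal_extras;
    return w > 0 ? w : 0;
}

/* Floating box without a CSS width property: use the available width,
 * limited by the minimum and maximum possible width of the content. */
int sketch_float_width(int available, int min_w, int max_w)
{
    assert(min_w <= max_w);
    if (available < min_w)
        return min_w;
    if (available > max_w)
        return max_w;
    return available;
}

/* Table cells: distribute the available width among n columns. Each
 * column gets its minimum width plus a share of the spare width that is
 * proportional to the difference between its maximum and minimum. */
void sketch_table_columns(int available, const int *min_w, const int *max_w,
                          int *out, size_t n)
{
    long sum_min = 0, sum_max = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        assert(min_w[i] <= max_w[i]);
        sum_min += min_w[i];
        sum_max += max_w[i];
    }
    for (i = 0; i < n; i++) {
        if (available >= sum_max) {
            out[i] = max_w[i];          /* everything fits unwrapped */
        } else if (available <= sum_min) {
            out[i] = min_w[i];          /* table overflows its container */
        } else {
            long spare = available - sum_min;
            long range = sum_max - sum_min;   /* positive in this branch */
            out[i] = min_w[i] + (int)((max_w[i] - min_w[i]) * spare / range);
        }
    }
}
```

For two columns with minimum widths 50 and 100 and maximum widths 100 and 300, an available width of 300 yields column widths of 80 and 220. Because of integer division, the column widths may in general add up to slightly less than the available width.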

In the current implementation, the layouter handles the CSS1 layout model almost completely, with restrictions regarding table layout. Full CSS2 support is intended for the future.

4. Conclusion

The Kino processor is a versatile tool for parsing and displaying XML and legacy HTML documents, with the ability to interpret CSS layout rules. Its architecture is designed to make the processor extensible in three respects: the parsing process, the execution of embedded scripts and the usage in a networked context. The extensibility is achieved through a callback interface, where an application program can register actions. Kino is available as part of the Wafe [Neumann and Nusser (1993)] prototyping environment (http://nestroy.wi-inf.uni-essen.de/wafe/). In conjunction with the Wafe environment, it can be used to prototype a wide range of internet-based applications.

References

[Bray et al. (1998)] T. Bray, J. Paoli, C. M. Sperberg-McQueen: Extensible Markup Language (XML) 1.0, W3C Recommendation, http://www.w3.org/TR/REC-xml, February 1998.

[Koeppen et al. (1997)] E. Koeppen, G. Neumann, S. Nusser: Cineast - An extensible Web Browser, Proc. of WebNet 97, Toronto 1997.

[Koeppen and Neumann (1998)] E. Koeppen, G. Neumann: A Practical Approach towards Active Hyperlinked Documents, Proc. of 7th World Wide Web Conference, Brisbane 1998.

[Lie and Bos (1996)] H. W. Lie, B. Bos: Cascading Style Sheets, Level 1, W3C Recommendation, http://www.w3.org/TR/REC-CSS1, December 1996.

[Maler and DeRose (1998)] E. Maler, S. DeRose: XML Linking Language (XLink), W3C Working Draft, March 1998.

[Neumann and Nusser (1993)] G. Neumann, S. Nusser: Wafe - An X Toolkit Based Frontend for Application Programs in Various Programming Languages, USENIX Winter Conference, San Diego, January 1993.

[Raggett et al. (1998)] D. Raggett, A. Le Hors, I. Jacobs: HTML 4.0 Specification, W3C Recommendation, http://www.w3.org/TR/REC-html40.html, April 1998.