Eckhart Köppen, Gustaf Neumann
<Eckhart.Koeppen@uni-essen.de>, <Gustaf.Neumann@uni-essen.de>
Information Systems and Software Techniques
University of Essen
Universitätsstr. 9, D-45141 Essen
This paper presents an architectural implementation for web-based, active documents. Although several approaches for distributed, active documents exist already, we decided to establish a new model which provides more flexibility and interoperability without giving up formality. The model is based mainly on the Extensible Markup Language (XML) and makes use of the Document Object Model, Cascading Style Sheets and the Intrinsic Event Model, which are all open standards defined by the W3 Consortium.
Keywords: XML, DOM, Browsers, Mobile Code, Active Documents
This paper was presented at the 7th Int. World Wide Web Conference, April 14-18 1998 in Brisbane, Australia.
The rapid success of the World Wide Web has led to a new class
of applications which are constructed using HTML
[Raggett (1997)] for the user interface and CGI
scripts for the application's logic. They have a more or less
strong resemblance to mainframe programs: the user enters data
into a form which is transferred to the web-server, evaluated
and the results are passed back to the user agent. As a
result, the computational load and the application logic are
located entirely on the server side.
In contrast to server-centered web applications, a client-centered application model has emerged through the use of scripting languages such as JavaScript [Netscape (1997b)]. Interfaces to the user agent and the current document exist in the form of plug-ins [Netscape (1997a)], Java applets [Gosling and McGilton (1996)] and embedded scripts [Levy (1996)]. However, most of these solutions share a number of problems: plug-ins are strongly tied to the chosen user agent and the client platform; the interface to the document is in all cases either non-existent or allows only the changing of text; and there are still distinctions between client- and server-side application logic, which makes it difficult to design applications that can operate both on- and off-line.
This paper presents an approach for document-centered computing, which
promises more flexibility and removes the distinction between
server and client. The document-centered approach allows program and
data to be embedded within a document (e.g. an HTML document),
thus turning the formerly passive text into an application
itself. We will refer to these enabled HTML documents as
active, hyperlinked documents (AHDs). With this
document-centered architecture, a different application model can be
implemented. Typical applications range from small programs like a
bookmark page which controls its logic and appearance itself up
to workflow management systems which contain mostly independent
documents with different states and possible operations on them. More
generally speaking, the possible uses of AHDs range from
controlling the contents and layout of a single document to supporting
coordination and collaboration techniques. The goal of our
implementation of the model is to provide a means to develop and
evaluate different applications of AHDs.
The general functionality which is needed by this
document-centered architecture covers several aspects: an
interface to the user agent, flexible structuring and semantic
annotation of the document (state, behavior, presentation),
introspection through an interface to the document itself, a
layout mechanism and a scripting language. Thus, our AHD
model contains the following building blocks:
Structuring, Layout, External and
Internal Access and Scripting. In an
implementation, the abstract building blocks have to be
replaced by concrete specifications and implementations.
Mainly, we will be using two standard proposals which have
been introduced recently by the W3 Consortium: the
Extensible Markup Language (XML) and the
Document Object Model (DOM) (see
[Bray et al. (1997)] and [Byrne (1997)]). The
DOM provides a clear, programming language independent
interface to the document structure. Additionally, it defines
interaction with the user and the user agent through the
Intrinsic Event Model (IEM). As a means to annotate
a document semantically, HTML is too limited. Here, XML
will be used as the basis for the semantic annotations and as
a structuring language. Finally, the document layout is
described using Cascading Style Sheets (CSS, see
[Lie and Bos (1996)]).
The techniques used in the basic building blocks of the AHD
model are: XML (Structuring), CSS (Layout), DOM/IEM (Access) and
Tcl/OTcl (Scripting). Note however that the use
of Tcl and OTcl ([Ousterhout (1990)],
[Wetherall and Lindblad (1995)]) as the scripting language
is not a mandatory characteristic of an AHD. Tcl/OTcl is
chosen here because it has a reasonable balance of simplicity
and capability.
The proposed model for AHDs is embedded in a system
architecture which is a natural extension to the architecture
of current web-based applications. On the client side, the
user agent will be extended to incorporate a runtime
environment (RE). The RE provides an interface from
the user agent to the AHD and vice versa. In addition, it
implements introspection and execution mechanisms for the
AHDs. Related to the RE are the presentation and
networking layers of the user agent, which handle the display
and transport of AHDs. On the server side, the RE will be
incorporated into the web server (this step will not be
discussed in this paper), so that certain functions in an
AHD can be executed prior to the transfer of the AHD to
the user agent. An AHD could also reside in the web server
for a longer time, responding to requests itself.
Although this architecture resembles a typical agent meeting
place (see [White (1996)]), our model goes further
by defining not only state and behavior of an AHD but aiming
also at the end user through representation mechanisms like
CSS (layout, aural attributes) and incorporation of a web
browser, thus letting the user interact with an AHD. The
basic components needed to implement our model of AHDs will
now be described.
The Extensible Markup Language (XML) is very
closely related to HTML and is a subset of the
Standard Generalized Markup Language (SGML, see
[Goldfarb (1990)]). As in SGML, XML documents
are composed of physical units (entities) and have a logical
structure which is formed by elements. Elements are declared
in a Document Type Definition (DTD) and marked with
start- and end-tags in the document.
For our approach, the main advantage of XML over HTML is
the ability to declare elements which are needed to form an
active document and to describe state, behavior and
presentation separately. In comparison to SGML, XML is
explicitly targeted at the World Wide Web and more
widespread support from content creators and software
developers is expected.
The structure of HTML and XML documents can be accessed
through the Document Object Model (DOM). The DOM
describes the components of a document in terms of nodes
which are organized in a tree. The more interesting node
types are text, elements and
attributes. The interface to the different nodes is
described via the Interface Definition Language
(IDL) from the Object Management Group
[OMG (1997)]. An interface definition contains the
attributes and possible operations for the node
types. Corresponding definitions can be derived for various
languages, e.g. Java. Among the operations defined for the
node types are operations to create new elements, manipulate
their list of children and modify their attributes.
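The kinds of operations listed here can be illustrated with a short sketch. It uses Python's xml.dom.minidom module, which is only one possible language binding and may differ in detail from the 1997 working draft; the document content is a made-up fragment chosen for illustration.

```python
from xml.dom.minidom import parseString

# Parse a small document into a DOM tree of nodes (content is illustrative).
doc = parseString('<order id="o80"><buyer id="p80">John Doe</buyer></order>')
order = doc.documentElement

# Create a new element and a text node, then append them to the tree.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("express delivery"))
order.appendChild(note)

# Modify an attribute of an existing element.
buyer = order.getElementsByTagName("buyer")[0]
buyer.setAttribute("id", "p81")
new_id = buyer.getAttribute("id")

# Manipulate the list of children: remove the buyer element again.
order.removeChild(buyer)

result = order.toxml()
print(result)
```

The same three kinds of operations (creating nodes, editing the child list, changing attributes) are what an element function in an AHD would use to alter its own document.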
To map events to behavior, we use the Intrinsic Event
Model (IEM) defined in the DOM. Events are tied to
the element where they occur. If an element does not process
an event, it is propagated to its parent element. Any
activity is triggered by external events. They can be
grouped into user events and events caused by the runtime
environment. User events are pointer events (motion,
clicks), keyboard events (key pressed, key released),
form-related events (list selections, text changes) and focus
changes. Events triggered by the RE are document loading
and document unloading.
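The propagation rule (an unprocessed event is passed on to the parent element) can be sketched in a few lines of Python. The Element class, the handler table and the event names used here are assumptions made for this illustration and are not part of the IEM or of the implementation described in this paper.

```python
class Element:
    """Minimal element node with optional per-event handlers (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}  # event name -> callable

    def dispatch(self, event, info=None):
        """Deliver an event to this element; pass it to the parent if unhandled."""
        node = self
        while node is not None:
            handler = node.handlers.get(event)
            if handler is not None:
                return handler(node, info)
            node = node.parent  # propagate to the parent element
        return None  # no element in the parent chain processed the event


order = Element("order")
buyer = Element("buyer", parent=order)
order.handlers["onclick"] = lambda node, info: f"{node.name} handled click at {info}"

# The click occurs on <buyer>, which has no handler, so <order> processes it.
handled = buyer.dispatch("onclick", (10, 20))
unhandled = buyer.dispatch("onkeypress", "a")
print(handled)
```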
Since XML defines only the structure of a document, a
separate layout definition is provided through style
sheets. The W3 Consortium proposes Cascading Style
Sheets (CSS) to be used together with HTML and
XML. Style sheets imply a new set of attributes which are
associated with an element. These attributes control the
visible aspects of the elements, e.g. margins, borders, text
styles. In addition, a style sheet can also describe aural
properties for elements to enhance accessibility of a
document with text-to-speech programs.
To implement the infrastructure for AHDs, basic definitions
are needed to describe the structure and semantics of an
AHD. In addition, a mapping of the event model to the
document behavior has to be defined.
The structure of an AHD is defined in a DTD. To allow
the RE to process an AHD, two approaches are
possible. On the one hand, the DTD itself could declare a
set of elements which are recognized by the user agent and
contain any code and data required for the AHD. This would
require a very detailed description since any specific state
and behavior elements would have to be made proper
elements. In our model, we chose a more flexible
approach. In an AHD, any element can contain code and
data. For this purpose, the DTD declares only two specific
elements, namely <FUNC> and <VAR>. They will be explained in
detail in the next section. Instances of these elements are
always associated with their parent element, i.e. their
scope is the parent element. This association is established
by the RE, which manages the access to code and data for
each element through access and modification functions. The
following sample illustrates the mechanism. The first part
of the example declares a DTD with the elements <FUNC> and
<VAR>:
<!ELEMENT ahd o o ANY>
<!ELEMENT func - - CDATA>
<!ATTLIST func name CDATA #REQUIRED
               type CDATA #IMPLIED>
<!ELEMENT var - - CDATA>
<!ATTLIST var name CDATA #REQUIRED>
In the second part, two additional elements (<order> and
<buyer>) are declared. An instance of the <order> element is
created. It contains the function print_header and an
instance of the element <buyer> with two functions
(print and duplicate) and two variables
(name and email):
<!DOCTYPE ahd SYSTEM [
<!ELEMENT order - - (buyer | FUNC | VAR)*>
<!ATTLIST order id CDATA #IMPLIED>
<!ELEMENT buyer - - (#PCDATA | FUNC | VAR)*>
<!ATTLIST buyer id CDATA #IMPLIED>
]>
<order id="o80">
  <func name="print_header"> ... </func>
  <buyer id="p80">
    <func name="print"> ... </func>
    <func name="duplicate"> ... </func>
    <var name="name">John Doe</var>
    <var name="email">doe@uni-essen.de</var>
  </buyer>
</order>
The RE associates both the functions and the variables
with the <buyer> element. As a consequence, the variables
and the functions can be accessed only through the <buyer>
element. In this example, the <buyer> element itself can be
addressed using its id attribute. The addressing
scheme for elements is defined in more detail in the XML
specification [Bray et al. (1997)].
The local scoping of functions and variables requires a
method to facilitate access to functions and variables. We
will use parent delegation to look up function and
variable elements. In order to access an element through its
name or id, the parent chain of the
current element will be searched. Function and variable
elements are associated with an element if they are a direct
child of this element. In the example, the variable
name can be modified from within the function
print since it is a direct child of the <buyer>
element. Likewise, the function print_header is
visible from the function print of the element
<buyer>. Here, the lookup of the function is continued in
the parent of <buyer>, <order>. This element
contains the wanted script as a direct child.
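The lookup rule can be made concrete with a small Python sketch built on xml.etree.ElementTree. It is an illustration under stated assumptions and not the RE's implementation: the helper name lookup is hypothetical, the function bodies elided in the sample are replaced by placeholder strings, and the DTD is left out because ElementTree does not process it.

```python
import xml.etree.ElementTree as ET

# The sample instance from the text; function bodies are placeholders.
SOURCE = """
<order id="o80">
  <func name="print_header">header code</func>
  <buyer id="p80">
    <func name="print">print code</func>
    <func name="duplicate">duplicate code</func>
    <var name="name">John Doe</var>
    <var name="email">doe@uni-essen.de</var>
  </buyer>
</order>
"""

root = ET.fromstring(SOURCE)
# ElementTree keeps no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}


def lookup(element, tag, name):
    """Find a <func> or <var> child by name, searching up the parent chain."""
    node = element
    while node is not None:
        for child in node:  # only direct children belong to an element
            if child.tag == tag and child.get("name") == name:
                return child
        node = parents.get(node)  # continue the lookup in the parent
    return None


buyer = root.find("buyer")
print(lookup(buyer, "var", "name").text)           # direct child of <buyer>
print(lookup(buyer, "func", "print_header").text)  # found in the parent <order>
print(lookup(root, "func", "print"))               # not visible from <order>
```

Delegation only runs upward: starting at <order>, the function print of <buyer> is not found, which matches the statement that it can be accessed only through the <buyer> element.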
The two elements which are used to incorporate state and
behavior into an AHD are <FUNC> and <VAR>. They contain
either function code or variable values and are a logical
part of their parent element. It is important that any other
element which will contain these elements is defined
appropriately by including the <FUNC> and <VAR> elements in
the content declaration. Both elements have the CSS
display property set to none so that they will not
be laid out and displayed.
Following the declaration of <FUNC> as seen in the first
part of the example, the element declares two attributes:
name and type. The name attribute
contains the name of the function under which it can be
called, and the type attribute describes the
MIME-type [Borenstein and Freed (1993)] of the
function (which is basically the programming language
used). The contents of the <FUNC> element (i.e. the text
between the start- and end-tag) is the code of the
function. The name of a function has to be locally unique,
which means that the parent element does not have any other
function element with the same name as an immediate
child. If another function with the same name exists, only
the first function is considered.
The <VAR> element has only one attribute,
name. Like the name attribute of <FUNC>, it contains a locally
unique name for the variable. The contents of the <VAR>
element is the value of the variable.
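A minimal Python sketch of how the functions and variables of one element could be collected is given below. The helper name members and the MIME type application/x-tcl are assumptions for illustration. For functions, the first definition of a name wins as described above; applying the same rule to variables is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def members(element):
    """Collect the functions and variables that are direct children of element."""
    funcs, variables = {}, {}
    for child in element:
        name = child.get("name")
        if child.tag == "func":
            # setdefault keeps the first definition and ignores later duplicates.
            funcs.setdefault(name, (child.get("type"), child.text))
        elif child.tag == "var":
            variables.setdefault(name, child.text)
    return funcs, variables


# Hypothetical element with a duplicated function name.
buyer = ET.fromstring(
    '<buyer id="p80">'
    '<func name="print" type="application/x-tcl">first body</func>'
    '<func name="print" type="application/x-tcl">second body</func>'
    '<var name="name">John Doe</var>'
    '</buyer>'
)
funcs, variables = members(buyer)
print(funcs["print"])
print(variables)
```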
The events defined in the event model are mapped implicitly
to element functions. Any function whose name equals an
event name will be called when that event occurs. According
to the IEM, an event handling function can also be defined
in an attribute which has the name of the handled
event. Each event handler is passed additional information
about the event, e.g. the pressed key or the pointer
location.
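The implicit mapping can be sketched as follows in Python. The helper names find_handler and fire, the run hook that stands in for a script interpreter, and the precedence of a <func> child over an attribute handler are all assumptions of this sketch; the text above does not specify an order between the two forms.

```python
import xml.etree.ElementTree as ET


def find_handler(element, event):
    """Return (source, code) for the handler of `event` on `element`, or None."""
    # A <func> child whose name equals the event name handles that event.
    for child in element:
        if child.tag == "func" and child.get("name") == event:
            return ("func", child.text)
    # Alternatively, an attribute named after the event may hold the handler.
    if element.get(event) is not None:
        return ("attribute", element.get(event))
    return None


def fire(element, event, info, run):
    """Execute the handler for `event`, passing along the event information."""
    handler = find_handler(element, event)
    if handler is None:
        return None
    _source, code = handler
    return run(code, info)


# Hypothetical element with one handler of each form.
item = ET.fromstring(
    '<item onclick="attribute code">'
    '<func name="onload">load code</func>'
    '</item>'
)
run = lambda code, info: f"{code} <- {info}"  # stand-in for a script interpreter

print(find_handler(item, "onload"))
print(find_handler(item, "onclick"))
print(fire(item, "onclick", {"x": 10, "y": 20}, run))
```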
An element consists of five components: an attribute
list, child elements, functions,
variables and textual contents. The
access to the attributes and the child elements from within
the RE and an AHD is provided through the functions
defined in the DOM. These functions can also be used to
access the functions, variables and the textual contents of
an element, but to ensure the parent delegation
mechanism for function and variable access, convenience
functions are implemented as well. These are:
setVar: sets a variable value (the textual contents of a <VAR> tag)
getVar: gets a variable value (the textual contents of a <VAR> tag)
setContents: sets the textual contents of an element. If the element contains other child elements, they will be deleted (to achieve more control over the contents of an element, using the functions of the DOM is recommended).
getContents: gets the textual contents of an element. Note that any child elements in the content are ignored (e.g. the contents of the <p> tag in "<p>A <em>nested</em> tag</p>" is "A tag").
To preserve the state of an AHD, another function is
needed:
toText: returns a textual representation of the AHD. The resulting text is an XML document with additional style information. It reflects any modifications made through DOM functions or the functions exported by the RE.
Since the execution of an element function cannot be
triggered by DOM functions, the RE exports another
utility function:
callFunc: calls an element function
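The functions exported by the RE can be mimicked in a short Python sketch. The function names follow the text, but the Runtime class, the signatures and the run_script hook are assumptions made for this illustration. Whitespace is collapsed in getContents so that the example from the text yields "A tag", and toText omits the additional style information.

```python
import xml.etree.ElementTree as ET


class Runtime:
    """Toy stand-in for the RE access functions (names follow the text)."""

    def __init__(self, source, run_script):
        self.root = ET.fromstring(source)
        self.run_script = run_script  # hypothetical hook that executes function code
        self.parents = {c: p for p in self.root.iter() for c in p}

    def _find(self, element, tag, name):
        node = element
        while node is not None:  # parent delegation
            for child in node:
                if child.tag == tag and child.get("name") == name:
                    return child
            node = self.parents.get(node)
        raise KeyError(name)

    def getVar(self, element, name):
        return self._find(element, "var", name).text

    def setVar(self, element, name, value):
        self._find(element, "var", name).text = value

    def getContents(self, element):
        # Direct text only: the text before the first child plus each child's tail.
        parts = [element.text or ""] + [child.tail or "" for child in element]
        return " ".join("".join(parts).split())

    def setContents(self, element, text):
        for child in list(element):
            element.remove(child)  # existing child elements are deleted
        element.text = text

    def callFunc(self, element, name, *args):
        return self.run_script(self._find(element, "func", name).text, args)

    def toText(self):
        return ET.tostring(self.root, encoding="unicode")


runtime = Runtime(
    '<order>'
    '<func name="print_header">header code</func>'
    '<buyer><var name="name">John Doe</var></buyer>'
    '<p>A <em>nested</em> tag</p>'
    '</order>',
    run_script=lambda code, args: (code, args),
)
buyer = runtime.root.find("buyer")
paragraph = runtime.root.find("p")

runtime.setVar(buyer, "name", "Jane Doe")
contents_before = runtime.getContents(paragraph)    # child element <em> is ignored
call_result = runtime.callFunc(buyer, "print_header")  # found via the parent chain
runtime.setContents(paragraph, "plain text")
saved = runtime.toText()
print(saved)
```

Because toText serializes the current tree, the saved text contains the modified variable value and the replaced paragraph contents.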
The implementation of the model is mainly tied to the
implementation of the user agent. We developed the extensible
web browser Cineast [Köppen et al. (1997)] to
develop and evaluate novel approaches like AHDs. The main
part of Cineast is written in OTcl, which is
also why we chose Tcl/OTcl as the scripting language
for AHDs. Cineast is currently running under
Unix, but its main parts can be ported to other
operating systems and the model for the AHDs is platform
independent.
The basis of Cineast is the prototyping environment
Wafe [Neumann and Nusser (1993)]. It combines
Tcl as a scripting language with different widget sets such
as the Athena widget set or the Motif widget set. In addition,
other libraries are linked into Wafe, among these are
SSLeay [Hudson and Young (1997)],
LDAP [Howes and Smith (1997)] and
OTcl. For the implementation of Cineast, a special
purpose widget called Kino [Köppen (1996)] is
integrated into Wafe to handle the parsing, layout and
display of XML source text. It is implemented in C
to achieve high performance. The roles of Wafe and the
Kino widget in the different layers of the user agent are
discussed below.
Most of the functionality required in the RE is provided
by the Kino widget. The main task of the Kino widget in
the RE is to parse any source text and to maintain an
internal representation of the element tree. It also
implements all of the DOM functions and the additional
functions needed to access the functions and variables of
the elements.
The Kino widget is made up of three components: the
Parser, the Layouter and the
Painter. In the RE, only the Parser is of
interest. It produces a tree of parsed data
(PData) elements. The Kino widget uses four different
types of PData elements: a generic element, a
box element which can contain children, a text
element and an inset element which can be used
to insert other widgets (text entry fields, push buttons,
list boxes etc.) and images into the element tree. The most
interesting element is the box element. Besides its role as
a structuring element to contain other elements, it holds
the CSS attributes. The PData box elements correspond
directly to the XML elements in the document, i.e. any
XML element results in a PData box element. The PData tree
can be navigated through the DOM; alternatively, the components
of the PData elements can be used directly through their C
pointers.
The Kino widget is extensible in two ways. First, the application
can register a tag callback, which is called
whenever a tag is encountered during the parsing process. In
this callback, the application can for example modify the
PData tree. The second callback handles the
execution of scripts. The Kino widget calls this
script callback whenever a script has to be
executed, e.g. when an event occurs or a script is called
via callFunc. This makes the Kino widget
independent from the chosen scripting language because the
execution of a script is handled by the application, in this
case Cineast.
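The division of labor between the widget and the application can be sketched in Python as follows. The Widget class and its method names are hypothetical and do not reproduce the Kino widget's C interface; Python's expat binding stands in for the Parser.

```python
import xml.parsers.expat


class Widget:
    """Toy stand-in for a widget that delegates to application callbacks."""

    def __init__(self, tag_callback, script_callback):
        self.tag_callback = tag_callback        # called for every start tag
        self.script_callback = script_callback  # called to execute a script

    def parse(self, source):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.tag_callback
        parser.Parse(source, True)

    def call_func(self, code, *args):
        # The widget never interprets code itself; the application does.
        return self.script_callback(code, args)


# The application supplies both callbacks.
seen = []
widget = Widget(
    tag_callback=lambda name, attrs: seen.append((name, attrs.get("id"))),
    script_callback=lambda code, args: f"ran {code!r} with {args}",
)
widget.parse('<order id="o80"><buyer id="p80"/></order>')
result = widget.call_func("print_header", 1)
print(seen)
print(result)
```

Swapping the script callback for one that hands the code to a different interpreter changes the scripting language without touching the widget, which is the independence described above.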
The presentation layer for XML documents is implemented in
the Layouter and Painter components of the
Kino widget. They are not needed for the basic
functionality of an AHD and can be omitted if the RE is
to be incorporated into a web server. The Layouter works
together with a CSS database. The CSS database is built
during the parsing process and contains all style
information. Currently, the CSS database is implemented
completely in OTcl. The Layouter positions any element so
that no calculations are needed for display by the Painter.
Our implementation handles most of HTML 3.2,
including important features like tables, images and
forms. The internal layout model is box-oriented, so that a
PData box results in either a block-level or inline
element according to the CSS specification. In contrast to
a simple flow-oriented model, boxes can be nested. The box
model allows the realization of more complex layouts such as
tables. The current implementation supports neither
incremental layout nor absolute positioning of boxes
within a text flow.
The Kino widget is tightly embedded within the infrastructure provided by Wafe and the Cineast browser. The infrastructure requirements for the implementation of a browser are partly generic and partly Web specific. The generic components are usable in a wide range of applications and are provided either by Wafe or by libraries accessible through Wafe. For example, color and image management (for the image formats xbm, xpm, jpg, png and gif), event dispatching, resource management and module management are provided directly by Wafe. Generic components provided by libraries accessible through Wafe are, for example, the widget classes from OSF/Motif, basic networking facilities from Tcl, security algorithms and protocols from SSLeay, and compression from zlib.
The browser specific code is exclusively implemented in OTcl and is seen by Wafe as the Cineast source code, which is loaded by Wafe at startup time of the browser. The Cineast source code defines the typical browser semantics:
HTTP, IMAP, HTTPS, access to non-networked documents such as files), access control (basic authentication)
URL-completion,
request life-cycle management (starting, redirection, killing), monitoring
IMAP),
MIME-type specific presenters
One of the most demanding tasks for implementing a browser
is to ensure that the browser does not block event
handling. Since most of the components of the browser are
not thread safe, multitasking is implemented through a
select()-based event loop. All interaction and I/O
has to be performed asynchronously. If an I/O operation
were to block, the browser windows could not be refreshed,
for example, or only a single transfer would be possible at a
time. Furthermore, it would not be possible to provide
incremental loading and display, which are highly useful over
rather slow connections.
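A select()-style event loop of this kind can be sketched with Python's selectors module. This is not the Cineast code, which is written in OTcl on top of Wafe; the two local socket pairs merely stand in for two concurrent network transfers, and the labels are made up.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = {}


def on_readable(conn, label):
    """Read whatever is available without blocking; unregister at end of data."""
    data = conn.recv(4096)
    if data:
        received.setdefault(label, bytearray()).extend(data)
    else:
        sel.unregister(conn)
        conn.close()


# Two socket pairs stand in for two concurrent transfers.
pairs = {label: socket.socketpair() for label in ("page", "image")}
for label, (reader, writer) in pairs.items():
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data=label)
    writer.sendall(f"{label} data".encode())
    writer.close()  # signals end of data to the reader

# The event loop: wait for readiness, then dispatch to the handler.
while sel.get_map():
    for key, _events in sel.select(timeout=1.0):
        on_readable(key.fileobj, key.data)

sel.close()
print({label: bytes(data) for label, data in received.items()})
```

Both transfers are served by a single thread, and each chunk becomes available to its consumer as soon as it arrives, which is what makes incremental display possible.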
The Cineast source code logic is largely determined by
this asynchronous event handling. As a consequence of the
asynchronicity, various handlers and callbacks have to be
registered, for example to process incoming data from sockets
or to process user events in various windows and to
associate the events with the corresponding tasks. Every handler
must have enough context information to continue a task in
the correct environment. The Cineast browser uses widget
IDs for the context selection for GUI purposes and OTcl objects
in the other cases. For example, for every request that is
started a new OTcl object is created, which registers
I/O callbacks on the input side to process the input data
incrementally and reports important state changes in its
life cycle to the presentation objects
(typically image or HTML/XML text): the object is created, a
MIME type is determined, some new data is available, the end of
data is reached, the request was killed.
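The idea of one context-carrying object per request can be sketched in Python. The class and method names, the state labels and the example URL are assumptions made for this illustration; the actual implementation uses OTcl objects.

```python
class Presenter:
    """Collects the state changes reported by a request (illustrative only)."""

    def __init__(self):
        self.log = []

    def notify(self, state, detail=None):
        self.log.append((state, detail))


class Request:
    """One object per started request; it carries the context for its callbacks."""

    def __init__(self, url, presenter):
        self.url = url
        self.presenter = presenter
        self.mime_type = None
        self.presenter.notify("created", url)

    def on_data(self, chunk, mime_type="text/xml"):
        # I/O callback: invoked by the event loop when input is available.
        if self.mime_type is None:
            self.mime_type = mime_type
            self.presenter.notify("mime-type", mime_type)
        self.presenter.notify("data", len(chunk))

    def on_end(self):
        self.presenter.notify("end")

    def kill(self):
        self.presenter.notify("killed")


presenter = Presenter()
request = Request("http://example.org/catalog.xml", presenter)  # placeholder URL
request.on_data(b"<ahd>")
request.on_data(b"</ahd>")
request.on_end()
print(presenter.log)
```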
As stated earlier, the RE incorporated in the browser provides the basic platform for implementing the state and behavior components of AHDs. We will now show how a document-centered application based on AHDs differs from purely server- or client-centered approaches. We will use the selection and purchase of goods over the Internet as an example.
As the central document, the catalog with the order form can
be identified and implemented with an AHD. The life cycle of
the catalog AHD begins with the transfer of the catalog into
the RE of the user's web browser. The first event the
catalog AHD will receive is the onload event
defined in the IEM. Upon reception of this event, the
catalog AHD can update its contents, e.g. recalculate prices
of special offers etc. We assume that the user now disconnects
from the Internet and saves the catalog AHD locally as a
file, which is achieved through the RE's toText
function (introspection is necessary for this task). The
disconnect is not strictly necessary but emphasizes the
strengths of document centered applications.
The catalog AHD can later be loaded from the local file and
will reside in the web browser's RE where it can react to
user input like entering amounts or selections of goods. When
the user is finished making his selection, he can save the new
state of the catalog AHD and the order form again to a
file. If he decides to transmit his order to the online store,
the catalog AHD generates an order AHD. The order AHD is
transferred to the server using an HTTP PUT request
where it will be saved in a file on the online store's web
server, ready for further processing.
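The last step of this scenario can be sketched in Python. The catalog structure, the element names item and line, the helper names and the URL are all made up for this illustration; the request is only prepared and never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical catalog state after the user has entered amounts.
catalog = ET.fromstring(
    '<catalog>'
    '<item id="a1"><var name="price">10</var><var name="amount">2</var></item>'
    '<item id="a2"><var name="price">5</var><var name="amount">0</var></item>'
    '</catalog>'
)


def var(element, name):
    """Return the value of the <var> child with the given name."""
    return element.find(f"var[@name='{name}']").text


def make_order(catalog):
    """Build an order document from every catalog item with a non-zero amount."""
    order = ET.Element("order")
    for item in catalog.findall("item"):
        amount = int(var(item, "amount"))
        if amount > 0:
            line = ET.SubElement(order, "line", ref=item.get("id"))
            line.text = str(amount * int(var(item, "price")))
    return order


body = ET.tostring(make_order(catalog), encoding="unicode")
# Prepare (but do not send) the PUT request; the URL is a placeholder.
request = urllib.request.Request(
    "http://shop.example/orders/o80.xml",
    data=body.encode("utf-8"),
    method="PUT",
    headers={"Content-Type": "text/xml"},
)
print(body)
print(request.get_method())
```

The order document is derived entirely from the state held in the catalog document, so it can be generated while the user is off-line and transmitted later.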
Ideas for applications which are based on web techniques such
as HTML and CGI were presented early in the history of the
web [Houh et al. (1994)], as well as client-side
extensions like [Kaashoek et al. (1994)]. Our
proposed model for Active Hyperlinked Documents and
the implementation of the overall system differ in two key
points from existing approaches:
AHD is independent of commercial influence and easy to use and validate for others,
AHD-based application.
More important however are the next steps that have
to be made to prove the potential of AHDs. First, a
formal model for the use of AHDs in distributed
applications has to be developed. Starting points can be frameworks
like [Mühlbacher and Neumann (1996)], where the potential of active
documents as tools for collaboration is discussed as an alternative to legacy
systems. From the architectural point of view, the distinction
between user agents and web servers is not necessary. We will work
towards a clearly defined RE that can be incorporated
into classical web servers or other applications as well. Finally,
security issues are not addressed in this paper. We believe that AHDs
have a huge potential in electronic commerce (esp. when they are
combined with technologies like cryptolopes [IBM (1996)]) or in intranet
applications in combination with role based access control
[Neumann and Nusser (1997)]. In electronic commerce applications, the presented
approach can be used, for example, to implement active, intelligent
contracts supporting the negotiation and signing process.
[Bray et al. (1997)] T. Bray, J. Paoli, C.M. Sperberg-Queen: Extensible Markup Language (XML), W3C Working Draft, http://www.w3.org/TR/WD-xml, November 1997.
[Byrne (1997)] Steve Byrne: Document Object Model (Core) Level 1, W3C Working Draft, http://www.w3.org/TR/WD-DOM/level-one-core-971009, October 1997.
[Gosling and McGilton (1996)] J. Gosling, H. McGilton: The Java Language Environment, http://java.sun.com/docs/white/langenv/, May 1996.
[Goldfarb (1990)] C.F. Goldfarb: The SGML Handbook, Oxford University Press, Oxford 1990.
[IBM (1996)] IBM Corp.: IBM Cryptolope Home, http://www.software.ibm.com/security/cryptolope/, 1996.
[Hudson and Young (1997)] T.J. Hudson and E.A. Young: SSLeay and SSLapps FAQ (Draft), http://www.psy.uq.edu.au/~ftp/Crypto/, September 1997.
[Köppen et al. (1997)] E. Köppen, G. Neumann, S. Nusser: Cineast - An extensible Web Browser, Proc. of WebNet 97, Toronto 1997.
[Lie and Bos (1996)] H.W. Lie and B. Bos: Cascading Style Sheets, level 1, W3C Recommendation, http://www.w3.org/TR/REC-CSS1, December 1996.
[Netscape (1997a)] Netscape Communications Corp.: Plug-In Guide, http://developer.netscape.com/library/documentation/communicator/plugin/index.htm, May 1997.
[Netscape (1997b)] Netscape Communications Corp.: JavaScript Reference, http://developer.netscape.com/library/documentation/communicator/jsref/index.htm, October 1997.
[Neumann and Nusser (1993)] G. Neumann, S. Nusser: Wafe - An X Toolkit Based Frontend for Application Programs in Various Programming Languages, USENIX Winter Conference, San Diego, January 1993.
[OMG (1997)] Object Management Group: The Common Object Request Broker: Architecture and Specification, ftp://ftp.omg.org/pub/docs/formal/97-10-01.pdf, August 1997.
[Raggett (1997)] D. Raggett: HTML 3.2 Reference Specification, W3C Recommendation, http://www.w3.org/TR/REC-html32.html, January 1997.
[White (1996)] J. White: Mobile Agents, White Paper, http://www.genmagic.com/agents/Whitepaper/whitepaper.html, 1996.
Eckhart Köppen is currently a Ph.D. student at the Dept. of Information Systems and Software Techniques at the University of Essen, Germany. His working areas are Information Systems Modeling, Internet/Intranet Technologies and Software Engineering. He was born 1970 in Essen, began studying at the University of Essen in 1991 and received degrees in Information Systems 1995 and 1996. As part of his Master's thesis, he developed the Kino widget class, which provides an extensible means to display HTML text.
Gustaf Neumann was appointed Professor of Information Systems and Software Techniques at the University of Essen, Germany, in 1995. A native of Vienna, Austria, he graduated from the Vienna University of Economics and Business Administration (WU), Austria, in 1983 and holds a Ph.D. from the same university. He joined the faculty of WU in 1983 as Assistant Professor at the MIS department and served as head of the research group for Logic Programming and Intelligent Information Systems. Before joining the University of Essen, Gustaf Neumann was a visiting scientist at IBM's T.J. Watson Research Center in Yorktown Heights, NY, from 1985-1986 and 1993-1995. In 1987 he was awarded the Heinz-Zemanek award of the Austrian Association of Computer Science (OCG) for best dissertation (Metainterpreter Directed Compilation of Logic Programs into Prolog). Professor Neumann has published books and papers in the areas of program transformation, data modeling, information systems technology and security management. He is the author of several widely used programs that are freely available, such as the TeX-dvi converter dvi2xx and the graphical frontend package Wafe.