XTech 2005: XML, the Web and beyond.

xfy Technology

Copyright 2005, Copyright is held by the author/owner(s)

Abstract

Industry has long appreciated the utility of XML for server-side business logic on enterprise systems. Yet this alone cannot fully account for the rapid proliferation of XML standards and vocabularies. XML is just as well-suited for the organization, validation and presentation of client data. In fact, it is an indispensable information management tool for users at all levels.

Unfortunately, the tools available to most users are blunt instruments indeed; e.g., HTML browsers whose rough-hewn rendering of the original, rich XML data sources reveals little of the designers’ original art. To be sure, some client tools exist that can process XML in close to its full complexity: there are a few Michelangelos among the multitude of crude stonemasons. Yet this precision is achieved at the expense of generality. Current client-side XML applications work well only with the small subset of XML vocabularies for which they were designed.

This falls far short of fulfilling the potential for data abstraction and unlimited extensibility inherent in the design of XML. In this paper we introduce xfy, a generalized XML platform designed to reach that potential: a da Vinci among Michelangelos. xfy is a framework that gives users the opportunity to do creative work with structured data on the client side. This is the key to reuse, re-purposing, and release from the straitjacket of format-dependence that only harnessing the full power of XML can provide.

1. Introduction

Have you ever bought an 8-track tape, backed up data on a 3.5” floppy, or designed an SGML document? If so, you have learned one inescapable fact about technology: data outlives format. And the larger the enterprise, the bigger the problem this presents. With its separation of content and presentation and its capacity for abstraction, XML has the potential to overcome this major hurdle in enterprise system development. Other advantages include the composite structure of XML documents, which enables access to arbitrarily complex compound documents at any level of granularity, and the infrastructure for sharing XML encapsulated data over the web provided by standards like SOAP and UDDI.

Published XML standards form a common language for information management system developers and, as such, are a popular means of bootstrapping enterprise systems. Yet something of the native flexibility and extensibility of XML is lost when vocabularies are fixed by a particular instantiation of a standard. Real world applications handle standard vocabulary sets in one of two ways.

The first involves developing a custom application that allows users to work with documents conforming to a certain standard (or family of related standards). This solution works well within a narrow domain; it is sufficient for end-users whose limited needs can be served by a fixed set of XML vocabularies.

The second avoids the application development issue entirely by converting the XML to HTML (or, at best, XHTML), and deploying a web browser on the client side. This solution is certainly easier for developers, but end users lose all the benefits of XML. XSLT can partially salvage this sorry situation by providing users with a richer representation of the source XML – as designed by the author of the script. It may look impressive, but it is a fixed, read-only display that serves the purposes of the designer rather than those of the user.

In either case, something important is lost in translation. Even worse, these practical concerns tend to limit the imagination of standards designers, whose frameworks must be implemented using technology subject to these limitations.

The solution we propose is xfy – a generalized user interface for arbitrary XML. Its unique architecture avoids the pitfalls mentioned above not by adding a new proprietary layer onto XML, but rather by taking full advantage of the flexibility inherent in the XML standard itself.

Applied to the enterprise, xfy goes far beyond integration of standard and proprietary vocabularies in individual applications. Its native mechanism for real-time, enterprise-wide coordination of data and business logic embraces the complexity of a dynamic, rapidly-evolving community of users. xfy encourages rampant reuse and re-purposing, unleashing the creativity of end users and saving critical business data from certain obsolescence.

2. Requirements

Beyond the minimum standard of XML Schema validation, we see five additional features the ideal XML platform for enterprise users must have.

2.1 Support for Arbitrary XML Vocabularies through Extension

The rapid proliferation of XML standards has made it impossible to claim that any fixed set of vocabularies is sufficient for current enterprise needs (not to mention future needs). It is not even possible to predict the full set of XML vocabularies a single given enterprise will need. Yet XML itself is eminently extensible; couldn’t applications use this hallmark of XML to address the problem? They could, and they should; yet most do not.

In principle, XML is designed to be extended without any customization of the grammar. A sufficiently flexible XML processor could exploit this fundamental and powerful feature to handle entirely unknown vocabularies.
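
As a minimal sketch of what "handling an entirely unknown vocabulary" can mean at the lowest level, the following Python fragment (standard library only) walks any well-formed document and inventories its namespaces and element names without a schema. The sample vocabulary `urn:example:inventory` and the function names are assumptions made for illustration; they are not part of xfy.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag

def inventory(xml_text):
    """Count elements per (namespace, local name) without knowing the vocabulary."""
    root = ET.fromstring(xml_text)
    return Counter(split_tag(el.tag) for el in root.iter())

# Hypothetical document in a vocabulary the processor has never seen.
SAMPLE = """
<inv:stock xmlns:inv="urn:example:inventory">
  <inv:item sku="A1"><inv:qty>3</inv:qty></inv:item>
  <inv:item sku="B2"><inv:qty>7</inv:qty></inv:item>
</inv:stock>
"""

counts = inventory(SAMPLE)
```

Nothing in `inventory` depends on the element names, so the same code applies to any vocabulary that arrives later.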

2.2 Support for Creating or Importing Vocabulary Editors and Associating Them with New Vocabularies

It is one thing to read and interpret a new vocabulary, and quite another to develop a user interface for editing documents that use it. One hopes that someone else has already performed this service and that the editor is either freely available or supplied by the author along with the new XML data. In either case, the ideal XML processor should readily incorporate it or, at worst, allow partial reuse of the editor for a similar vocabulary. Without this extensibility, we are reduced to displaying and editing the unknown XML as plain text, shorn of all semantic content.

2.3 Exploit Namespaces to Support Arbitrarily Complex Compound Documents

Several XML standards, including SOAP, XML-Signature, and XML-Cipher, are designed as wrappers for generic XML. Other XML standards, such as SVG and MathML, are designed to be embedded inside other XML documents as necessary. The namespace mechanism enables XML to achieve this flexibility.

For consistency and ease of implementation, industry standards are generally carved in stone. Nevertheless, namespaces provide a built-in partitioning mechanism that guarantees parsers will accept new vocabularies dynamically, and at any level – even if they are entirely unknown.

This enables document authors to insert special purpose nodes (XML-Signature, for example) at any granularity required for enterprise information management, as well as to combine standard and proprietary vocabularies in a single compound document.
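
The partitioning role of namespaces can be sketched in a few lines of Python. The fragment below reports every point in a compound document where the namespace changes, that is, where one vocabulary is embedded in another. The XHTML, SVG, and MathML namespace URIs are the published ones; the function names and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def vocabulary_islands(element, parent_ns=None, depth=0):
    """Yield (depth, namespace, local name) wherever the namespace changes."""
    ns = namespace_of(element)
    if ns != parent_ns:
        local = element.tag.split("}", 1)[-1]
        yield depth, ns, local
    for child in element:
        yield from vocabulary_islands(child, ns, depth + 1)

COMPOUND = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Area: <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>r</mi></math></p>
    <svg xmlns="http://www.w3.org/2000/svg"><circle r="10"/></svg>
  </body>
</html>
"""

islands = list(vocabulary_islands(ET.fromstring(COMPOUND)))
```

Here `islands` holds three entries (the `html` root, the embedded `math`, and the embedded `svg`), regardless of how deeply each is nested.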

2.4 Exploit XPATH to Share Data References among Loosely Integrated, Vocabulary-specific User Interfaces

Compound documents imply compound user interfaces. For example, a document section written in WSBPEL, which “enables users to describe business process activities as Web services and define how they can be connected to accomplish specific tasks,” may best be rendered as a flowchart. UBL, “defining a common XML library of business documents (purchase orders, invoices, etc.),” may look better in a business form editor. Editing a document containing both vocabularies requires switching naturally and seamlessly between preferred user interfaces in each section of the hybrid document.

Since the hybrid document encapsulates nodes of both vocabularies, an editor for the compound document should be able to access values and attributes across vocabulary boundaries. XPATH provides a mechanism for transparent access to the entire document and allows applications to detect and resolve data conflicts.
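
To illustrate access across vocabulary boundaries, the sketch below reads a total from one vocabulary and the amounts from another, then reports a conflict if they disagree. It uses Python's `xml.etree.ElementTree`, which implements only a subset of XPath 1.0. Both namespaces (`urn:example:myxml`, `urn:example:orders`) and all element names are hypothetical stand-ins, not real UBL or WSBPEL structures.

```python
import xml.etree.ElementTree as ET

NS = {"my": "urn:example:myxml", "ord": "urn:example:orders"}

DOC = """
<my:report xmlns:my="urn:example:myxml" xmlns:ord="urn:example:orders">
  <my:total>150</my:total>
  <ord:orders>
    <ord:order><ord:amount>100</ord:amount></ord:order>
    <ord:order><ord:amount>70</ord:amount></ord:order>
  </ord:orders>
</my:report>
"""

def find_conflict(root):
    """Compare a value in one vocabulary with values taken from another."""
    declared = int(root.findtext("my:total", namespaces=NS))
    actual = sum(int(a.text) for a in root.findall(".//ord:amount", NS))
    return None if declared == actual else (declared, actual)

root = ET.fromstring(DOC)
conflict = find_conflict(root)
```

With the sample data, `conflict` is the pair `(150, 170)`: the declared total and the sum actually found in the order sub-tree.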

2.5 Distribute XML Processing Instructions throughout the Enterprise, along with Data

The difficulty of coordinating change across a large number of geographically separated business entities greatly hampers the propagation of beneficial new technology through large enterprises.

We know that XML has replaced SGML in large part because of the infrastructure that now exists to distribute XML in a standardized way over the Internet. This presents an unprecedented opportunity to accelerate network computing as never before.

The distribution mechanism is general enough to enable efficient exchange of not only the information itself, but also the meta-data: algorithms and instructions for processing the XML, such as XSLT.

Applying XML technology only to the data severely limits the ability of enterprise systems to adapt in Internet-time to dynamic business process variation. It is like designing procedural systems in a world that has moved on to object-oriented architectures. To maximize the power of XML, do as the software engineers do: encapsulate data and operations in your business objects.

3. Implementation of xfy

We have shown that the native flexibility and extensibility of XML is of great use to contemporary enterprises. Since xfy was designed specifically to capitalize on these features of XML, its application to enterprise systems should confer similar benefits.

Here we describe some representative applications of the technology and comment on their utility for the enterprise.

3.1 Unrestricted Composition of Vocabularies in a Unified DOM

The requirements imposed by namespace handling (see Section 2.3) and data reference sharing (see Section 2.4) imply that compound documents should be represented by a single DOM tree. The resulting DOM is a hybrid, with sub-trees corresponding to each separate namespace. XPATH provides a mechanism for transparent access to any node from any other, so that every node in this DOM can easily interact with every other (see Figure 1).

Figure 1

Interactive integration of the hybrid XML data enables synchronization of related data items in the tree. For example, an element in MyXML may represent a summary of UBL Order elements; adding a new Order in the UBL sub-tree should cause an update to that summary field in the MyXML sub-tree.
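
The summary example can be sketched as follows. This is a Python illustration of the synchronization idea only: the `urn:example:orders` vocabulary stands in for UBL, the element names are invented, and the explicit call to `refresh_summary` is an assumption about how propagation might be triggered, not a description of xfy internals.

```python
import xml.etree.ElementTree as ET

NS = {"my": "urn:example:myxml", "ord": "urn:example:orders"}
ORD = "{urn:example:orders}"

DOC = """
<my:doc xmlns:my="urn:example:myxml" xmlns:ord="urn:example:orders">
  <my:orderCount>0</my:orderCount>
  <ord:orders/>
</my:doc>
"""

def refresh_summary(root):
    """Recompute the summary field from the order sub-tree."""
    count = len(root.findall("ord:orders/ord:order", NS))
    root.find("my:orderCount", NS).text = str(count)

def add_order(root, order_id):
    """Edit the order sub-tree, then propagate the change to the summary."""
    orders = root.find("ord:orders", NS)
    ET.SubElement(orders, ORD + "order", {"id": order_id})
    refresh_summary(root)

root = ET.fromstring(DOC)
add_order(root, "PO-1")
add_order(root, "PO-2")
summary = root.findtext("my:orderCount", namespaces=NS)
```

Because both sub-trees live in one tree, the update needs no export or import step; after two insertions the summary field reads `2`.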

As a different user interface may be associated with each component vocabulary, the composite user interface for the DOM tree must incorporate and integrate all of them.

3.2 Real-time Round-trip Engineering with the Hybrid DOM

xfy supports unlimited extension through two complementary techniques: an XML-based method that employs XSLT-like scripts and a more powerful, pure Java implementation.

The Vocabulary Connection Descriptor (VCD) is a lightweight XSLT extension that adds the two-way “reflection” features necessary for roundtrip engineering and supports easy integration of new vocabularies by mapping new elements onto existing implementations. Such script development should be well within the capabilities of moderately-skilled end users, such as enterprise knowledge workers. Skilled developers can take advantage of the “thin core” API to code plug-ins directly in Java. These plug-ins are limited only by the imagination of the programmer; they can do anything Java can do.

XSLT is the only mechanism in common use to handle pure XML on the client side. It is useful as far as it goes – unidirectional conversion of source XML into a suitable target XML for visualization – but there is no provision for defining a bidirectional user interface. XSLT generates a visualization by defining templates that represent mapping rules for transforming XML data into a display vocabulary like XHTML or SVG.

Figure 2 shows an XSLT snippet that calculates the absolute positions of nodes in an XML source and uses a fixed offset to place them in the target XML. It is possible in this manner to establish an initial relationship between the source and target XML, but the mapping operation is only performed once. Roundtrip engineering – iterative, bidirectional synchronization of changes in either the underlying data or its visualization – would require an XSLT that could rerun the “value-of” code in Figure 2 every time the source changed to recalculate the offsets and update the visualization.

Figure 2
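
The one-shot nature of a batch transformation can be illustrated without XSLT. The Python sketch below is a stand-in for the kind of mapping described above, not a reproduction of the figure: it places each source node at a fixed offset from the previous one, and the spacing value, element names, and un-namespaced `svg` target are assumptions.

```python
import xml.etree.ElementTree as ET

OFFSET = 20  # fixed vertical spacing between rendered nodes (assumed value)

def render(source):
    """One-way mapping: each <item> becomes a <text> placed at index * OFFSET."""
    target = ET.Element("svg")
    for index, item in enumerate(source.findall("item")):
        text = ET.SubElement(target, "text", {"y": str(index * OFFSET)})
        text.text = item.get("name")
    return target

source = ET.fromstring('<list><item name="a"/><item name="b"/></list>')
view = render(source)

# The source changes after the batch run; the existing view does not follow.
source.insert(0, ET.Element("item", {"name": "new"}))
stale = [t.get("y") for t in view]            # positions from the first run
fresh = [t.get("y") for t in render(source)]  # a rerun picks up the change
```

The first view still holds two positions after the source gains a third node; only an explicit rerun of `render` brings the visualization back in step.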

XSLT batch scripts have three major limitations that preclude true roundtrip engineering.

xfy’s VCD extends XSLT in just these ways to support interactive editing of the visualization. Figure 3 shows the VCD equivalent of the XSLT in Figure 2, including new code for the definition of user interface components.

This VCD defines a command, insert, specifies its user interface (a context menu), and includes an implementation for adding a new node to the source XML (<NewName/>). xfy’s integrated use of namespaces (see Section 2.3) ensures that the insert operation user interface is available in any portion of the hybrid DOM tree written in MyVocabulary. Since xfy’s hybrid DOM tree uses XPATH for maintenance of internal references and the target position of insert operations is expressed in XPATH, a single xfy operation can update and synchronize all relevant target nodes in the visualization. The mapping is bidirectional and dynamic. User-defined commands support adding (or changing or deleting) nodes in the XML source; the xfy framework then runs the “batch” rendering processes again to update the visualization.

VCD builds upon the XSLT heritage of vocabulary mapping, moving from static to dynamic and from batch to interactive, using a syntax and grammar familiar to XML users.

Figure 3
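
The edit-then-rerender cycle described in this section can be sketched in Python. This is not VCD syntax and not the xfy API: the `Editor` class, the command registry keyed by namespace, and the namespace `urn:example:my-vocabulary` are all hypothetical, chosen only to show a command that edits the source followed by an automatic rerun of the rendering step.

```python
import xml.etree.ElementTree as ET

MY_NS = "urn:example:my-vocabulary"  # hypothetical namespace

class Editor:
    """Toy stand-in for a framework that reruns rendering after each command."""

    def __init__(self, source, render):
        self.source = source
        self.render = render
        self.commands = {}          # (namespace, command name) -> function
        self.view = render(source)

    def register(self, namespace, name, func):
        self.commands[(namespace, name)] = func

    def run(self, name, node):
        tag = node.tag
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        self.commands[(namespace, name)](node)   # edit the source
        self.view = self.render(self.source)     # then refresh the view

def render(source):
    """Batch rendering: list the local names of all child nodes."""
    return [child.tag.split("}", 1)[-1] for child in source]

def insert_new_name(node):
    """The 'insert' command: append a <NewName/> node to the source."""
    ET.SubElement(node, "{%s}NewName" % MY_NS)

source = ET.fromstring('<m:root xmlns:m="%s"><m:Name/></m:root>' % MY_NS)
editor = Editor(source, render)
editor.register(MY_NS, "insert", insert_new_name)
editor.run("insert", source)
```

Looking the command up by namespace is what makes it available wherever that vocabulary appears in the tree; after the call, `editor.view` lists both `Name` and `NewName`.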

3.3 Dynamic Distribution of Both Data and Visualization over the Internet

One of the most compelling features of XML for the enterprise is the ease of distributing XML documents over the web. But why stop there? A VCD is itself XML, and can be distributed just as easily. In fact, xfy-enabled XML documents can specify a VCD – and gain all the functionality it provides – by simple reference in a hyperlink, as shown in Figure 4.

Figure 4

Upon encountering an XML document with embedded VCD links, the xfy framework can load vocabulary plug-ins from all over the world, and then evaluate any arbitrarily complex compound document containing any combination of the referenced vocabularies.

VCDs can be loaded from the Internet, Intranet, or local file system (enabling local customization). Distributing VCD updates across a worldwide enterprise is as easy as modifying the central repository referenced by the documents.

Document authors can distribute both data and associated functionality, and users can confidently accept new files with arbitrary format changes without fear of incompatibility.

Naturally, dynamic distribution of “active documents” on such a large scale raises security concerns that must be addressed in practical systems. Since both the data and the user-interface are written in XML, existing XML security standards are a good choice for implementation (e.g., XML-Signature). Security can also be implemented on the xfy platform itself. xfy’s pure Java implementation is another factor that enhances the security of the platform.

3.4 Support Unlimited Nesting of XML Vocabularies in an Integrated, Composite UI

xfy associates user interfaces with XML namespaces; at any level of the hybrid DOM tree, xfy presents the user with the interface corresponding to the vocabulary in force in that branch. This maximally flexible approach allows arbitrarily complex nesting of XML vocabularies.

Figure 5 shows how xfy handles a complex compound document containing XHTML, SVG, and MathML. These vocabularies have existing Java user interfaces that were developed without regard to xfy or any particular tree structure. Nevertheless, xfy can import and use these interfaces where appropriate within a compound document, right alongside xfy-specific interfaces implemented through VCDs.

Figure 5
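
Associating an interface with each namespace amounts to a lookup performed at every level of the tree. The Python sketch below shows that lookup on a nested XHTML, SVG, and MathML fragment. The editor names are placeholders for real UI components, and the plain-text fallback for unregistered namespaces is an assumption consistent with Section 2.2.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical editor names; a real registry would hold UI components.
EDITORS = {XHTML: "text editor", SVG: "drawing editor", MATHML: "formula editor"}

def editor_for(element):
    """Pick the editor registered for the element's namespace."""
    tag = element.tag
    ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return EDITORS.get(ns, "plain-text fallback")

def editor_stack(root, path):
    """Follow child indexes from the root, recording the editor at each level."""
    node, stack = root, [editor_for(root)]
    for index in path:
        node = node[index]
        stack.append(editor_for(node))
    return stack

DOC = ET.fromstring(
    '<p xmlns="%s"><svg xmlns="%s"><foreignObject>'
    '<math xmlns="%s"><mi>x</mi></math>'
    '</foreignObject></svg></p>' % (XHTML, SVG, MATHML)
)

stack = editor_stack(DOC, [0, 0, 0, 0])
```

Descending from the paragraph to the innermost `mi` element switches editors twice, once on entering the SVG island and once on entering the MathML island; the nesting depth itself plays no role in the lookup.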

4. Application to the Enterprise

xfy innovations bring the full power of XML to bear on the problems associated with the distribution of data and business logic over large scale enterprises. We have made the case that xfy accelerates enterprise activities, principally by easing the adoption and propagation of new technology throughout the enterprise and by preventing the loss of mission critical data through incompatibility or obsolescence. Here we show how xfy’s enhanced capacity for data abstraction and extensibility saves the day in a real world application where the data is complex and the requirements change over time.

4.1 A Unified, User-Friendly, Universal Interface for XML

The data may be complex, but the user interface should not be! It is instructive to think of xfy as an XML “word processor.” Alternatively, those with a more technical bent may recognize it as a RAD (rapid application development) tool for XML documents.

As in a word processor, each “paragraph” in the document – each sub-tree written in a particular XML vocabulary – can have its own formatting and presentation rules separate from the others. A VCD is a bit like a “paragraph style” definition; multiple paragraphs scattered throughout a long document can be “tagged” with a style – written in a certain vocabulary – and the formatting rules defined by the style apply equally to all of them.

Complexity in document composition does not create complexity in the user interface, since the “plug and play” architecture supports dynamic, context-sensitive selection of the user interface at each point in the document. The “cursor” can only be in one place at a time; if there are three vocabularies in the document, the user interface is always in one of three states, regardless of the degree of vocabulary nesting in the compound document. This flexibility in swapping out VCD interface definitions also makes possible the optimization, reuse, and re-purposing of XML documents by end users to a degree never seen before.

Since xfy views the entire compound document as a single DOM tree – and XML is just text, after all – the platform also supports document-wide operations across vocabularies (e.g., undo, global search & replace).
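
A document-wide operation of this kind is straightforward once everything sits in one tree. The following Python sketch performs a global search and replace over the text of every element, whatever its vocabulary. The two namespaces and the sample content are invented for illustration, and attribute values are deliberately left out to keep the example short.

```python
import xml.etree.ElementTree as ET

def replace_everywhere(root, old, new):
    """Replace text in every element's text and tail, whatever its namespace."""
    changed = 0
    for element in root.iter():
        for attr in ("text", "tail"):
            value = getattr(element, attr)
            if value and old in value:
                setattr(element, attr, value.replace(old, new))
                changed += 1
    return changed

DOC = ET.fromstring(
    '<a:memo xmlns:a="urn:example:memo" xmlns:b="urn:example:order">'
    '<a:title>Acme order</a:title>'
    '<b:order><b:buyer>Acme</b:buyer></b:order> from Acme'
    '</a:memo>'
)

changed = replace_everywhere(DOC, "Acme", "Apex")
result = "".join(DOC.itertext())
```

The function never inspects the namespace, so text in the memo vocabulary and in the order vocabulary is updated in a single pass.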

4.2 Seamless Vocabulary Connection

XML standards often incorporate other standards by reference; XML data in the real world commonly contains elements of multiple vocabularies. For example, the WSBPEL standard (“enabling users to describe business process activities as Web services and define how they can be connected to accomplish specific tasks”) imports WSDL and SOAP. The SOA standard (“an architectural style whose goal is to achieve loose coupling among interacting software agents”) imports WSDL, SOAP, and UDDI.

To the extent that these simple compound documents involve only the fixed structure specified by the standard, it is not difficult for a specific application to support them, given a reference implementation of the user interface for each vocabulary. However, any data that does not conform must be handled outside the application or integrated manually.

Unfortunately, non-conforming data is the rule rather than the exception. Large enterprises must handle multiple standards simultaneously: external constraints, such as federal (government) regulations; cooperative constraints, such as the need to accommodate a business partner’s participation in RosettaNet; and purely local constraints, such as proprietary XML vocabularies required by legacy systems.

For example, a supplier may require UBL for buyer Company A and RosettaNet’s RNIF for buyer Company B. Alternatively, the same data may need to be presented to downstream consumers in a variety of presentation formats. Complexity of this type calls for data processing capabilities beyond the scope of the individual standards.

Figure 6 shows two ways to deal with this complexity. The simplest method involves defining a user interface for each vocabulary (or using an existing one), and letting the plug-in architecture load the appropriate interface at each point in the compound document (6a).

Figure 6

Alternatively, a new VCD could map each vocabulary onto a comprehensive, custom vocabulary (MyXML), and define operations on it. The roundtrip engineering aspect of xfy guarantees that any changes made to the composite document will be reflected in the original sources.
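
The second approach, mapping several vocabularies onto one custom vocabulary while keeping edits in step with the sources, can be sketched in Python. The two source formats, their namespaces, and the `UnifiedOrder` class are hypothetical; the point is only that the unified view holds a reference to the source node, so a change made through the view lands in the original document.

```python
import xml.etree.ElementTree as ET

# Hypothetical source vocabularies and the field each uses for the buyer name.
BUYER_PATHS = {
    "{urn:example:format-a}order": "{urn:example:format-a}buyer",
    "{urn:example:format-b}po": "{urn:example:format-b}customer",
}

class UnifiedOrder:
    """View of one source order in a common vocabulary, keeping a back-reference."""

    def __init__(self, source):
        self._field = source.find(BUYER_PATHS[source.tag])

    @property
    def buyer(self):            # forward mapping: source -> unified view
        return self._field.text

    @buyer.setter
    def buyer(self, value):     # reverse mapping: the edit flows back to the source
        self._field.text = value

DOC = ET.fromstring(
    '<batch xmlns:a="urn:example:format-a" xmlns:b="urn:example:format-b">'
    '<a:order><a:buyer>Company A</a:buyer></a:order>'
    '<b:po><b:customer>Company B</b:customer></b:po>'
    '</batch>'
)

orders = [UnifiedOrder(child) for child in DOC]
orders[1].buyer = "Company B Ltd"
written_back = DOC.find(
    "{urn:example:format-b}po/{urn:example:format-b}customer"
).text
```

Code that works with `orders` sees one field name, `buyer`, for both formats, while each source element keeps its own structure.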

Since the VCD mechanism is an XML technology, any data that is not XML would need to be extracted (e.g., using MIME) and processed through a Java plug-in.

4.3 Real-time Extensibility

Across many enterprises, introducing new business logic is easier than managing the timely propagation of changes to existing rules and regulations, since updates occur asynchronously and often.

To stay competitive, modern enterprises need to react at Internet speed. XML-based web services, as embodied in the Enterprise Service Bus (ESB) architecture, are one piece of current technology designed to meet that challenge, but mostly for servers. Most clients cannot keep up with rapidly changing processes; this lack of client-side adaptability could be a fatal blow to a new enterprise system seeking widespread adoption.

Capitalizing on the extensibility inherited from XSLT, xfy’s VCD mechanism improves the outlook dramatically. Designers of new XML data need not rely on slow and uncertain propagation of appropriate client interfaces. Instead, the new XML can be distributed along with its user interface – the VCD – at Internet speed, allowing clients to incorporate the new vocabulary into their own documents immediately.

If the interface is somehow insufficient for the client’s needs, the default VCD interface provided can be customized or repurposed as necessary.

If an existing user interface is close enough (i.e., required node names remain constant), it is even possible for an end user to use the node transparency and XPATH compatibility of xfy to edit the new data using the old user interface.
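
One possible reading of "required node names remain constant" is a simple containment test, sketched below in Python. How xfy itself decides whether an old interface still applies is not described here, so the check, the set of required names, and the sample documents are all assumptions.

```python
import xml.etree.ElementTree as ET

def local_names(root):
    """Collect the local (namespace-free) names of all elements."""
    return {el.tag.split("}", 1)[-1] for el in root.iter()}

def can_reuse(required_names, document):
    """An old interface is usable if every node name it needs is still present."""
    return set(required_names) <= local_names(document)

# Names the existing (hypothetical) order form reads and writes.
OLD_FORM_NEEDS = {"order", "buyer", "amount"}

NEW_DOC = ET.fromstring(
    '<order xmlns="urn:example:orders-v2">'
    '<buyer>Company A</buyer><amount>100</amount><channel>web</channel>'
    '</order>'
)
RENAMED_DOC = ET.fromstring(
    '<order xmlns="urn:example:orders-v3"><customer>Company A</customer></order>'
)

reusable = can_reuse(OLD_FORM_NEEDS, NEW_DOC)
not_reusable = can_reuse(OLD_FORM_NEEDS, RENAMED_DOC)
```

The extra `channel` element in the newer document does not prevent reuse, whereas renaming `buyer` to `customer` does.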

Figure 7 shows two methods of achieving extensibility. Since a VCD is itself XML, changes are immediate and transparent. Java plug-ins require a bit more integration and initialization at the present stage of xfy technology development.

Figure 7

4.4 XML Validation

At present, the most popular user interface function for XML is validation using XML Schema. This validation involves both the overall structure of the tree and the data type at every node.

A simple way to validate the data type of a text node is to define the type explicitly as an attribute of an editable text node. This type of validation is executed in real-time.
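
Attribute-driven type checking of a text node can be sketched as follows. The attribute name `type`, its values, and the checker functions are assumptions, and the checks only approximate the lexical rules of the corresponding XML Schema datatypes. The edit is validated at the moment it is made and rejected if it does not match.

```python
import xml.etree.ElementTree as ET
from datetime import date

def _is_int(text):
    try:
        int(text)
        return True
    except ValueError:
        return False

def _is_date(text):
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

# Checkers keyed by the value of a hypothetical 'type' attribute.
CHECKERS = {"integer": _is_int, "date": _is_date}

def validate_edit(node, new_text):
    """Accept the edit only if it matches the type declared on the node."""
    checker = CHECKERS.get(node.get("type"), lambda text: True)
    if not checker(new_text):
        return False
    node.text = new_text
    return True

field = ET.fromstring('<quantity type="integer">1</quantity>')
accepted = validate_edit(field, "42")
rejected = validate_edit(field, "forty-two")
```

A rejected edit leaves the node untouched, so the document never passes through an invalid state.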

VCDs extend the rules in three ways:

A VCD can define a user interface according to an XML Schema, yet is not limited to the nodes specified by the schema.

Global validation is possible in xfy since the entire document is contained in a unified DOM, with node transparency courtesy of XPATH.

In the simple case where a user wishes to define an XML Form from an XML Schema, it is possible to generate the VCD automatically from the form editor.

Validation of non-XML data, such as cross-checking the header definition and attachments in SOAPwithAttachment, can be accomplished only with the full flexibility of a Java plug-in. This type of validation is useful for business standards like ebXML and RNIF. Unfortunately, non-XML data cannot be integrated into the unified DOM, and must be handled externally by the application under user direction.
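
The kind of cross-check meant here can be illustrated outside xfy. The sketch below is Python rather than a Java plug-in, and it uses a simplified manifest in place of a real SOAP envelope. It builds a multipart/related package with the standard `email` library and then compares the `cid:` references in the XML part against the Content-ID headers of the attached parts. The element names and content identifiers are invented.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_package(envelope_xml, attachments):
    """Assemble a multipart/related message: XML root part plus binary parts."""
    package = MIMEMultipart("related")
    package.attach(MIMEText(envelope_xml, "xml", "utf-8"))
    for content_id, payload in attachments.items():
        part = MIMEApplication(payload)
        part["Content-ID"] = "<%s>" % content_id
        package.attach(part)
    return package.as_bytes()

def cross_check(raw):
    """Return (references with no attachment, attachments never referenced)."""
    parts = message_from_bytes(raw).get_payload()
    envelope = ET.fromstring(parts[0].get_payload(decode=True))
    referenced = {
        el.get("href")[4:] for el in envelope.iter()
        if (el.get("href") or "").startswith("cid:")
    }
    attached = {p["Content-ID"].strip("<>") for p in parts[1:] if p["Content-ID"]}
    return referenced - attached, attached - referenced

ENVELOPE = '<manifest><ref href="cid:invoice"/><ref href="cid:photo"/></manifest>'
raw = build_package(ENVELOPE, {"invoice": b"%PDF", "extra": b"\x00\x01"})
missing, unreferenced = cross_check(raw)
```

With the sample package, the check reports one reference without an attachment (`photo`) and one attachment that nothing refers to (`extra`).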

4.5 Sufficiency of the VCD Mechanism

In many cases, XML vocabulary mapping and transformation can be described easily and economically through the VCD mechanism. As collections of pattern matching rules familiar to XSLT users, VCDs are much simpler and easier than Java coding. However, this very simplicity precludes their use in cases where heavy numeric computation or non-XML data is involved.

This problem is not specific to xfy or VCD technology; it can be solved in much the same way other platforms evolved to meet the needs of their users. The STL was added to C++; JavaBeans, JMS, and other standardized functions were added to Java. User communities will reliably create libraries of functions necessary in their domains. AES, XML-Cipher, XML-Signature, and similar standards are already filling this gap. The extensibility of xfy only makes this process easier.

5. Conclusion

xfy is a generalized client interface for end users in the world of XML. It applies to both document- and data-oriented schemata; in fact, it works equally well with any well-formed XML.

Like the XML standard upon which it is based, xfy is extensible and adaptable. In particular, it opens the formerly server-based world of enterprise-wide XML to clients and encourages the rapid adoption and propagation of new technology throughout the enterprise – all while making data obsolescence obsolete.

In this paper, we showed the vast and growing potential of generalized XML handling. xfy is a platform for solution development, not a comprehensive solution in itself. We anticipate a growth industry in new vocabularies and VCDs to enhance the utility and usability of xfy. xfy’s adaptive client framework provides enterprise knowledge workers with a newfound freedom to take control of their data and give full expression to their creativity.

6. References
[xfy] xfy-technology - authoring and editing compound XML documents
Justsystem Corporation, 2004
[XML Schema validation] XML Schema Part 0: Primer Second Edition W3C Recommendation
W3C, 28 October 2004
[SOAP] SOAP Version 1.2 Part 1: Messaging Framework W3C Recommendation
W3C, 24 June 2003
[XML-Signature] XML-Signature Syntax and Processing W3C Recommendation
W3C, 12 February 2002
[XML-Cipher] XML Encryption Syntax and Processing W3C Recommendation
W3C, 10 December 2002
[SVG] Scalable Vector Graphics (SVG) 1.1 Specification W3C Recommendation
W3C, 14 January 2003
[MathML] Mathematical Markup Language (MathML) Version 2.0 (Second Edition) W3C Recommendation
W3C, 21 October 2003
[WSBPEL] Web Services Business Process Execution Language Version 2.0 Working Draft
OASIS, 01 December 2004
[UBL] Universal Business Language 1.0
OASIS, 2004
[XPATH] XML Path Language (XPath) Version 1.0 W3C Recommendation
W3C, 16 November 1999
[SGML] ISO 8879. Information Processing - Text and Office Systems - Standard Generalized Markup Language
ISO, 1986
[DOM] Document Object Model (DOM) Level 2 Core Specification Version 1.0 W3C Recommendation
W3C, 13 November 2000
[WSDL] Web Services Description Language (WSDL) 1.1 W3C Note
W3C, 15 March 2001
[UDDI] UDDI Version 2.04 API Specification UDDI Committee specification
OASIS, 19 July 2002
[RNIF] RosettaNet Implementation Framework: Core Specification, Version V02.00.01
RosettaNet, 6 March 2002
[MIME] The MIME Multipart/Related Content type
IETF, 1998
[SOAPwithAttachment] SOAP 1.2 Attachment Feature W3C Working Group Note
W3C, 8 June 2004