XTech 2005: XML, the Web and beyond.

Model Driven Compound Document Development



Abstract


Building tooling for the creation of mixed-namespace documents can be daunting, and the resulting tools often fail to keep pace with changes in the underlying schemas and in the browsers and renderers (user agents) that support them. One solution is an open, standards-based approach that uses the Eclipse Modeling Framework (EMF) and its underlying ECore models to represent functional schemas (such as XHTML, XForms, VoiceXML, MathML, SVG, and SMIL) and the connections between them. These models can then drive a dynamic environment that serializes instance documents conforming to the combined schema definitions while providing a directed editing experience.

Compound Documents

The World Wide Web Consortium (W3C) has started a Compound Document Formats (CDF) Working Group, which grew out of the Web Applications and Compound Documents Workshop, to explore standardization issues for compound documents and to specify the behavior of certain format combinations, addressing the need for an extensible and interoperable Web.

The CDF Working Group is focused on combinations of specific namespace vocabularies that will become CDF profiles, such as a rich media profile for mobile devices that might include Extensible HyperText Markup Language (XHTML) and Scalable Vector Graphics (SVG) Tiny. Other examples include XHTML combined with XForms, or XHTML combined with a subset of VoiceXML in the X+V profile.

Compound Documents Defined

A namespace uniquely identifies a set of names so that there is no ambiguity when objects that have different origins but the same names are mixed together. An XML namespace is a collection of element type and attribute names, identified by a URI called the namespace name. In an XML document, any element type or attribute name can thus have a two-part name consisting of the namespace name and the local element or attribute name.
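The two-part name is easy to observe with Python's standard xml.etree.ElementTree parser, which reports every element as {namespace-name}local-name. The short sketch below is an illustration only (the sample documents and the expanded_name helper are made up for it): the prefix an author chooses drops out, while two elements with the same local name from different namespaces stay distinct.

```python
import xml.etree.ElementTree as ET

# The same XHTML paragraph written with two different prefixes.
DOC_A = '<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">Hi</xhtml:p>'
DOC_B = '<h:p xmlns:h="http://www.w3.org/1999/xhtml">Hi</h:p>'
# Two elements with the same local name "a" from two different namespaces.
DOC_C = ('<xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml" '
         'xmlns:svg="http://www.w3.org/2000/svg"><xhtml:a/><svg:a/></xhtml:div>')


def expanded_name(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (namespace name, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # element in no namespace


a = ET.fromstring(DOC_A)
b = ET.fromstring(DOC_B)
c = ET.fromstring(DOC_C)

print(a.tag)                                # {http://www.w3.org/1999/xhtml}p
print(a.tag == b.tag)                       # True: the prefix is not part of the name
print([expanded_name(ch.tag) for ch in c])  # same local name, different namespace names
```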

A compound document by inclusion combines XML markup from several namespaces into a single physical document. A number of standards, some still under development, describe XML markup within a single namespace. XHTML, XForms, VoiceXML, and MathML are prominent examples, each with its own namespace. Each of these specifications focuses on one aspect of rich-content development: XForms on data collection and submission, VoiceXML on speech, and MathML on the display of mathematical notation.

To content authors, each of these standards is useful and important. However, it is the combination of elements from several of them that gives rich document creation its real flexibility and power. A single page might need to be displayed in a web browser, present an input form, and include a scalable graphic and a bit of mathematical notation. XHTML, XForms, SVG, and MathML, respectively, serve these needs, and can therefore be combined into a single multi-namespace document.

Consider this simple example, a compound document combining XHTML and MathML. In the XML source in Example 1, each namespace declaration is marked with an appended comment whose number matches the numbered list below.

1. XHTML Namespace declaration. The namespace for XHTML 1.0 is declared. Each XHTML element in the example below is qualified with the xhtml: namespace prefix.

2. MathML Namespace declaration. The namespace for MathML 2.0 is declared. Each MathML element in the example below is qualified with the mathml: prefix.

 

Example 1: A Simple Compound Document
 
<?xml version="1.0" encoding="iso-8859-1"?>
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"><!-- 1 -->
  <xhtml:body>
    <xhtml:h1>A Compound document</xhtml:h1>
    <xhtml:p>A simple formula using MathML in XHTML.</xhtml:p>
    <mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML"><!-- 2 -->
      <mathml:mrow>
        <mathml:msqrt>
          <mathml:mn>49</mathml:mn>
        </mathml:msqrt>
        <mathml:mo>=</mathml:mo>
        <mathml:mn>7</mathml:mn>
      </mathml:mrow>
    </mathml:math>
  </xhtml:body>
</xhtml:html>
 
Figure 1: Rendered Simple Compound Document

Compound documents may consist of a single document that contains multiple namespaces, as in Example 1. This is a Compound Document “by Inclusion” (CDI). However, a compound document may also be composed of several documents, in which a document in one namespace references a separate document in a different namespace.

For example, a root or top-most document might contain XHTML content for defining and formatting a page. This parent XHTML document can reference a separate document in another namespace through the XHTML <object> tag, and it can do so for as many documents as needed. The root document plus this collection of separate, referenced documents is considered a Compound Document “by Reference” (CDR). Figure 2 shows a simple CDR in which an XHTML root document references a separate SVG child document containing markup for three colored circles.

Figure 2: Compound Document by Reference

 

And of course, a compound document may be a hybrid of both compound document by inclusion and compound document by reference.
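The distinction between inclusion and reference can be checked mechanically on a root document. The Python sketch below is a simplified heuristic, not part of the paper's tooling: it treats a document as "by inclusion" when its elements come from more than one namespace, and as "by reference" when it contains an XHTML <object> element with a data attribute. XLink and other reference mechanisms are ignored, referenced files are not fetched, and the file name circles.svg is made up.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def namespaces_used(root: ET.Element) -> set[str]:
    """Namespace URIs of every element in the tree (attributes are not inspected)."""
    return {el.tag[1:].split("}", 1)[0] for el in root.iter() if el.tag.startswith("{")}


def referenced_documents(root: ET.Element) -> list[str]:
    """'data' URIs of XHTML <object> elements; XLink and other mechanisms are ignored."""
    return [el.get("data") for el in root.iter(f"{{{XHTML}}}object") if el.get("data")]


def classify(xml_text: str) -> str:
    """Label a root document as CDI, CDR, hybrid, or single-namespace."""
    root = ET.fromstring(xml_text)
    by_inclusion = len(namespaces_used(root)) > 1
    by_reference = bool(referenced_documents(root))
    if by_inclusion and by_reference:
        return "hybrid"
    if by_inclusion:
        return "CDI"
    if by_reference:
        return "CDR"
    return "single-namespace"


CDI_DOC = ('<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:body>'
           '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mn>7</m:mn></m:math>'
           '</xhtml:body></xhtml:html>')
CDR_DOC = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
           '<object data="circles.svg" type="image/svg+xml"/></body></html>')

print(classify(CDI_DOC))  # CDI
print(classify(CDR_DOC))  # CDR
```

Adding an <xhtml:object data="..."/> element to CDI_DOC would make it a hybrid under the same rules.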

 

Model Driven Development

Model Driven Development (MDD) is an approach, and a set of techniques, for developing better software faster by generating implementations from models. The Object Management Group (OMG) calls its approach to MDD Model Driven Architecture (MDA) and maintains a set of standards that support it. The process begins by capturing business logic early, in the requirements phase of software development. This business logic is modeled, perhaps in the Unified Modeling Language (UML), as an abstraction of the business rules. The resulting models are then the basis for generating code that produces an implementation.

Reasons for MDD:

Models can be represented in many forms, such as the Unified Modeling Language (UML), XML Metadata Interchange (XMI), Essential Meta Object Facility (EMOF), and XML Schema.

Model Driven Development in Eclipse

Eclipse is an open source tool integration platform, most often used as a Java development environment. As a tool integration platform, Eclipse has a varied and ever growing set of editors and utilities, one of which is the Eclipse Modeling Framework (EMF).

EMF is a Tools sub-project of the Eclipse open source project. It is a modeling and data integration framework as well as a code generation facility for building Eclipse plug-ins. EMF uses ECore, a metalanguage used to describe models and to provide runtime support for them. ECore is a standards-based metalanguage based on a subset of the OMG Meta Object Facility 2.0 (MOF) called Essential MOF (EMOF). EMF models are persisted as XML Metadata Interchange (XMI) documents. EMF provides viewing and command-based editing of a model, as well as a basic editor for manipulating and serializing instance documents based on that model. EMF models can be created from annotated Java interfaces, XML Schema, or UML models.

EMF serves as the backbone for MDD in Eclipse.

Compound Document Tooling

Compound documents by reference can be created and edited with existing XML editors today, because the references to other documents use generic mechanisms such as the <xhtml:object> tag or XLink. Editors for compound documents by inclusion, however, need more than the ability to validate each separate document in isolation if they are to offer a directed editing experience. An editor that supports compound documents must know which tags from one namespace can be inserted as children of tags from another namespace. These cross-namespace relationships can also be bidirectional and recursive. This definition of which tags can be inserted under which other tags, for a given set of mixed namespaces, is called a compound document profile. Several explicit compound document profiles exist today, such as XHTML plus a subset of VoiceXML (X+V), and XHTML plus MathML plus SVG.

To provide a concrete example, consider an XHTML+XForms compound document profile that must define which XForms tags can exist as child tags for specific XHTML tags and vice versa. One requirement for this XHTML+XForms profile is that an <xhtml:div> element can have as a child an <xforms:repeat> element which can have as a child another <xhtml:div> element, which can in turn have as a child an <xforms:input> element, as shown in Example 3.

Example 3: XHTML and XForms nested tags
 
<xhtml:div>
  <xforms:repeat model="model_PostalAddress"
                 id="repeat_AddressLine_model_PostalAddress"
                 nodeset="/hrxml:PostalAddress/hrxml:DeliveryAddress/hrxml:AddressLine">
    <xhtml:div>
      <xforms:input ref="." model="model_PostalAddress">
        <xforms:label>Address Line</xforms:label>
      </xforms:input>
    </xhtml:div>
  </xforms:repeat>
</xhtml:div>
  
  

This nesting of tags needs to be defined explicitly, with mechanisms beyond xsd:any and xsd:anyAttribute, because validating and directed editors need more explicit detail to validate and guide document construction unambiguously, and user agent implementers who write rendering code for browsers need the same detail to build their processing and rendering engines.
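The gap between a wildcard and an explicit profile can be made concrete. If <xhtml:head> were opened up with xsd:any namespace="##other", it would accept an <xforms:input> child just as readily as an <xforms:model>. A checker that consults explicit cross-namespace rules rejects the former while still accepting the recursive nesting of Example 3. The Python sketch below uses ElementTree; the RULES table is a hypothetical fragment of an XHTML+XForms profile, and same-namespace nesting is assumed to be checked by each vocabulary's own schema.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"


def qn(ns: str, local: str) -> str:
    """ElementTree's '{uri}local' form of an expanded name."""
    return f"{{{ns}}}{local}"


# Hypothetical profile fragment: parent -> foreign children it may contain.
RULES = {
    qn(XHTML, "head"): {qn(XFORMS, "model")},
    qn(XHTML, "div"): {qn(XFORMS, "repeat"), qn(XFORMS, "input")},
    qn(XFORMS, "repeat"): {qn(XHTML, "div")},
}


def namespace_of(tag: str) -> str:
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def profile_violations(root: ET.Element) -> list[tuple[str, str]]:
    """Every (parent, child) pair that crosses namespaces without a profile rule."""
    bad = []
    for parent in root.iter():
        for child in parent:
            crosses = namespace_of(child.tag) != namespace_of(parent.tag)
            if crosses and child.tag not in RULES.get(parent.tag, set()):
                bad.append((parent.tag, child.tag))
    return bad


DECLS = f'xmlns:xhtml="{XHTML}" xmlns:xforms="{XFORMS}"'
# Same shape as Example 3: div > repeat > div > input (label is same-namespace).
GOOD = (f'<xhtml:div {DECLS}><xforms:repeat nodeset="item"><xhtml:div>'
        '<xforms:input ref="."><xforms:label>Line</xforms:label></xforms:input>'
        '</xhtml:div></xforms:repeat></xhtml:div>')
# Well-formed, and acceptable to an ##other wildcard, but not to the profile.
BAD = f'<xhtml:html {DECLS}><xhtml:head><xforms:input ref="."/></xhtml:head></xhtml:html>'

print(profile_violations(ET.fromstring(GOOD)))  # []
print(profile_violations(ET.fromstring(BAD)))   # reports the head/input pair
```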

 

Compound Document Tooling Users

When considering compound document creation and editing tooling there are two users to accommodate: the compound document schema architect and the instance document creator.

The compound document schema architect wants to efficiently express the definition for how specific namespace vocabularies can be combined using defined profiles. This is the person who builds the implementation of a compound document profile.

The instance document creator wants to use the profile but has no interest in building or editing it. The instance document creator simply wants to create well-formed, valid instance documents that adhere to a profile, preferably with a directed, correct-by-construction editing experience, in which the editor offers only the choices that are valid in the current context according to the profile.
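A correct-by-construction editor can compute, for the element under the cursor, the list of children that may be inserted and offer only those. The sketch below assumes a hypothetical CONTENT_MODEL table that merges a few same-namespace rules with the cross-namespace rules of an XHTML+XForms profile; it illustrates the idea and is not the Compound XML Document Editor's implementation.

```python
# Hypothetical combined content model (schemas + profile): parent -> children
# that may be inserted. "prefix:local" strings stand in for expanded names.
CONTENT_MODEL: dict[str, list[str]] = {
    "xhtml:html": ["xhtml:head", "xhtml:body"],
    "xhtml:head": ["xhtml:title", "xforms:model"],
    "xhtml:body": ["xhtml:div", "xhtml:p"],
    "xhtml:div": ["xhtml:div", "xhtml:p", "xforms:repeat", "xforms:input"],
    "xforms:repeat": ["xhtml:div", "xforms:input"],
    "xforms:input": ["xforms:label"],
}


class Node:
    """Minimal element node in an in-memory instance document."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list["Node"] = []

    def insert_choices(self) -> list[str]:
        """What a right-click 'insert child' menu would list for this node."""
        return list(CONTENT_MODEL.get(self.name, []))

    def insert(self, name: str) -> "Node":
        """Add a child only if it is one of the offered choices."""
        if name not in self.insert_choices():
            raise ValueError(f"{name} is not allowed inside {self.name}")
        child = Node(name)
        self.children.append(child)
        return child


html = Node("xhtml:html")
head = html.insert("xhtml:head")
print(head.insert_choices())  # ['xhtml:title', 'xforms:model']
head.insert("xforms:model")
try:
    head.insert("xforms:input")
except ValueError as err:
    print(err)  # xforms:input is not allowed inside xhtml:head
```

Because insert() refuses anything outside the menu, the in-memory tree can only ever hold combinations the content model permits.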

EMF as an open modeling technology is a natural fit for defining compound document profiles. The EMF ECore models can then be used for creating Eclipse-based editors for document creation and serialization.

The model driven approach to compound document tooling begins with Platform Independent Models (PIMs) of each functional namespace (XHTML, XForms, SVG, etc.) that will be in a profile. A PIM is a high-level abstraction that does not consider implementation realizations but expresses only the intent of what is being modeled. PIMs can take many forms, such as XML Schema, RELAX NG, Schematron, MOF, or UML models. Once the PIMs for all the schemas in a profile are created, they can be transformed into Platform Specific Models (PSMs) that are all of the same normative type; for example, all the PSMs might be XML Schemas, UML models, or EMF ECore models. Next, the profile is realized by creating cross-model references between the models, representing the places where tags from one namespace may be referenced by, or inserted under, tags from another. For example, a profile for XHTML+XForms would need to define that an <xforms:model> tag can be inserted under the <xhtml:head> tag. Figure 3 shows this PSM XHTML+XForms profile annotation as a UML aggregation relationship between the “head” class from the XHTML PIM model and the “model” class from the XForms PIM model.

 

Figure 3: PSM Cross-Model Relationship in UML

 

The PSMs can be transformed into EMF ECore models, which EMF-provided tooling can create from UML models or XML Schema. In the example in Figure 3, the aggregation relationship would become an EReference in the PSM ECore model. Creating these models and realizing the profiles as references across them is the role of the compound document schema architect. These PSM models, which realize the compound document profile, then drive a directed editor that the instance document creator uses to create and edit instances adhering to the profile (see Figure 4).
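The step from the aggregation in Figure 3 to an EReference can be pictured with a toy metamodel in Python. The classes only borrow the names EClass and EReference for readability; they are not the EMF API, which is Java. The containment flag and the upper bound of -1 mirror how ECore marks containment and unbounded multiplicity, and allowed_children is a hypothetical helper showing how an editor could read insertion choices straight from the references.

```python
from dataclasses import dataclass, field

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"


@dataclass
class EReference:
    """Toy stand-in for an ECore reference; upper_bound -1 means unbounded."""
    name: str
    e_type: "EClass"
    containment: bool = True
    upper_bound: int = -1


@dataclass
class EClass:
    """Toy stand-in for an ECore class owned by one namespace's model."""
    name: str
    ns_uri: str
    references: list[EReference] = field(default_factory=list)


# Two classes from separately authored models, one per namespace.
head = EClass("head", XHTML)
model = EClass("model", XFORMS)

# The Figure 3 aggregation, realized as a cross-model containment reference.
head.references.append(EReference("model", model))


def allowed_children(owner: EClass) -> list[tuple[str, str]]:
    """(namespace, element) pairs an editor could offer under `owner`."""
    return [(r.e_type.ns_uri, r.e_type.name) for r in owner.references if r.containment]


print(allowed_children(head))   # [('http://www.w3.org/2002/xforms', 'model')]
print(allowed_children(model))  # []
```

Keeping the two classes in separate per-namespace models and adding only the reference is what lets the same models be recombined into other profiles.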

 

Figure 4: Model Driven Compound Document Editor Profile Creation

 

A model driven approach is an efficient way to create functional PIM models of specific namespaces that can then be combined into PSM models representing profiles. The PIM models can be reused in different combinations to form as many profiles as required. Eclipse EMF ECore models, in turn, provide directed editing and serialization for creating instance documents in a Compound XML Document Editor.

Compound XML Document Editor

The Compound XML Document Editor is a dynamic editor framework that uses ECore models to drive model-based compound document construction. Any document type whose instances are serialized as XML can be added to the Compound XML Document Editor framework without writing any Java code. The editor uses model repositories in which ECore models are stored. Once an ECore model is dropped into a model repository and the editor is started, instance documents can be created or edited dynamically from that model. Model repositories can be created to accommodate as many models and compound document profiles as required.

Individual models may be swapped out, or entire model repositories switched, at runtime. Furthermore, changes made to ECore models on the fly are immediately reflected in the editor and in serialized instance documents.
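Runtime swapping works naturally if the editor queries its repository on every request instead of relying on generated code. The sketch below assumes a hypothetical ModelRepository class in which each model is reduced to a table of allowed children; the class, its methods, and the model names are illustrative, not the editor's actual API.

```python
class ModelRepository:
    """Hypothetical repository of models; each model is reduced to a table of
    parent element -> children that the model allows under it."""

    def __init__(self) -> None:
        self._models: dict[str, dict[str, set[str]]] = {}

    def put(self, name: str, content_model: dict[str, set[str]]) -> None:
        """Add a model, or replace one with the same name (e.g. a new version)."""
        self._models[name] = content_model

    def remove(self, name: str) -> None:
        """Swap a model out; unknown names are ignored."""
        self._models.pop(name, None)

    def choices(self, element: str) -> set[str]:
        """Union of the children allowed by every loaded model.

        Nothing is cached, so a model change is visible on the very next call.
        """
        allowed: set[str] = set()
        for model in self._models.values():
            allowed |= model.get(element, set())
        return allowed


repo = ModelRepository()
repo.put("xhtml", {"xhtml:head": {"xhtml:title"}})
print(sorted(repo.choices("xhtml:head")))  # ['xhtml:title']
repo.put("xhtml+xforms", {"xhtml:head": {"xforms:model"}})  # drop in a profile
print(sorted(repo.choices("xhtml:head")))  # ['xforms:model', 'xhtml:title']
repo.remove("xhtml+xforms")                                  # swap it out again
print(sorted(repo.choices("xhtml:head")))  # ['xhtml:title']
```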

The Compound XML Document Editor comes with ECore models for XHTML, XForms, XML Events, SVG, SMIL, VoiceXML, XUL, MathML, and XLink. Figure 5 shows the profile combinations available in the default model repository, which uses XHTML as the root document type and a profile that allows inclusion of elements and attributes from several other namespaces.

Figure 5: Default Model Repository

The Compound XML Document Editor uses the underlying EMF models to provide a directed editing experience by restricting the right-click options for tag insertion to those the profile allows (see Figure 6). Element attributes are represented as properties in a property sheet.

Figure 6: Directed Editing

Once a document has been created, it may be rendered directly, through configurable right-click menu options, in browsers that support the compound document profile used in the document (see Figures 7 and 8).
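One plausible way to drive such configurable menu entries is to compare the namespaces a document uses with the namespaces each configured renderer handles. This is a simplification of "supports the profile", since it ignores which nestings a renderer accepts. The renderer labels, commands, and support sets below are placeholders, not claims about any real browser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"
SVG = "http://www.w3.org/2000/svg"


@dataclass(frozen=True)
class Renderer:
    label: str                # text shown in the right-click menu
    command: str              # placeholder launch command
    supports: frozenset[str]  # namespace URIs the renderer can handle


# Hypothetical configuration; replace with the user agents actually installed.
RENDERERS = [
    Renderer("Browser A", "browser-a", frozenset({XHTML})),
    Renderer("Browser B", "browser-b", frozenset({XHTML, XFORMS, SVG})),
]


def document_namespaces(xml_text: str) -> set[str]:
    """Namespace URIs of all elements in the document."""
    root = ET.fromstring(xml_text)
    return {el.tag[1:].split("}", 1)[0] for el in root.iter() if el.tag.startswith("{")}


def render_menu(xml_text: str) -> list[str]:
    """Menu labels for renderers that cover every namespace in the document."""
    needed = document_namespaces(xml_text)
    return [r.label for r in RENDERERS if needed <= r.supports]


doc = (f'<h:html xmlns:h="{XHTML}" xmlns:f="{XFORMS}">'
       '<h:body><f:input ref="name"/></h:body></h:html>')
print(render_menu(doc))  # ['Browser B']
```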

Figure 7: Rendering Options
Figure 8: X-Smiles rendered XForm

Conclusion

The Compound XML Document Editor is a standards-based, model-driven, compound document development framework and supports dynamic compound document creation and serialization. The Compound XML Document Editor utilizes Model Driven Development concepts with Eclipse EMF to help develop flexible compound documents and the profiles that define them.

Acknowledgements

Thanks to Simon Johnston and Steve Speicher.

Biography

Kevin Kelly

IBM Corporation

Kevin E. Kelly is a Senior Software Engineer with the IBM Corporation working on Software Standards. Kevin is a member of the W3C XForms Working Group as well as the W3C Compound Document Formats Working Group. His focus is on client technology and on evolving open standards-based technologies for faster, more efficient standards adoption through XML-based and model-driven approaches. Before joining IBM, Kevin spent 8 years at Rational Software working on UML modeling and Java technologies. Kevin holds a B.S. from Mercer University and an M.S. from the University of Montana.

Biography

Jan J. Kratky

IBM Corporation

Jan Joseph Kratky is the lead developer for the Compound XML Document Editor and XML Forms Generator. Currently a software engineer with IBM Emerging Software Standards in Research Triangle Park, North Carolina, he holds a B.A. from Cornell University and an M.S. from Rensselaer Polytechnic Institute. A Sun Certified Java Programmer and Sun Certified Web Component Developer, Jan has worked with Java technologies since 1997, and with Eclipse technologies since 2001.

Biography

Keith Wells

IBM Corporation

Keith Wells is a software developer at the IBM RTP campus. Keith has been involved with Emerging Technologies and the Emerging Technologies Toolkit for several years.