XTech 2005: XML, the Web and beyond.
The XML format of OpenOffice and its export filters (which can be any XSLT transformations) enable unexpected applications such as... using OpenOffice to produce XML schemas!
The main (and only common) feature of the many XML schema languages is validation. XML schema languages are about validation, and when they provide features that let you describe XML structures, these features describe validation rules more than anything else.
Because of this focus, XML schema languages are low level, difficult for non-specialists to learn and to read, and poor modeling tools compared to first-class modeling tools such as UML.
Higher level modeling tools are needed, especially if the design is done by people who are not XML schema specialists.
Relying on the user-friendliness of an XML schema IDE isn't an option: even the most user-friendly tool will leave the user with technical decisions that should be taken by specialists.
UML is the first option that comes to mind when we talk of modeling tools. Class diagrams are quite handy to describe XML documents. Their big benefit is that they are universally understood.
However, they require stereotypes and/or conventions to capture XML-specific features such as the distinction between elements and attributes, or whether order matters.
Also, the graphical nature of class diagrams makes them difficult to read when you have a large number of elements to represent.
Examples of documents are often easier to read than XML schemas. For that reason, XML snippets are often added to the documentation to illustrate the schemas.
Proposals have been made to use annotated XML documents as higher-level views of XML schemas (see for instance http://examplotron.org), but this approach is only marginally used.
Spreadsheets have become one of the most popular tools in the toolbox of many computer users, and it's no surprise that many people have had the idea of storing model information in their favorite tool.
Spreadsheets are less graphical than UML diagrams, and a simple model with few elements will be less readable in a spreadsheet than in a UML diagram. On the other hand, as the number of elements grows, the ability of spreadsheets to handle large tables becomes decisive and the spreadsheet becomes easier to read than the UML diagram.
Another argument in favor of spreadsheets is that UML tools are expensive and have limited interoperability, while spreadsheets are either open source or likely to be already installed on most workstations. Furthermore, the different spreadsheet processors are quite interoperable.
My requirements for choosing a spreadsheet processor were:
Because I am a Linux user, I needed a tool that works on Linux. Because other people may not all be Linux users, I wanted a tool that also runs on Mac OS X and Windows.
Because I wanted to use XSLT to transform my spreadsheets into W3C XML Schemas, I needed a tool that, at the very minimum, was able to produce XML documents.
The first requirement alone was enough to impose OpenOffice. The XML features of OpenOffice were the cherry on the cake, as we'll see later on.
A prospect called me to ask what was the best approach to model an XML vocabulary. After discussion, it appeared that this vocabulary was an exchange format involving three different teams working from three different locations. Each team had its own approach to describing the format, and they needed external expertise to choose the best one.
One of the teams was using a UML modeling tool and generating W3C XML Schema schemas from the UML class diagrams. Since UML was not a strategic choice for the customer, this team was the only one to have licenses for this UML tool, and this option wasn't considered generalizable.
Another team was using an XML IDE to edit WXS schemas. The different teams had not received WXS training, and this option was considered too complex.
The last team was producing plain text documentation including pseudo-XML snippets (XML pieces with non-well-formed annotations to specify types and cardinalities). This option was considered anecdotal.
Given the context, I proposed three (other) different approaches and documented their advantages and drawbacks.
The first option was an adaptation of their XML snippets, keeping as many as possible of the conventions they had invented but making them well-formed XML. I provided sample XSLT transformations showing how they could be transformed into WXS schemas.
The second option was to use OpenOffice spreadsheets as a modeling tool.
The third option was to use the RELAX NG compact syntax, which is much easier to learn and to read than W3C XML Schema, and to convert that syntax into W3C XML Schema schemas using trang.
After looking at these options, they decided to use OpenOffice spreadsheets, which was the option they felt most confident with.
A model (schema) is defined in a spreadsheet with five columns.
The first column is the element or attribute name.
The second column is used for the definition of the content of the elements when they have a complex type.
The third column is the cardinality using a min-max convention.
The fourth column is the datatype.
The fifth column is used for documentation.
A global simple type element or attribute requires just one line:
Simple type element or attribute

| Element or attribute | Content | Card | Type | Documentation |
|---|---|---|---|---|
| @id | | | xs:ID | id attribute |
| name | | | xs:token | name (definition) |

| Element or attribute | Content | Card | Type | Documentation |
|---|---|---|---|---|
| author | | | | |
| | name | | | name (reference) |
| | surname | | xs:token | Surname (local) |
| | born | | | birth date (reference) |
| | died | 0-1 | xs:date | death date (optional) (local) |

| Element or attribute | Content | Card | Type | Documentation |
|---|---|---|---|---|
| title | | | xs:token | book's title |
| | @lang | | xs:language | language (no support for namespaces yet) |
For simplification purposes, we have adopted the following conventions and restrictions in the current version (most if not all of them could be fixed relatively easily):
The current version does not support namespaces.
The current version has no support for defining global types (either simple or complex).
Because of the conventions we've used in the layout of the table, complex type elements are always defined globally.
Global attributes and simple type elements use the first column for their names. Local ones are defined as content of a complex type element and their names appear in the second column.
| Element or attribute | Content | Card | Type | Documentation |
|---|---|---|---|---|
| # Description of a library (example) | | | | |
| library | | | | Library element (root) |
| | book | | | book element (reference) |
| book | | | | book element (description) |
| | @id | | | id (reference) |
| | isbn | | char(10) | ISBN number |
| | title | | | title |
| | author | 0-n | | author(s) |
| | character | 0-n | | character(s) |
| title | | | xs:token | book's title |
| | @lang | | xs:language | language (no support for namespaces yet) |
| author | | | | |
| | name | | | name (reference) |
| | surname | | xs:token | Surname (local) |
| | born | | | birth date (reference) |
| | died | 0-1 | xs:date | death date (optional) (local) |
| character | | | | |
| | name | | | name (reference) |
| | born | 0-1 | | birth date |
| | qualification | | xs:token | qualification |
| # Common attributes | | | | |
| @id | | | xs:ID | id attribute |
| # Common elements | | | | |
| name | | | xs:token | name (definition) |
| born | | | xs:date | birth date (definition) |
The simplest OpenOffice export filters are basically XSLT transformations designed to transform OpenOffice's native XML format into any other XML format.
In our case, I have written an XSLT transformation, "schema.xsl", that transforms OpenOffice spreadsheets into W3C XML Schema.
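To give an idea of the kind of rule such a transformation contains, here is a minimal sketch of how a "min-max" cardinality cell could be mapped to minOccurs and maxOccurs. It is not taken from schema.xsl: the `rows` and `row` input elements and their `name` and `card` attributes are hypothetical stand-ins for spreadsheet rows, chosen so that the sketch does not depend on the details of OpenOffice's native format.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only, not the author's schema.xsl. The <rows>/<row> input and
     its name/card attributes are hypothetical stand-ins for spreadsheet rows. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap the references generated for each row in a sequence. -->
  <xsl:template match="/rows">
    <xs:sequence>
      <xsl:apply-templates select="row"/>
    </xs:sequence>
  </xsl:template>

  <!-- Turn each hypothetical <row name="..." card="..."/> into a reference. -->
  <xsl:template match="row">
    <xs:element ref="{@name}">
      <xsl:call-template name="cardinality">
        <xsl:with-param name="card" select="normalize-space(@card)"/>
      </xsl:call-template>
    </xs:element>
  </xsl:template>

  <!-- Map a "min-max" cell to minOccurs/maxOccurs. An empty cell adds no
       attribute, which leaves the W3C XML Schema default of exactly one. -->
  <xsl:template name="cardinality">
    <xsl:param name="card"/>
    <xsl:if test="contains($card, '-')">
      <xsl:variable name="max" select="substring-after($card, '-')"/>
      <xsl:attribute name="minOccurs">
        <xsl:value-of select="substring-before($card, '-')"/>
      </xsl:attribute>
      <xsl:attribute name="maxOccurs">
        <xsl:choose>
          <!-- "n" in the spreadsheet convention means no upper bound -->
          <xsl:when test="$max = 'n'">unbounded</xsl:when>
          <xsl:otherwise><xsl:value-of select="$max"/></xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, a row with `card="0-n"` would get minOccurs="0" and maxOccurs="unbounded", and a row with `card="0-1"` would get minOccurs="0" and maxOccurs="1". A real filter would read the cells from OpenOffice's table markup instead of from attributes.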
The schema that schema.xsl generates from the library model above is (note the difference in verbosity):
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation> Description of a library (example)</xs:documentation>
  </xs:annotation>
  <xs:element name="library">
    <xs:annotation>
      <xs:documentation>Library element (root)</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book">
          <xs:annotation>
            <xs:documentation>book element (reference)</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:annotation>
      <xs:documentation>book element (description)</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="isbn">
          <xs:annotation>
            <xs:documentation>ISBN number</xs:documentation>
          </xs:annotation>
          <xs:simpleType>
            <xs:restriction base="xs:token">
              <xs:maxLength value="10"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element ref="title">
          <xs:annotation>
            <xs:documentation>title</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded">
          <xs:annotation>
            <xs:documentation>author(s)</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded">
          <xs:annotation>
            <xs:documentation>character(s)</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
      <xs:attribute use="required" ref="id">
        <xs:annotation>
          <xs:documentation>id (reference)</xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="title">
    <xs:annotation>
      <xs:documentation>book's title</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token"/>
  </xs:simpleType>
  <xs:element name="title">
    <xs:annotation>
      <xs:documentation>book's title</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="title">
          <xs:attribute use="required" name="lang" type="xs:language">
            <xs:annotation>
              <xs:documentation>language (no support for namespaces yet)</xs:documentation>
            </xs:annotation>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name">
          <xs:annotation>
            <xs:documentation>name (reference)</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element name="surname" type="xs:token">
          <xs:annotation>
            <xs:documentation>surname</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element ref="born">
          <xs:annotation>
            <xs:documentation>birth date</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element name="died" minOccurs="0" maxOccurs="1" type="xs:date">
          <xs:annotation>
            <xs:documentation>death date (optional)</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name">
          <xs:annotation>
            <xs:documentation>name (reference)</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element ref="born" minOccurs="0" maxOccurs="1">
          <xs:annotation>
            <xs:documentation>birth date</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element name="qualification" type="xs:token">
          <xs:annotation>
            <xs:documentation>qualification</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:annotation>
    <xs:documentation> Common attributes</xs:documentation>
  </xs:annotation>
  <xs:attribute name="id" type="xs:ID">
    <xs:annotation>
      <xs:documentation>id attribute</xs:documentation>
    </xs:annotation>
  </xs:attribute>
  <xs:annotation>
    <xs:documentation> Common elements</xs:documentation>
  </xs:annotation>
  <xs:element name="name" type="xs:token">
    <xs:annotation>
      <xs:documentation>name (definition)</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="born" type="xs:date">
    <xs:annotation>
      <xs:documentation>birth date (definition)</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
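To make the schema above more concrete, here is a small instance document that should be valid against it. The values are illustrative only and are not part of the original model. Note that the schema as generated allows exactly one book per library, since the book row carries no cardinality.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sample data, assumed for this sketch. -->
<library>
  <!-- id is required and must be a valid xs:ID (it cannot start with a digit) -->
  <book id="b0836217462">
    <!-- isbn: at most 10 characters -->
    <isbn>0836217462</isbn>
    <!-- lang is a required attribute of title -->
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <!-- author and character may each appear zero or more times -->
    <author>
      <name>Charles M Schulz</name>
      <surname>Schulz</surname>
      <born>1922-11-26</born>
      <!-- died is optional -->
      <died>2000-02-12</died>
    </author>
    <character>
      <name>Snoopy</name>
      <!-- born is optional for characters -->
      <born>1950-10-04</born>
      <qualification>extroverted beagle</qualification>
    </character>
  </book>
</library>
```

Child elements must appear in the order shown, because the generated complex types use xs:sequence.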
To make them easier to install, export filters can be packed into jar files. Their installation then becomes a matter of a couple of mouse clicks.
As soon as the export filter has been installed, a model can be exported as W3C XML Schema straight away from the OpenOffice application.
This is cool, but we could do much more, including (to name a few):
Importing W3C XML Schema schemas: that would be tough in the general case, but pretty trivial for schemas that follow the style used in our exports.
Importing and exporting RELAX NG schemas: that should be rather easier than importing and exporting WXS schemas, given that RELAX NG is easier to read and write than WXS.
We could generate the textual documentation from a schema as an OpenOffice Writer document. This document could then be exported in the many formats supported by OpenOffice.
This one would probably be the killer app!
Eric van der Vlist
CEO, Dyomedea