XTech 2005: XML, the Web and beyond.
CSS and XSL are two technology standards from the W3C that can be used to print XML documents.
The two technologies use entirely different processing models and syntax to achieve their goals. However, they have one important similarity: both of them have a page layout model based on the automatic pagination of a continuous flow of text.
This makes them suitable for printing books, contracts, letters and other documents that include arbitrary amounts of flowing text that must be divided over multiple pages. They are less suitable for documents based on a fixed layout, such as magazines and newspapers, in which the layout of each page requires manual intervention.
Since CSS and XSL are both viable contenders for many printing tasks, people who wish to print XML documents are faced with a choice as to which one they should use.
In this paper I argue that CSS is the best and most cost-effective solution for styling and printing XML documents. XSL, on the other hand, is overly complex and ill-suited to styling XML documents. It should be used sparingly, if at all, and only for document transformations performed in conjunction with CSS styling.
CSS was born as a styling language for HTML documents on the web and grew to become a styling language for XML in general.
A CSS style sheet contains rules which apply style properties to the elements in an XML document. The style properties applied to each element determine the layout, colors and fonts used when printing that element.
The syntax and processing model of CSS are easy to explain with the help of a few simple examples. Consider trying to print the following XHTML document:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A simple XHTML document</title>
  </head>
  <body>
    <p>
      This is a <em>very</em> simple document, and that
      is a <strong>good thing</strong>, don't you agree?
    </p>
  </body>
</html>
XHTML uses semantic markup rather than presentational markup, so we will need to provide styling information indicating how we wish each element in the document to be formatted.
The <em> and <strong> elements are
used in XHTML to indicate emphasis and strong emphasis, respectively.
The following two CSS rules declare that text within an
<em> element should be printed with an italic font and
text within a <strong> element should be printed with a
bold font:
em { font-style: italic }
strong { font-weight: bold }
The element name at the start of the rule is a selector used to select the elements to which the style properties will be applied. For convenience, it is possible to use multiple selectors in a single rule:
html, body, div, p { display: block }
head, title { display: none }
The first rule declares that the <html>,
<body>, <div> and <p>
elements should all be displayed as blocks, each beginning on a
new line.
The second rule declares that the <head> and
<title> elements should not be displayed at all, as they
contain document metadata rather than actual content.
The default value for the display property is "inline", which is
why we did not bother to explicitly specify a value for this property
for the <em> and <strong> elements,
which should be displayed as inline text.
A single rule can also include multiple property declarations, separated by semicolons:
p { margin-top: 12pt; margin-bottom: 12pt }
This rule declares that <p> elements, used to indicate
paragraphs in XHTML, will have top and bottom margins of 12pt.
(Points are a unit of measurement often used in page layout;
there are 72 points to an inch.)
Note that there are now two rules that apply to <p>
elements. The first rule provides a value for the display property while
the second provides values for the margin-top and margin-bottom
properties. The ability to combine properties from multiple rules is a
vital part of CSS and will be examined in more detail later.
The CSS rules that have been written so far make a fully functional CSS style sheet that can be used to style a subset of XHTML (some comments have been added to demonstrate the comment syntax of CSS):
/* block elements */
html, body, div, p { display: block }
/* metadata elements */
head, title { display: none }
/* paragraphs */
p { margin-top: 12pt; margin-bottom: 12pt }
/* inline elements */
em { font-style: italic }
strong { font-weight: bold }
This is a remarkably concise and simple specification of XML document style. Indeed, it is hard to imagine how it could be made any simpler.
It could, however, be made more complicated; for proof of this we need look no further than XSL.
XSL was born as an attempt to create an XML formatting and presentation language that drew on the heritage of the DSSSL language for SGML.
XSL style sheets work by transforming XML documents into new XML documents that include styling information and can be printed. Styling information is added in the form of presentational attributes with names and values similar to CSS properties.
The syntax and processing model of XSL are both more complicated than those of CSS, as we will see when trying to use XSL to style the example XHTML document from the previous section.
Let us start by writing the following XSL templates for styling the
XHTML <em> and <strong> elements
by transforming them into presentational markup:
<xsl:template match="em">
  <fo:inline font-style="italic">
    <xsl:apply-templates/>
  </fo:inline>
</xsl:template>

<xsl:template match="strong">
  <fo:inline font-weight="bold">
    <xsl:apply-templates/>
  </fo:inline>
</xsl:template>
The first template declares that <em> elements should be
transformed into <fo:inline> elements with a font-style
attribute of "italic". The <xsl:apply-templates> element
is used to recursively process the contents of the original
<em> element. If it were omitted, the contents of the
<em> element would be dropped.
The template will transform this element:
<em>Hello, world!</em>
into this element:
<fo:inline font-style="italic">Hello, world!</fo:inline>
which will then be treated as an inline span of text to be printed with an
italic font. The template for the <strong> element
behaves similarly.
The XML syntax of XSL makes these templates rather clunky and verbose compared to the equivalent CSS rules, but so far they seem reasonable.
XSL has a compact non-XML syntax for selectors, although it differs from the selector syntax used by CSS. This is the method for applying a template to multiple elements in XSL:
<xsl:template match="html|body|div|p">
  <fo:block>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
This template transforms the <html>,
<body>, <div> and <p>
elements into blocks, then processes their content. Again, it seems reasonably
similar to the equivalent CSS rule, just more verbose. However, there is a
problem lurking beneath the surface that will be revealed when we attempt to
write the template for handling paragraphs:
<xsl:template match="p">
  <fo:block margin-top="12pt" margin-bottom="12pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
While this template is straightforward on its own, it conflicts
with the previous template, as both of them apply to <p>
elements.
Each element in the XML document will be matched by exactly one XSL template,
which must provide all of the styling properties for that element.
We will see later that the inability to combine styling properties from
multiple templates is a critical flaw in XSL that makes it unsuitable
for styling.
Taking the templates that have been written so far and adding some other required elements results in a usable XSL style sheet for transforming a subset of XHTML into presentational markup:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml"/>
  <!-- block elements -->
  <xsl:template match="html|body|div">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  <!-- metadata elements -->
  <xsl:template match="head|title"/>
  <!-- paragraphs -->
  <xsl:template match="p">
    <fo:block margin-top="12pt" margin-bottom="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  <!-- inline elements -->
  <xsl:template match="em">
    <fo:inline font-style="italic">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>
  <xsl:template match="strong">
    <fo:inline font-weight="bold">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>
</xsl:stylesheet>
This is a remarkably verbose and complex specification of XML document style. The XSL style sheet is more than three times as long as the equivalent CSS style sheet and is far more demanding to write.
However, CSS has more advantages over XSL than merely a more concise syntax. The styling-based processing model of CSS allows CSS style sheets to be simpler, more flexible, more modular and more reusable than XSL transformations can ever be.
Unlike XSL templates, it is possible for many CSS rules to apply to the same element. The property declarations from the different rules are weighted and combined to derive a style for the element, and it is this fine-grained approach that makes the CSS styling model so powerful and convenient.
For example, consider extending the sample XHTML document into a legal contract, such as a software license agreement, in which paragraphs and phrases containing legal terms are annotated with a class attribute of "legalese" so that they may be styled appropriately:
<p>
  Blah blah blah
  <span class="legalese">blah blah blah</span>.
</p>
<p class="legalese">
  blah blah blah.
</p>
A single CSS rule will apply to all the elements that have a class attribute of "legalese" and convert the text in them to uppercase:
*.legalese { text-transform: uppercase }
This rule will apply to all of the appropriate elements, independently of any other rules in the style sheet.
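As a rough sketch of how this combination plays out, the rules below repeat the paragraph rules from the earlier style sheet alongside the new class rule. The trailing comment lists the declarations that end up applying to the two "legalese" elements in the sample. The comment is an illustration written by hand, not formatter output, and it assumes that no other rules are present.

```css
/* rules from the style sheet built up earlier */
html, body, div, p { display: block }
p { margin-top: 12pt; margin-bottom: 12pt }

/* the new class rule */
*.legalese { text-transform: uppercase }

/*
 * Declarations that apply to <p class="legalese">:
 *   display: block             (from the first rule)
 *   margin-top: 12pt           (from the second rule)
 *   margin-bottom: 12pt        (from the second rule)
 *   text-transform: uppercase  (from the class rule)
 *
 * For <span class="legalese"> only the class rule matches, so the
 * element keeps the default display value of inline.
 */
```

Neither of the paragraph rules had to be touched to accommodate the new class.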
Attempting to rewrite this CSS rule as an XSL template demonstrates the restrictions imposed by the transformation-based processing model used by XSL:
<xsl:template match="*[@class='legalese']">
  <fo:inline text-transform="uppercase">
    <xsl:apply-templates/>
  </fo:inline>
</xsl:template>
This template is simple, but wrong. It will transform all elements with
a class of "legalese" into inline text, even elements that should be blocks
such as <p> for paragraphs. Even worse, this template
overrides the template that we wrote earlier for processing paragraphs,
which will result in <p> elements with a class of
"legalese" losing their style.
Making this actually work properly in XSL requires complexity and duplication. One way to do it would be to duplicate the templates for every element:
<xsl:template match="p">
  <fo:block margin-top="12pt" margin-bottom="12pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template match="p[@class='legalese']">
  <fo:block text-transform="uppercase"
            margin-top="12pt" margin-bottom="12pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
This works, but it is a terrible solution that doubles the length of the style sheet and duplicates the style for every element, leading to a maintenance nightmare. This solution clearly won't scale either: if there is another class or attribute — or even worse, a combination of attributes — the templates will need to be duplicated again and again.
There is another way to do it that requires less duplication, but involves writing the templates in a much more programmatic way, with conditional tests:
<xsl:template match="p">
  <fo:block margin-top="12pt" margin-bottom="12pt">
    <xsl:if test="@class='legalese'">
      <!-- value kept on one line so that no stray whitespace ends up in the attribute -->
      <xsl:attribute name="text-transform">uppercase</xsl:attribute>
    </xsl:if>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
This time we have added an explicit test for the legalese class inside the
template that applies to all <p> elements. The advantage
is that the paragraph style is not duplicated any more. The disadvantage is
that this explicit test for the legalese class must still be added to every
single template, which is a maintenance burden. This burden can be reduced
slightly by refactoring the templates and placing the conditional test
inside a separate named template, which is a construct much like a
function or subroutine in a programming language:
<xsl:template match="p">
  <fo:block margin-top="12pt" margin-bottom="12pt">
    <xsl:call-template name="check-legalese"/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template name="check-legalese">
  <xsl:if test="@class='legalese'">
    <!-- value kept on one line so that no stray whitespace ends up in the attribute -->
    <xsl:attribute name="text-transform">uppercase</xsl:attribute>
  </xsl:if>
</xsl:template>
Now the class attribute is only tested in one place, the "check-legalese" template, which is then invoked from all the other templates in the style sheet.
But what is it that we are actually doing here? This seems more like programming than styling! Instead of focusing on the structure of the documents we wish to style, we are wrestling with the structure of the XSL style sheet, where the style we were trying to express has been buried under a mess of conditional logic and subroutines.
The transformation-based processing model of XSL requires programming skill to use effectively and is ill-suited for styling XML documents. This makes life difficult for designers who wish to print XML documents, as they must work in conjunction with programmers to implement their designs, or are forced to master the complexities of XSL programming themselves in order to do their work.
In contrast, the styling-based processing model of CSS allows designers with no programming background to produce printed output quickly and efficiently, by writing simple declarative rules that specify the desired style.
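To make the contrast concrete, here is a sketch of what the "another class" case raised earlier looks like in CSS. The class name "defined-term" and its styling are hypothetical, chosen only for illustration; the existing rules are left untouched.

```css
/* existing rules, unchanged */
p { margin-top: 12pt; margin-bottom: 12pt }
*.legalese { text-transform: uppercase }

/* hypothetical second class: one new rule, no edits elsewhere */
*.defined-term { font-weight: bold }

/*
 * An element carrying both classes, for example
 *   <p class="legalese defined-term">
 * picks up the declarations of all three rules. No rule for the
 * combination has to be written.
 */
```

Each further class costs one rule, however many element types it may appear on.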
Although XSL is not a good method for styling XML documents, it is by no means useless. The design of XSL is such that the transformation markup (XSLT) is separate from the presentational markup (XSL-FO), allowing it to be used independently as a general purpose language for document transformation.
XSLT is a convenient language for writing transformation components that can be coupled together to create a document publishing pipeline. For example, one XSLT transform could augment XML documents by generating a table of contents based on the section headings. Another transform could scan for keywords to add to an automatically generated glossary of terms or index.
Once all of the transforms have been applied to the XML document, the final step is to style it with CSS to produce output formatted for print or display. Using XSL for transformation and CSS for styling combines the strengths of both technologies and is a great way to create flexible and maintainable document publishing pipelines.
CSS became popular as a language for styling web pages for display on computer screens, so not everyone has experience of using CSS for printing. Nevertheless, CSS has good support for printing and, like the rest of CSS, it is simple to use and easy to learn.
CSS provides style properties for controlling page breaks that occur before, after, or inside an element:
h2 { page-break-before: always }
h3 { page-break-after: avoid }
table { page-break-inside: avoid }
The first rule declares that headings should be preceded by a page break, ensuring that they will be placed at the top of the page. The second rule declares that sub-headings should not be followed by a page break, ensuring that they will not be placed at the bottom of the page. The third rule declares that tables should not contain any page breaks, ensuring that they are not split over more than one page.
CSS provides a special kind of rule for applying style properties to pages. This can be used to specify the page size and orientation:
@page {
  size: A4 portrait
}
Page rules can also be used to specify the margins, border, padding and background of printed pages:
@page {
  margin: 2.5cm;
  border: solid black thin;
  padding: 1cm;
  background: yellow
}
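Because the page box follows the familiar CSS box model, the space left for content can be worked out by hand. The sketch below merges the size and margin declarations shown so far into a single rule and records the arithmetic in a comment. It assumes the standard A4 dimensions of 210mm by 297mm, and it leaves the border width symbolic because the thickness of a thin border is chosen by the formatter.

```css
@page {
  size: A4 portrait;         /* 210mm wide, 297mm high */
  margin: 2.5cm;             /* 25mm on each side */
  border: solid black thin;  /* width b, chosen by the formatter */
  padding: 1cm;              /* 10mm on each side */
  background: yellow
}

/*
 * Space left for the flow of text on each page:
 *   width:  210mm - 2 * (25mm + 10mm) - 2b = 140mm - 2b
 *   height: 297mm - 2 * (25mm + 10mm) - 2b = 227mm - 2b
 */
```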
The @page rule also provides the facility to create running page
headers and footers by adding generated content to the page margins:
@page {
  @top { content: "This is a page header" }
  @bottom { content: counter(page) }
}
This rule adds a page header at the top of every page and adds the current page number at the bottom of every page.
It is also possible to capture content from the document, such as chapter titles or section headings, in named strings which can be placed in the headers or footers:
@page {
  @top { content: string(chapter-title) }
}
h2 { string-set: chapter-title content() }
These rules capture the content of <h2> heading elements
in a named string called "chapter-title", which is used as a page header
by being placed in the page top margin.
CSS even has support for duplex printing, in which left- and right-facing pages are styled differently in order to create book-style or magazine-style layouts:
@page:left {
  @top { content: string(book-title) }
}
@page:right {
  @top { content: string(chapter-title) }
}
@page {
  @bottom {
    content: counter(page);
    text-align: outside
  }
}
These rules create a book-style layout in which the title of the book is placed at the top of left-hand pages and the title of the current chapter is placed at the top of right-hand pages. The page number is placed at the bottom of every page, aligned to the outside edge: the left side of left-hand pages and the right side of right-hand pages.
It is also possible to create page breaks based on duplex layout, for example to ensure that chapter titles are placed at the top of a left-hand page:
h2 { page-break-before: left }
Other CSS properties with support for duplex printing include text-align, float, clear, margin-inside and margin-outside.
Another advantage of CSS is that it enables multi-channel publishing: using CSS for styling allows XML documents to be published in print or on the web.
CSS has a special @media rule to make multi-channel publishing
easier, by restricting sets of rules so that they apply only to specific media:
@media print {
  * { text-decoration: none }
  a[href]::after { content: "(" attr(href) ")" }
}
The first rule disables underlining for all elements while
the second rule adds link URLs in brackets after the link text.
These two rules are grouped within a @media rule, which ensures
that they are only applied when the style sheet is being used to produce
printed output.
Neither of these rules will be applied if the style sheet is being used to
style a document for display in a web browser.
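As a sketch of how one style sheet can serve both channels, the fragment below places the shared rules outside any @media block, so that they apply everywhere, and adds one block per medium. The screen rule is an assumption added for illustration; only the print rules come from the example above.

```css
/* shared rules: apply on screen and in print */
em { font-style: italic }
strong { font-weight: bold }

/* hypothetical screen-only rule */
@media screen {
  a[href] { text-decoration: underline }
}

/* print-only rules from the example above */
@media print {
  * { text-decoration: none }
  a[href]::after { content: "(" attr(href) ")" }
}
```

Keeping the shared rules outside the @media blocks means that the bulk of the style sheet is written once and reused for every output channel.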
The concise syntax and powerful styling-based processing model of CSS combine to form a technology that is efficient to use and easy to learn.
With support for pagination, running page headers/footers, duplex printing and multi-channel publishing, CSS is an excellent choice for styling and printing XML documents.
CSS is a practical choice for designers looking for a cost-effective XML printing solution that does not require programming. In contrast, XSL requires programming skill to use effectively and is more suited for use as a transformation language than as a styling language.