XTech 2005: XML, the Web and beyond.

Printing XML: Why CSS is better than XSL

Introduction

CSS and XSL are two technology standards from the W3C that can be used to print XML documents.

The two technologies use entirely different processing models and syntax to achieve their goals. However, they have one important similarity: both of them have a page layout model based on the automatic pagination of a continuous flow of text.

This makes them suitable for printing books, contracts, letters and other documents that include arbitrary amounts of flowing text that must be divided over multiple pages. They are less suitable for documents based on a fixed layout, such as magazines and newspapers, in which the layout of each page requires manual intervention.

Since CSS and XSL are both viable contenders for many printing tasks, people who wish to print XML documents are faced with a choice as to which one they should use.

In this paper I argue that CSS is the best and most cost-effective solution for styling and printing XML documents. XSL, on the other hand, is overly complex and ill-suited to styling XML documents. It should be used sparingly, if at all, and only for performing document transformations in conjunction with CSS styling.

What is CSS?

CSS was born as a styling language for HTML documents on the web and grew to become a styling language for XML in general.

A CSS style sheet contains rules which apply style properties to the elements in an XML document. The style properties applied to each element determine the layout, colors and fonts used when printing that element.

The syntax and processing model of CSS are easy to explain with the help of a few simple examples. Consider trying to print the following XHTML document:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A simple XHTML document</title>
</head>
<body>
<p>
This is a <em>very</em> simple document, and that
is a <strong>good thing</strong>, don't you agree?
</p>
</body>
</html>

XHTML uses semantic markup rather than presentational markup, so we will need to provide styling information indicating how we wish each element in the document to be formatted.

The <em> and <strong> elements are used in XHTML to indicate emphasis and strong emphasis, respectively. The following two CSS rules declare that text within an <em> element should be printed with an italic font and text within a <strong> element should be printed with a bold font:

em { font-style: italic }
strong { font-weight: bold }

The element name at the start of the rule is a selector used to select the elements to which the style properties will be applied. For convenience, it is possible to use multiple selectors in a single rule:

html, body, div, p { display: block }

head, title { display: none }

The first rule declares that the <html>, <body>, <div> and <p> elements should all be displayed as blocks, each beginning on a new line. The second rule declares that the <head> and <title> elements should not be displayed at all, as they contain document metadata rather than actual content.

The default value for the display property is "inline", which is why we did not bother to explicitly specify a value for this property for the <em> and <strong> elements, which should be displayed as inline text.

A single rule can also include multiple property declarations, separated by semicolons:

p { margin-top: 12pt; margin-bottom: 12pt }

This rule declares that <p> elements, used to indicate paragraphs in XHTML, will have top and bottom margins of 12pt. (Points are a unit of measurement often used in page layout; there are 72 points to an inch.)

Note that there are now two rules that apply to <p> elements. The first rule provides a value for the display property while the second provides values for the margin-top and margin-bottom properties. The ability to combine properties from multiple rules is a vital part of CSS and will be examined in more detail later.
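
A minimal sketch of that combination, using only the two rules shown above. The merged rule at the end is written out purely for comparison; it is not something a style sheet author would need to write.

```css
/* Rule 1: contributes the display property */
html, body, div, p { display: block }

/* Rule 2: contributes the margins (12pt is 12/72in, one sixth of an inch) */
p { margin-top: 12pt; margin-bottom: 12pt }

/* Together they style <p> as this one rule would on its own */
p { display: block; margin-top: 12pt; margin-bottom: 12pt }
```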

The CSS rules that have been written so far make a fully functional CSS style sheet that can be used to style a subset of XHTML. (Some comments have been added to demonstrate the syntax for comments in CSS.)

/* block elements */

html, body, div, p { display: block }

/* metadata elements */

head, title { display: none }

/* paragraphs */

p { margin-top: 12pt; margin-bottom: 12pt }

/* inline elements */

em { font-style: italic }
strong { font-weight: bold }

This is a remarkably concise and simple specification of XML document style. Indeed, it is hard to imagine how it could be made any simpler.

It could, however, be made more complicated; for proof of this we need look no further than XSL.

What is XSL?

XSL was born as an attempt to create an XML formatting and presentation language that drew on the heritage of the DSSSL language for SGML.

XSL style sheets work by transforming XML documents into new XML documents that include styling information and can be printed. Styling information is added in the form of presentational attributes with names and values similar to CSS properties.

The syntax and processing model of XSL are both more complicated than those of CSS, as we will see when trying to use XSL to style the example XHTML document from the previous section.

Let us start by writing the following XSL templates for styling the XHTML <em> and <strong> elements by transforming them into presentational markup:

<xsl:template match="em">
    <fo:inline font-style="italic">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

<xsl:template match="strong">
    <fo:inline font-weight="bold">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

The first template declares that <em> elements should be transformed into <fo:inline> elements with a font-style attribute of "italic". The <xsl:apply-templates> element is used to recursively process the contents of the original <em> element. If this were omitted, the contents of the <em> element would be dropped.

The template will transform this element:

<em>Hello, world!</em>

into this element:

<fo:inline font-style="italic">Hello, world!</fo:inline>

which will then be treated as an inline span of text to be printed with an italic font. The template for the <strong> element behaves similarly.

The XML syntax of XSL makes these templates rather clunky and verbose compared to the equivalent CSS rules, but so far they seem reasonable.

XSL has a compact non-XML syntax for selectors, although it differs from the selector syntax used by CSS. This is the method for applying a template to multiple elements in XSL:

<xsl:template match="html|body|div|p">
    <fo:block>
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

This template transforms the <html>, <body>, <div> and <p> elements into blocks, then processes their content. Again, it seems reasonably similar to the equivalent CSS rule, just more verbose. However, there is a problem lurking beneath the surface that will be revealed when we attempt to write the template for handling paragraphs:

<xsl:template match="p">
    <fo:block margin-top="12pt" margin-bottom="12pt">
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

While this template is straightforward on its own, it conflicts with the previous template, as both of them apply to <p> elements. Each element in the XML document will be matched by exactly one XSL template, which must provide all of the styling properties for that element. We will see later that the inability to combine styling properties from multiple templates is a critical flaw in XSL that makes it unsuitable for styling.

Taking the templates that have been written so far and adding some other required elements results in a usable XSL style sheet for transforming a subset of XHTML into presentational markup:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

<xsl:output method="xml"/>

<!-- block elements -->

<xsl:template match="html|body|div">
    <fo:block>
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

<!-- metadata elements -->

<xsl:template match="head|title"/>

<!-- paragraphs -->

<xsl:template match="p">
    <fo:block margin-top="12pt" margin-bottom="12pt">
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

<!-- inline elements -->

<xsl:template match="em">
    <fo:inline font-style="italic">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

<xsl:template match="strong">
    <fo:inline font-weight="bold">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

</xsl:stylesheet>

This is a remarkably verbose and complex specification of XML document style. The XSL style sheet is three times longer than the equivalent CSS style sheet and is far more demanding to write.

However, CSS has more advantages over XSL than merely a more concise syntax. The styling-based processing model of CSS allows CSS style sheets to be simpler, more flexible, more modular and more reusable than XSL transformations can ever be.

How is the CSS styling model superior?

Unlike XSL templates, it is possible for many CSS rules to apply to the same element. The property declarations from the different rules are weighted and combined to derive a style for the element, and it is this fine-grained approach that makes the CSS styling model so powerful and convenient.
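
As an illustration of the weighting, the sketch below adds a second, more specific rule to the paragraph rule used earlier. The "div p" rule and its 6pt value are assumptions made up for this example.

```css
/* One element name in the selector: matches every <p> */
p { margin-top: 12pt; margin-bottom: 12pt }

/* Two element names, so more specific: matches only a <p> inside a <div> */
div p { margin-top: 6pt }
```

A <p> inside a <div> is matched by both rules. The more specific rule wins for margin-top, giving 6pt, while margin-bottom still comes from the first rule and stays at 12pt.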

For example, consider extending the sample XHTML document into a legal contract, such as a software license agreement, in which paragraphs and phrases containing legal terms are annotated with a class attribute of "legalese" so that they may be styled appropriately:

<p>
    Blah blah blah
    <span class="legalese">blah blah blah</span>.
</p>
<p class="legalese">
    blah blah blah.
</p>

A single CSS rule will apply to all the elements that have a class attribute of "legalese" and convert the text in them to uppercase:

*.legalese { text-transform: uppercase }

This rule will apply to all of the appropriate elements, independently of any other rules in the style sheet.

Attempting to rewrite this CSS rule as an XSL template demonstrates the restrictions imposed by the transformation-based processing model used by XSL:

<xsl:template match="*[@class='legalese']">
    <fo:inline text-transform="uppercase">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

This template is simple, but wrong. It will transform all elements with a class of "legalese" into inline text, even elements that should be blocks such as <p> for paragraphs. Even worse, this template overrides the template that we wrote earlier for processing paragraphs, which will result in <p> elements with a class of "legalese" losing their style.

Making this actually work properly in XSL requires complexity and duplication. One way to do it would be to duplicate the templates for every element:

<xsl:template match="p">
    <fo:block margin-top="12pt" margin-bottom="12pt">
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

<xsl:template match="p[@class='legalese']">
    <fo:block text-transform="uppercase"
              margin-top="12pt" margin-bottom="12pt">
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

This works, but it is a terrible solution that doubles the length of the style sheet and duplicates the style for every element, leading to a maintenance nightmare. This solution clearly won't scale either: if there is another class or attribute (or, even worse, a combination of attributes), the templates will need to be duplicated again and again.
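
For comparison, a sketch of how the CSS style sheet might absorb the same change. The "draft" class and its gray color are hypothetical, invented for this example.

```css
/* The existing rule, unchanged */
*.legalese { text-transform: uppercase }

/* One new rule for the hypothetical second class */
*.draft { color: gray }
```

In XHTML, an element written with class="legalese draft" is matched by both rules, as well as by the element rules for paragraphs, so no existing rule has to be duplicated or edited.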

There is another way to do it that requires less duplication, but involves writing the templates in a much more programmatic way, with conditional tests:

<xsl:template match="p">
    <fo:block margin-top="12pt" margin-bottom="12pt">
        <xsl:if test="@class='legalese'">
            <xsl:attribute name="text-transform">uppercase</xsl:attribute>
        </xsl:if>
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

This time we have added an explicit test for the legalese class inside the template that applies to all <p> elements. The advantage is that the paragraph style is not duplicated any more. The disadvantage is that this explicit test for the legalese class must still be added to every single template, which is a maintenance burden. This burden can be reduced slightly by refactoring the templates and placing the conditional test inside a separate named template, which is a construct much like a function or subroutine in a programming language:

<xsl:template match="p">
    <fo:block margin-top="12pt" margin-bottom="12pt">
        <xsl:call-template name="check-legalese"/>
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

<xsl:template name="check-legalese">
    <xsl:if test="@class='legalese'">
        <xsl:attribute name="text-transform">uppercase</xsl:attribute>
    </xsl:if>
</xsl:template>

Now the class attribute is only tested in one place, the "check-legalese" template, which is then invoked from all the other templates in the style sheet.

But what is it that we are actually doing here? This seems more like programming than styling! Instead of focusing on the structure of the documents we wish to style, we are wrestling with the structure of the XSL style sheet, where the style we were trying to express has been buried under a mess of conditional logic and subroutines.

The transformation-based processing model of XSL requires programming skill to use effectively and is ill-suited for styling XML documents. This makes life difficult for designers who wish to print XML documents, as they must work in conjunction with programmers to implement their designs, or are forced to master the complexities of XSL programming themselves in order to do their work.

In contrast, the styling-based processing model of CSS allows designers with no programming background to produce printed output quickly and efficiently, by writing simple declarative rules that specify the desired style.

What is XSL good for?

Although XSL is not a good method for styling XML documents, it is by no means useless. The design of XSL is such that the transformation markup (XSLT) is separate from the presentational markup (XSL-FO), allowing it to be used independently as a general purpose language for document transformation.

XSLT is a convenient language for writing transformation components that can be coupled together to create a document publishing pipeline. For example, one XSLT transform could augment XML documents by generating a table of contents based on the section headings. Another transform could scan for keywords to add to an automatically generated glossary of terms or index.

Once all of the transforms have been applied to the XML document, the final step is to style it with CSS to produce output formatted for print or display. Using XSL for transformation and CSS for styling combines the strengths of both technologies and is a great way to create flexible and maintainable document publishing pipelines.

Printing with CSS

CSS became popular as a language for styling web pages for display on computer screens, so not everyone has experience of using CSS for printing. Nevertheless, CSS does have good support for printing and, like the rest of CSS, it is simple to use and easy to learn.

Page breaks

CSS provides style properties for controlling page breaks that occur before, after, or inside an element:

h2 { page-break-before: always }

h3 { page-break-after: avoid }

table { page-break-inside: avoid }

The first rule declares that headings should be preceded by a page break, ensuring that they will be placed at the top of the page. The second rule declares that sub-headings should not be followed by a page break, ensuring that they will not be placed at the bottom of the page. The third rule declares that tables should not contain any page breaks, ensuring that they are not split over more than one page.
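
Page breaks inside paragraphs can be tuned as well. The sketch below uses the orphans and widows properties from CSS 2; the value of 3 is an arbitrary choice for illustration.

```css
/* Keep at least 3 lines of a paragraph at the bottom of a page (orphans)
   and at least 3 lines at the top of the next page (widows) */
p { orphans: 3; widows: 3 }
```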

Page style

CSS provides a special kind of rule for applying style properties to pages. This can be used to specify the page size and orientation:

@page {
    size: A4 portrait
}

Page rules can also be used to specify the margins, border, padding and background of printed pages:

@page {
    margin: 2.5cm;
    border: solid black thin;
    padding: 1cm;
    background: yellow
}
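
The page rules above can be combined, and CSS 2 also defines a :first pseudo-class for styling the first page of a document differently. In the sketch below the 5cm top margin is an assumed value, chosen only to show the idea.

```css
/* All pages: size and margins in a single rule */
@page {
    size: A4 portrait;
    margin: 2.5cm
}

/* First page only: a deeper top margin, for example to leave room for a title */
@page:first {
    margin-top: 5cm
}
```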

Page headers and footers

The @page rule also provides the facility to create running page headers and footers by adding generated content to the page margins:

@page {
    @top { content: "This is a page header" }
    @bottom { content: counter(page) }
}

This rule adds a page header at the top of every page and adds the current page number at the bottom of every page.
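
Generated content can mix literal text with counters. The following sketch keeps the margin box syntax used above and assumes a "pages" counter holding the total page count, as described in the CSS3 paged media drafts; not every formatter may support it.

```css
@page {
    /* Produces a footer such as "Page 3 of 12" */
    @bottom { content: "Page " counter(page) " of " counter(pages) }
}
```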

It is also possible to capture content from the document, such as chapter titles or section headings, in named strings which can be placed in the headers or footers:

@page {
    @top { content: string(chapter-title) }
}

h2 { string-set: chapter-title content() }

These rules capture the content of <h2> heading elements in a named string called "chapter-title", which is used as a page header by being placed in the page top margin.

Duplex printing

CSS even has support for duplex printing, in which left and right facing pages are styled differently in order to create book or magazine style layouts:

@page:left {
    @top { content: string(book-title) }
}

@page:right {
    @top { content: string(chapter-title) }
}

@page {
    @bottom {
        content: counter(page);
        text-align: outside
    }
}

These rules create a book-style layout in which the title of the book is placed at the top of left-hand pages and the title of the current chapter is placed at the top of right-hand pages. The page number is placed at the bottom of every page, aligned to the outside edge: the left side of left-hand pages and the right side of right-hand pages.
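
The left-hand page header above reads a named string called "book-title", which none of the rules shown so far sets. A minimal sketch of one way to set it, assuming that the book title is marked up as an <h1> element:

```css
/* Assumption: the document's <h1> holds the book title */
h1 { string-set: book-title content() }
```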

It is also possible to create page breaks based on duplex layout, for example to ensure that chapter titles are placed at the top of a left-hand page:

h2 { page-break-before: left }

Other CSS properties with support for duplex printing include text-align, float, clear, margin-inside and margin-outside.
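
A mirrored margin layout can also be sketched with the plain margin properties and the left and right page selectors already shown. The 3cm and 2cm values are assumptions for illustration: the wider margin is placed on the inside edge, where the pages are bound.

```css
/* Left-hand pages: the binding is on the right */
@page:left {
    margin-left: 2cm;
    margin-right: 3cm
}

/* Right-hand pages: the binding is on the left */
@page:right {
    margin-left: 3cm;
    margin-right: 2cm
}
```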

Multi-channel publishing with CSS

Another advantage of CSS is that it enables multi-channel publishing: using CSS for styling allows XML documents to be published in print or on the web.

CSS has a special @media rule to make multi-channel publishing easier, by restricting sets of rules to apply only to specific media:

@media print {
    * { text-decoration: none }

    a[href]::after { content: " (" attr(href) ")" }
}

The first rule disables underlining for all elements while the second rule adds link URLs in brackets after the link text. These two rules are grouped within a @media rule, which ensures that they are only applied when the style sheet is being used to produce printed output. Neither of these rules will be applied if the style sheet is being used to style a document for display in a web browser.
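
The same style sheet can carry rules for the screen alongside the print rules. In this sketch the screen and print declarations and their values are assumptions, added only to show two media blocks side by side with a shared rule.

```css
/* Shared: applies in every medium */
em { font-style: italic }

/* Applied only when the document is displayed on a screen */
@media screen {
    a[href] { text-decoration: underline }
}

/* Applied only when the document is printed */
@media print {
    body { font-size: 10pt }
}
```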

Conclusion

The concise syntax and powerful styling-based processing model of CSS combine to form a technology that is efficient to use and easy to learn.

With support for pagination, running page headers/footers, duplex printing and multi-channel publishing, CSS is an excellent choice for styling and printing XML documents.

CSS is a practical choice for designers looking for a cost-effective XML printing solution that does not require programming. In contrast, XSL requires programming skill to use effectively and is more suited for use as a transformation language than as a styling language.