XTech 2005: XML, the Web and beyond.
This presentation describes a low-effort way for users to take a first step into XML for their technical documentation.
The Darwin Information Typing Architecture (DITA) and its associated public toolkit provide the DTDs, stylesheets and other tools you need to get started with XML.
But DITA is much more. The DITA specialization process gives you the ability to easily adapt DITA to your company requirements to provide value for various document types.
DITA is integrated into and actively supported by many state-of-the-art tools commonly known in the XML world like EPIC, XMLSpy, Serna, WorldServer OpenTopic, Content Mapper and many others to come.
The presentation contains:
a brief introduction to DITA, an example of how specialization works, a case study, and FAQs such as: What is the next step I have to take after this conference? Where can I get support? What will I need?
DITA - Darwin Information Typing Architecture - is an open source, XML-based architecture for the production and publication of technical documentation.
Within this paper you will find high-level as well as technical information for implementers. If you are interested in high-level information only, skip the chapters marked with a (t).
Do you feel constrained by standardized DTDs?
Will writing your own DTDs and stylesheets blow your budget?
Then DITA might be the way to your successful XML project.
"Yes" or actually "will be." In April 2005 the OASIS DITA 1.0 Committee Draft was approved by the DITA Technical Committee for submission to OASIS for a membership standardization vote.
The DITA core consists of a topic DTD and a map DTD. The topic represents a unit of information for a single subject. The topics can then be assembled in a map file e.g. for a help system or books that require a particular selection and organization of subjects.
Besides the core topic, three specialized topic types are part of the DITA core: "reference" to describe regular features, e.g. of an instrument; "task" to describe the steps of a particular task, e.g. an installation procedure; and "concept" to introduce the background or overview information for tasks or reference topics.
Each topic type contains a prolog section for holding meta information about the information unit and about the product. This information is especially helpful when working with content management systems.
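The relation between topics and a map can be sketched in a few lines. The following is a minimal illustration only: the file names, ids and text are hypothetical, the fragments are simplified and have not been validated against the DITA DTDs, and Python's standard library is used merely to read them.

```python
import xml.etree.ElementTree as ET

# A simplified "task" topic with a prolog (author and product metadata).
# All ids, names and values are made up for this illustration.
TASK_XML = """<task id="install_pump">
  <title>Installing the pump</title>
  <prolog>
    <author>Jane Doe</author>
    <metadata>
      <prodinfo>
        <prodname>Pump X</prodname>
        <vrmlist><vrm version="1"/></vrmlist>
      </prodinfo>
    </metadata>
  </prolog>
  <taskbody>
    <steps>
      <step><cmd>Switch off the power.</cmd></step>
      <step><cmd>Mount the pump.</cmd></step>
    </steps>
  </taskbody>
</task>"""

# A map that selects and orders topics, e.g. for a help system.
MAP_XML = """<map title="Pump X Help">
  <topicref href="pump_concept.dita" type="concept">
    <topicref href="install_pump.dita" type="task"/>
  </topicref>
</map>"""

task = ET.fromstring(TASK_XML)
dita_map = ET.fromstring(MAP_XML)

# The topic carries the content and its own metadata ...
step_commands = [cmd.text for cmd in task.iter("cmd")]
product_name = task.findtext("prolog/metadata/prodinfo/prodname")

# ... while the map only carries the selection and hierarchy of topics.
map_hrefs = [ref.get("href") for ref in dita_map.iter("topicref")]

print(product_name, step_commands, map_hrefs)
```

The same topic file could be referenced from a second map, for example one that organizes a printed manual, without being changed.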
Extensibility with inheritance, which allows the creation of new types that inherit processing rules from existing types. For example, API documentation is a particular kind of reference information and requires more specific rules and descriptive markup than a generic reference type. DITA lets you define a new type and reuse the processing of the base type (providing new processing only for different requirements of the new type). As a result, topics from different domains with different markup and markup rules can be built together into one help file, Web site, or book.
The DITA toolkit is much more than just some DTDs and schemas. It provides many supporting files, such as ready-made stylesheets that transform DITA-based XML documents into HTML, PDF, or other standards like DocBook. A very helpful part of the DITA toolkit is the demo section. It consists of predefined specializations for a variety of applications, e.g. the bookmap, which contains many predefined elements required for a book structure, such as cover, preface, chapters and appendix.
Darwin is all about "inheritance" and "specialization" in biological evolution, and that is what DITA is all about in documentation.
The Darwin Information Typing Architecture was created by IBM in 1999-2000. In March 2004 the technology was contributed to OASIS for further development within a new OASIS Technical Committee (TC), chartered to promote the use of the DITA architecture for creating standard information types and domain-specific markup vocabularies. The members of the TC are representatives from different industries, organizations and countries.
The presentation will provide you with a short theoretical overview of what specialization is, as well as a short live demonstration.
The three topic types "reference", "task" and "concept" are all specializations derived from "topic". Looking at "task" in more detail, for example, you will find an element called "steps" and within "steps" the element "step". "steps" has been derived from the topic element "ol", and "step" is just the "li"; it is as simple as that.
If you go deeper and start your own specialization, you may have to specialize "steps" into "mysteps", and you can limit or extend what is allowed in "mysteps" compared to "steps" or even "ol". For more detail on how specialization works, check the documents provided by the DITA TC at http://www.dita-ot.sourceforge.net.
This is where the crown jewels of DITA come into play. If "mysteps" should look the same as "steps" in your PDF or HTML rendition, you do not have to change anything in the stylesheets provided by the DITA toolkit. And if you require a different look for "mysteps", just copy the definitions and modify them as you desire.
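The fallback behavior described above rests on the DITA class attribute, which lists the ancestry of an element from the most general to the most specific type. The following sketch imitates that lookup in Python. It is an analogy, not the toolkit's stylesheet code: the module name "mytask" and the rule descriptions are hypothetical.

```python
def ancestry(class_attr):
    """Return the module/element tokens of a DITA class attribute,
    most specific first. The first token ("-" or "+") is only a marker."""
    tokens = class_attr.split()
    return list(reversed(tokens[1:]))


def resolve_rule(class_attr, rules):
    """Pick the processing rule of the most specific type that has one."""
    for token in ancestry(class_attr):
        if token in rules:
            return rules[token]
    raise LookupError("no rule for " + class_attr)


# Class attribute of a hypothetical specialization of "steps".
MYSTEPS_CLASS = "- topic/ol task/steps mytask/mysteps "

# Stand-ins for the rules that already exist for the base types.
rules = {
    "topic/ol": "plain numbered list",
    "task/steps": "numbered procedure steps",
}

# No rule for "mysteps" exists, so the rule for "steps" is inherited.
inherited = resolve_rule(MYSTEPS_CLASS, rules)

# Adding one specific rule overrides the look for "mysteps" only.
custom_rules = {**rules, "mytask/mysteps": "boxed procedure steps"}
overridden = resolve_rule(MYSTEPS_CLASS, custom_rules)

print(inherited, "->", overridden)
```

The existing rules stay untouched in both cases; a specialization adds to them rather than replacing them.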
As the session itself can cover only high-level information about DITA and CMS, we have added further information on that topic to this paper to give you more background.
It is definitely possible to create sophisticated DITA-based XML data structures based on the file system alone, and publish them to multiple channels with impressive results. DITA has powerful support for standard and custom metadata that can be associated with topics - which are the fundamental DITA information units for building various types of documents. Looking at the output of the publishing process alone, one could not tell whether the authoring process made use of a database back-end.
The intentions leading to use of databases in real-life technical authoring scenarios are different. Whenever the complexity or the size of technical documentation exceeds a certain level, the need for more organization arises. Beyond a certain critical mass, the authoring of complex, modular, interwoven documentation consisting of many, small modules reused in multiple contexts, some of them already published, others still in creation, and still others already outdated, can no longer be handled without some sort of unified organization, automation and protection of the processes and the data, respectively.
That means that one of the major aspects to deal with when DITA is used to optimize an organization's technical documentation is the integration of DITA and a database. Helpfully, the design of DITA and the merits of a database point roughly in the same direction: both encourage modularization, and both help us organize metadata.
There are multiple types of databases, each with both proprietary and open source products available, that can be reasonably considered as the back-end of an authoring system:
We will focus on file-based CMS with rather high-level functionality here, but the concepts apply to all types of systems. Depending on the product selection, functionality may or may not be contained. Product selection is therefore a crucial step. Functionality that comes out of the box and can be customized does not need to be implemented, and is often (but not always) more reliable than custom software.
Let us now take a look at the functionality that we need to build an XML authoring system. Naturally, we are particularly interested in the implications the functionality requirements have for DITA. In this section, we will not go into too much detail about how the functionality is supported by the database. We will rather name the requirements and leave open whether the functionality comes out of the box or is added by some means.
If we store XML data in a file-based CMS, initially all the data is in the XML files, there is no obvious distinction between content and metadata. A CMS of that type is normally not capable of looking into the XML data (except by means of a full-text search). Therefore, if we want to make part of the XML data accessible in the CMS, we have to extract the data we need from the XML file and let the CMS store it. Since that data can also be manipulated on CMS side, we normally have the requirement of making this process bidirectional, so that the CMS-stored data replaces the associated data in the XML file on outbound operations.
There is no clear a-priori distinction between content and metadata; rather we have to decide what data we extract from the XML file when we set up the system. That data becomes our metadata by definition. This definition could change from one application to another, even with the same data.
Since the metadata we want to extract is normally not identical for all types of data, we want to have the possibility to differentiate between categories of data with different sets of attributes. Let's call these categories "object types" and the process of bidirectional metadata transfer "attribute mapping".
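Attribute mapping as described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of any particular CMS: the object type names, the attribute names and the element paths are hypothetical, and a plain dictionary stands in for the CMS attribute storage.

```python
import xml.etree.ElementTree as ET

# Hypothetical object types, each with its own set of mapped attributes
# (CMS attribute name -> element path inside the XML file).
OBJECT_TYPES = {
    "xml_module": {"title": "title", "author": "prolog/author"},
    "book_map": {"title": "title"},
}


def extract_attributes(xml_text, object_type):
    """Inbound: copy the mapped values from the XML file to the CMS side."""
    root = ET.fromstring(xml_text)
    mapping = OBJECT_TYPES[object_type]
    return {name: root.findtext(path) for name, path in mapping.items()}


def apply_attributes(xml_text, object_type, attributes):
    """Outbound: let the CMS-stored values replace those in the XML file."""
    root = ET.fromstring(xml_text)
    mapping = OBJECT_TYPES[object_type]
    for name, value in attributes.items():
        node = root.find(mapping[name])
        if node is not None:
            node.text = value
    return ET.tostring(root, encoding="unicode")


TOPIC = (
    "<concept id='c1'><title>Pump basics</title>"
    "<prolog><author>Jane Doe</author></prolog>"
    "<conbody><p>Text.</p></conbody></concept>"
)

stored = extract_attributes(TOPIC, "xml_module")   # import into the CMS
stored["author"] = "John Roe"                      # edited on the CMS side
exported = apply_attributes(TOPIC, "xml_module", stored)  # export again
print(exported)
```

Whatever is listed in the mapping becomes metadata by definition; changing the mapping changes what counts as metadata, even for the same file.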
In a typical file-based CMS, only the XML file as a whole can be accessed. If we need a single word, we still have to fetch the entire file from the CMS and extract the word in the file system. If we want to enable reuse of smaller portions - perhaps DITA topics - we need to cut the XML file apart on inbound and assemble it on outbound operations.
Since we do not want to expose this job to the users, we should expect that the CMS is capable of doing so automatically. Indeed, this functionality is not uncommon among CMS, and it is called "chunking".
Chunking and attribute mapping can be seen as the crucial functions that turn an ordinary, file-based CMS into an XML-enabled CMS.
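Chunking can be illustrated in the same way. The sketch below cuts the topics out of a compound file on the inbound operation and puts them back on the outbound operation. The placeholder element "chunkref" and the in-memory dictionary used as a store are assumptions made for this example; a real CMS will use its own link and storage mechanisms.

```python
import xml.etree.ElementTree as ET


def chunk(xml_text):
    """Inbound: replace each top-level topic by a placeholder and
    return the remaining skeleton plus the separately stored topics."""
    root = ET.fromstring(xml_text)
    store = {}
    for index, child in enumerate(list(root)):
        if child.tag != "topic":
            continue
        placeholder = ET.Element("chunkref", {"id": child.get("id")})
        # Keep surrounding whitespace with the skeleton, not the chunk.
        placeholder.tail, child.tail = child.tail, None
        store[child.get("id")] = ET.tostring(child, encoding="unicode")
        root[index] = placeholder
    return ET.tostring(root, encoding="unicode"), store


def assemble(skeleton_xml, store):
    """Outbound: put the stored topics back in place of the placeholders."""
    root = ET.fromstring(skeleton_xml)
    for index, child in enumerate(list(root)):
        if child.tag == "chunkref":
            topic = ET.fromstring(store[child.get("id")])
            topic.tail = child.tail
            root[index] = topic
    return ET.tostring(root, encoding="unicode")


DOCUMENT = (
    '<dita><topic id="a"><title>A</title></topic>'
    '<topic id="b"><title>B</title></topic></dita>'
)

skeleton, store = chunk(DOCUMENT)
restored = assemble(skeleton, store)
print(sorted(store), restored == ET.tostring(ET.fromstring(DOCUMENT), encoding="unicode"))
```

Once the topics are stored individually, each of them can be reused in another skeleton, versioned and given its own metadata.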
At times when the technical documentation of a machine or other device consisted of a few word processor files and images at best, stored in a folder, without reuse, without lifecycle management, a technical author was normally aware of the operations that were allowed at any time.
In contrast, the fine granularity and heavy reuse that XML-based authoring, and particularly DITA, bring to technical authoring make it almost impossible for a technical writer to intuitively understand what the valid operations on a given module are. If it is under construction, it may (and should) be overwritten; if it is already published, it may not be overwritten but should be versioned. If it is outdated, it should be archived and perhaps deleted from the live system. The complexity grows even further if multiple departments manage their data in the same repository and users may reference but not manipulate data from other departments.
One of the core goals of DITA is to encourage data reuse and single-sourcing. If we want the authors to reuse existing data, we have to give them tools that help them find these data. That means we need search functionality for both metadata (which is extracted and stored in tables) and full-text data, or content (which is not extracted and therefore not stored in tables).
Lifecycle management is based on the idea that every piece of data goes through several stages of evolution during the time it is stored in a digital system. At first, it is in some sort of authoring stage, where it still has some errors and is not yet complete. Such data is not suitable for publication. Later, after completion, it is reviewed, the bugs are - hopefully - found and removed, and it becomes released and eventually published. Even later, the product it describes is discontinued and no longer supported, so the associated documentation becomes retired, and the data will perhaps be archived and deleted from the live system.
A CMS with lifecycle management support provides a formalized mechanism to assign such stages to objects and associate functionality (such as changing permissions or storage location) with them.
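The rules named above (overwrite while under construction, create a new version once released, no changes after retirement) can be expressed as a small state model. This is a sketch only: the state names and the class are hypothetical and do not reflect the lifecycle API of any particular CMS.

```python
from dataclasses import dataclass

# Hypothetical lifecycle stages, in the order an object passes through them.
STATES = ("draft", "in_review", "released", "retired")


@dataclass
class Module:
    name: str
    content: str = ""
    state: str = "draft"
    version: int = 1

    def promote(self):
        """Move the object to the next lifecycle stage."""
        position = STATES.index(self.state)
        if position == len(STATES) - 1:
            raise ValueError(self.name + " is already in its final stage")
        self.state = STATES[position + 1]

    def save(self, content):
        """Apply the operation that is valid in the current stage."""
        if self.state == "draft":
            self.content = content          # under construction: overwrite
            return self
        if self.state == "released":
            # Released data stays untouched; a new draft version is created.
            return Module(self.name, content, "draft", self.version + 1)
        raise PermissionError(
            self.name + " cannot be changed in state " + self.state
        )


first = Module("pump_install")
first.save("first text")
first.promote()   # draft -> in_review
first.promote()   # in_review -> released
second = first.save("revised text")
print(first.version, first.state, second.version, second.state)
```

With such a model in place, the author no longer has to know which operation is allowed; the system decides from the current stage.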
A workflow system provides a tool for digital modeling of business processes, typically in a graphical way. The workflow system will provide a framework in which authors and other contributors get notified when there is work for them to do. On the other hand, there are normally quite a few tasks that can be fully automated, such as generating renditions for review and publication, translation management support, and others. A well-designed workflow framework can speed up the documentation production significantly, while increasing consistency, security and convenience of use.
If you are starting a project to introduce XML in your technical documentation environment, if you have to introduce a new CMS or a whole new authoring/publishing environment, there are many things you will have to take care of. With this case study we would like to provide you with a few hints of how you may proceed.
One of the most important points when starting such a project: be aware that it may take far more than a year to accomplish everything you plan. But at the same time be aware that you might not be successful if you try to implement all of it in one big package. Split your project into smaller chunks that can be accomplished within 3 - 4 months. If a package takes longer, just imagine that within a year the tools and requirements may already have changed - and if you are dealing with internal or external customers, they will have had enough time to learn new things and may have shifted their expectations.
Stay on track toward your destination on the horizon, but focus on the expectations of the person or group who is financing you. It will not help anyone if you have good ideas but are running out of budget.
It may not fully protect you from making a wrong decision, but a good specification helps you to limit the number of surprises you will face in your project. And it also helps to protect your work. Just ensure you have good specifications which are widely supported by a bigger community and especially by the one who finances your project.
Check for existing cases. Look for people and companies who have already implemented a DITA solution. Do not focus on the license cost only; focus on the total cost of ownership, including required customization, available support and vendor reliability.
If you are the one implementing the new system, avoid being the one selecting the tools. Otherwise you will be blamed for any tool limitations - and they will be there.
Keep in touch with the DITA user forum. There you will get a lot of good advice.
And finally, look for good consultancy. A consultant should have a background in technical documentation, not in IT only, and proven experience in similar projects.
The implementation phase consists of:
CMS Application Design,
The concepts sketched above are being applied to the real world by the authors of this paper. This case study deals with an XML- and CMS-based system for information-centered authoring and cross-media publishing of technical documentation and related information, e.g. for sales and marketing. One of the first decisions made was the one in favor of DITA. Also, the use of Documentum Content Server as repository was a customer-side requirement.
The starting point of the solution design was DITA. The XML data structure and the metadata were modeled first, but not in too much detail, since it was essential that the data was compatible with the XML data handling capabilities of Documentum. XML handling is configurable in Documentum by XML Applications, which are XML files describing the behavior of the system when XML data travels into or out of the repository. Many things can be configured in an XML Application; perhaps the most important features are chunking / link handling, attribute mapping and determination of the storage location in the repository.
While the XML handling capabilities of Documentum and probably most other CMS are constantly evolving at present, there are, on the other hand, some features that are obviously intended to be added later but are not yet fully implemented. Therefore, the DITA information design was also influenced by restrictions imposed by the CMS. The opposite is true as well: some ambitious metadata value assistance ideas that would have been easily manageable with Documentum were simplified, because there was no way to express them in a DTD. As a consequence, DITA design and CMS data modeling cannot be done sequentially, but must be integrated. The isolated description in the two following sections should not give the impression of a sequence.
When DITA and CMS data modeling is done, there are some design steps on the side of the CMS. But these have no strong connotations with the DITA structure. These are mostly: permissions, lifecycles and workflow.
DITA is already providing a good set of topic types which can be used for many different requirements. And the provided domains contain a good set of vocabulary especially for software documentation.
But you may require more specific vocabulary and other topic types reflecting your required information structure. And you may have special requirements for metadata which are held in the prolog section of your topics.
Check first for available specializations within the DITA open toolkit. E.g. if you are looking for a map specialized for structuring paper-based documentation, you will find a specialized map called bookmap. There might be other existing and "available for free" specializations that match your requirements or that might be a good starting point for your specialization.
In our case we found many good ideas within the bookmap and bkinfo DTDs.
You may also think of global industry standards for your business environment. If, for example, you are dealing with different suppliers or OEM partners who in turn deal with other companies in your business area, DITA could become a good starting point for a standardized DTD in your business area. Think of it as a wider opportunity.
DITA specializations may also become a group-wide standard within your corporation. Your divisions may require their own specializations to meet their specific requirements. But if they are all based on a group-wide standard, they become easily portable. This was so in our case.
Your authors may expect certain element names and your information architects may require a certain documentation structure, so check what you need first.
After you have selected a good base topic type (maybe an already specialized type) modify it according to the DITA guidelines you get from the DITA open toolkit. It will provide you with the information structure you require.
Add domains to make vocabulary available across several topic types. Check the information available at IBM developerWorks (http://www.ibm.com/developerworks). It helps you set up the basic environment and get your own specialization to work.
Documentum object types basically represent sets of attributes and definitions for their behavior. Therefore, choosing an object type when importing data determines the available metadata. A specific object type should therefore be defined for all objects that have the same set of metadata. In our application, we can make a clear distinction between the metadata requirements of the following types of objects:
Other auxiliary types or subtypes of the ones above can be thought of.
Most roles in an authoring system for technical documentation are related to a specific object type. There are authors whose activity is focused on the XML module type, reviewers and documentation coordinators work on book level, illustrators work with images etc. In many cases there are also groups for translation issues. The first step in creation of a permission model is therefore: grouping the users who basically do the same work. In most cases, there will be some roles of global relevance in addition. Examples are: administrators, managers who need broad read access for reporting purposes and translation managers.
Often, especially in smaller documentation departments, few people are responsible for many tasks, and therefore, often deal with multiple object types. In such cases, it makes sense to split up the roles by object type. This makes assigning permissions and workflow tasks to them much more obvious.
When the roles have been determined, the next step should be the design of the lifecycle system. There should be one lifecycle for every object type (at least for the ones that become part of the documentation). If the system is used for multi-language data and therefore has translation support, the lifecycles for all object types should be split up into individual source and target language lifecycles. This makes it easier to implement the permissions for the authors vs. the translators.
The lifecycles are designed to have one stage for each phase an object goes through during its lifetime. In most cases, a structure like the following is a good start:
The permission sets required are easily determined by setting up a table of the states of all lifecycles by all user groups / roles, and determining for each cell, what the permissions of the particular group in the particular lifecycle state should be. All group permissions per lifecycle state are combined to access control lists which are assigned by the lifecycle. By that means, it is ensured that during the whole lifetime of a document the appropriate permissions for all user groups are assigned.
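The table-based approach can be sketched as follows. The lifecycle states, group names and permission levels are hypothetical and simplified; the point is only how one table of states by groups is turned into one access control list per lifecycle state.

```python
# One cell per (lifecycle state, user group): the permission in that state.
# States, groups and permission levels are made up for this illustration.
PERMISSION_TABLE = {
    ("draft", "authors"): "write",
    ("draft", "reviewers"): "none",
    ("draft", "managers"): "read",
    ("in_review", "authors"): "read",
    ("in_review", "reviewers"): "write",
    ("in_review", "managers"): "read",
    ("released", "authors"): "read",
    ("released", "reviewers"): "read",
    ("released", "managers"): "read",
}


def build_acls(table):
    """Combine all group permissions of one lifecycle state into one ACL."""
    acls = {}
    for (state, group), permission in table.items():
        acls.setdefault(state, {})[group] = permission
    return acls


def may_write(acls, state, group):
    """Check a group's write permission for an object in a given state."""
    return acls[state].get(group, "none") == "write"


ACLS = build_acls(PERMISSION_TABLE)
print(ACLS["in_review"])
print(may_write(ACLS, "draft", "authors"), may_write(ACLS, "released", "authors"))
```

When the lifecycle promotes an object, it only has to attach the access control list of the new state; no individual permission needs to be set by hand.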
The above components of a Documentum application, in combination with the DITA data model and some publishing means to the required target formats (such as PDF and HTML) are in principle sufficient to take advantage of the power of Documentum with DITA. However, most tasks have to be performed manually. This is much safer than it is in the file system, but it is not any more convenient. Such a stage of an authoring system project, namely, all but the workflow, is, by the way, an ideal half-way project phase. The system is fully usable and can be utilized for documentation creation (at least by a small pilot user group with enhanced CMS understanding). If practical use suggests optimizations and enhancements to the data model, they can be implemented before the workflow system is set up, which saves some effort.
Workflows do not offer any additional document management functions, but they provide a framework for synchronous organization of tasks that have to be done in a specific order by certain users. There are also delegation and deadline monitoring features. If it is a user's turn to perform a task on an object (perhaps, edit an XML module), the user receives a notification in the inbox and is asked to perform what is required. When finished, the user closes the task and the workflow proceeds to the next user. Tasks can also be performed automatically. That means that when the workflow reaches a certain task, the system executes a program that does something useful (standard things like lifecycle promote, or custom operations such as setting up structures from templates).
The design of the workflow system is normally based on a use case analysis of the current (often manual) processes in the documentation department. To keep user acceptance high, it is a good idea not to overthrow everything in favor of something totally new. It makes a lot of sense to work out these design parameters in close cooperation with the future users (perhaps in a workshop). In most cases, it is feasible to find a workflow design that takes good advantage of the possibilities of DITA and a CMS, but still has some resemblance to the previous processes. Many things can be automated, others can be dropped or simplified, but the process framework should be meaningful to the users.
Since a workflow can initiate another workflow, the process model can be highly modular. Starting a book revision might let the book author perform some structure-related tasks such as adding a new chapter, changing a module or setting up a book from scratch. Then, the system automatically starts a sub workflow for the module and image tasks that were assigned to the authors. When finally all components are finished, the review and release process for the book can be started.
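A modular workflow of this kind can be sketched as a list of tasks, where each task is either manual (a user is notified), automatic (a program is executed) or a sub workflow. The sketch is deliberately simplified: all names are hypothetical, and a manual task is treated as finished as soon as the notification has been recorded, whereas a real workflow engine waits until the user closes the task.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    performer: Optional[str] = None                 # role for a manual task
    action: Optional[Callable[[], None]] = None     # program for an automatic task
    sub_workflow: Optional[List["Task"]] = None     # nested workflow


def run(workflow, log):
    """Process the tasks in order and record what happened."""
    for task in workflow:
        if task.sub_workflow:
            run(task.sub_workflow, log)
        elif task.action:
            task.action()
            log.append("auto: " + task.name)
        else:
            log.append("notify " + task.performer + ": " + task.name)
    return log


# Stand-in for an object managed by the CMS.
module_state = {"lifecycle": "draft"}


def promote_module():
    module_state["lifecycle"] = "in_review"


module_workflow = [
    Task("edit module", performer="author"),
    Task("promote module to review", action=promote_module),
]

book_revision = [
    Task("update book structure", performer="book author"),
    Task("module tasks", sub_workflow=module_workflow),
    Task("review book", performer="reviewer"),
]

log = run(book_revision, [])
print("\n".join(log))
```

Because the module workflow is a unit of its own, it can be started from the book revision as shown here or on its own for a single module change.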
The rendition part might be one of the most time-consuming parts of your project. The DITA open toolkit provides you with many sophisticated stylesheets you can use to get immediate results. But the part that may cause a lot of work is bringing all involved departments into line: information designers, technicians, field personnel, marketing and those responsible for CI do not always share the same view of how a rendition must look. Be aware of that.
When implementing a rendition, you may use the provided stylesheets and adapt them to your requirements, or use one of the available tools to generate your own stylesheets. If your rendition requirement is close to what is provided by the DITA open toolkit, make your modifications there. If it is totally different, you may be more successful if you create it yourself. You may still copy pieces of the provided code if it helps. But by staying close to the provided renditions, you may save a lot of money.
Changing the work environment may annoy your authors. You may have heard sentences like - we already changed the software 3 years ago and we are only now becoming efficient - or - I do not like elements called "p", I am used to "para". Or, if your authors have worked in less structured programs, they may be quite upset at no longer being the ones who style the document pages according to their own artistic taste.
But even when you do not face such resistance, you must support your personnel with adequate guidance. The new system will change the way they have to work. To gain the results you expect from it, provide your people with good training and a good authoring guideline to make them as efficient as you would like them to be.
If you are involved in a project to introduce XML, a new CMS, or in general a new authoring environment, you will face many time and budget consuming issues and challenges - if there is anything cool and you get it for free, then take it.
With DITA you will be part of a community; many people are already using DITA-based solutions. You may decide to do something on your own, and you are free to do so - but if you prefer to be part of a growing community that shares experiences and supports each other, you may be happier and more successful with your project.
Check out these resources for more information about DITA:
OASIS DITA Technical Committee http://www.oasis-open.org/committees/dita: See the latest committee draft specifications, DTDs, and schemas, and learn more about plans for future DITA development.
Cover Pages http://xml.coverpages.org/dita.html: See a full list of DITA resources, news items, and articles.
Christian Kravogel
Independent Consultant, SeicoDyne GmbH http://www.seicodyne.com
Christian Kravogel graduated as an electrical engineer at the University of Applied Science in Lucerne, Switzerland and as Executive Master of Business Studies at the School of Business Lucerne. After working as a technical author and head of the documentation department of an international corporation for several years, he started his work as a consultant for technical documentation and XML. As a member of the DITA Technical Committee of OASIS, he supports companies introducing DITA as well as in various other technical documentation and translation issues.
Boris Horner
Independent Consultant, Dr.-Ing. Boris Horner http://www.horner-project.de
Boris Horner graduated in physics (Dipl.-Phys.) at the University of Hamburg and received a doctorate (Dr.-Ing.) in mechanical engineering at the Technical University of Karlsruhe (both in Germany). After some years of work in the field of sensor development, he focused on consulting, software design and development in the area of technical documentation and translation. After several years as a Senior Consultant at a major technical documentation service provider, he founded his own company in 2003 and provides various kinds of services including design, implementation and roll-out of state-of-the-art documentation systems from scratch.