What’s your flavour?: Selecting the right XML for your content

This is a guest post from Emily Gibson and Nic Gibson. They are both directors of Corbas Consulting Ltd and each have over 20 years’ publishing experience, mostly in editorial, print and digital production.

The other day we were contacted by a client who was really excited about the new digital publishing process they were putting into place, and they wanted some help getting things right. They had bought a database and needed to get their Word documents into the XML language that the database required. However, the ‘flavour’ of XML they had chosen wasn’t going to support the content they were producing. That meant they weren’t going to get the best results, or the full value, from their workflow system.

You see, publishing with XML is not just a matter of deciding to ‘have an XML workflow’. (For a basic description aimed at editors, see, for example, the one in The Chicago Manual of Style.) There are many different ‘flavours’ of XML, and you need to pick the one that fits your needs, which are defined by the type of content you publish and by your workflow.

A well-styled Word document can be transformed into a decent XML file. Once you have that XML file, you can apply scripts to it to create your outputs (PDF, HTML for your website, EPUB). If the Word file for your novel has had Word styles applied consistently, for example, you can simply run a program to get whichever output you need.
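To make the Word-to-XML step a little more concrete, here is a minimal Python sketch. It assumes the paragraphs have already been pulled out of the Word file as (style name, text) pairs; the style-to-element mapping (`STYLE_MAP`) and the function name are hypothetical, and a real conversion would also have to deal with character styles, footnotes, images and much more.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialise XHTML elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", XHTML_NS)

# Hypothetical mapping from Word paragraph style names to XHTML elements.
STYLE_MAP = {
    "Title": "h1",
    "Heading 1": "h2",
    "Normal": "p",
    "Quote": "blockquote",
}

def styled_paragraphs_to_xhtml(paragraphs):
    """Build an XHTML document from (style_name, text) pairs."""
    html = ET.Element(f"{{{XHTML_NS}}}html")
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    for style, text in paragraphs:
        tag = STYLE_MAP.get(style)
        if tag is None:
            # An unknown style means the manuscript was not styled consistently.
            raise ValueError(f"Unmapped Word style: {style!r}")
        element = ET.SubElement(body, f"{{{XHTML_NS}}}{tag}")
        element.text = text
    return ET.tostring(html, encoding="unicode")

print(styled_paragraphs_to_xhtml([("Title", "A Novel"), ("Normal", "It was a dark night.")]))
# <html xmlns="http://www.w3.org/1999/xhtml"><body><h1>A Novel</h1><p>It was a dark night.</p></body></html>
```

Raising an error on an unmapped style is one way to catch inconsistently styled manuscripts early, which is why consistent Word styles matter so much for this kind of conversion.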

On the other hand, if you have lots of journal articles, you could store the files in an XML-aware database and apply those scripts to all your content at once to produce a collection. Which system you choose depends partly on the degree of automation that fits your publishing needs. If you are publishing fifteen monographs a year, an all-singing, all-dancing XML database offers little benefit. If you are publishing several hundred articles a year, the benefits are considerable.
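The collection idea can be sketched in the same spirit: one script applied to every stored article at once. This is only an illustration under stated assumptions: the `<article>`/`<title>` structure is made up, the ‘database’ is a plain Python dict, and `build_collection` is a hypothetical function, not any particular database’s API.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored articles (filename -> XML text), standing in for a database.
ARTICLES = {
    "art-001.xml": "<article><title>Sausages Through the Ages</title><p>Body text.</p></article>",
    "art-002.xml": "<article><title>A History of Ice Cream</title><p>Body text.</p></article>",
}

def build_collection(articles):
    """Apply the same transformation to every article and gather the results
    into a single <collection> document (here, a simple contents list)."""
    collection = ET.Element("collection")
    for filename, xml_text in sorted(articles.items()):
        article = ET.fromstring(xml_text)
        entry = ET.SubElement(collection, "entry", {"href": filename})
        # Fall back to a placeholder if an article has no <title> child.
        entry.text = article.findtext("title", default="(untitled)")
    return ET.tostring(collection, encoding="unicode")

print(build_collection(ARTICLES))
# <collection><entry href="art-001.xml">Sausages Through the Ages</entry><entry href="art-002.xml">A History of Ice Cream</entry></collection>
```

With fifteen files a year a script like this hardly matters; with several hundred, running one transformation over everything at once is where the automation starts to pay off.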

The first step is to decide how you are going to go from manuscript to XML and then you need to decide what systems you are going to use to manipulate and transform it.

You need to think about both the structure and the content of your manuscripts when choosing a flavour of XML. The different flavours vary widely in structure and expressiveness. The one thing you can be fairly sure of is that there is already one that will meet most of your needs (you don’t need to write it from scratch).

There are different tag sets (the set of elements in an XML language, a.k.a. what’s inside the pointy brackets) that suit different kinds of content, and there are different tools to suit them, too. Ice cream and sausages are both delicious, but you wouldn’t want them together; in the same way, not every flavour of XML goes with every kind of content.
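One way to see a tag set in practice is to list the distinct element names a document actually uses. Strictly, a tag set is defined by a schema or DTD, so this only shows the subset one document draws on. The sketch uses Python’s standard `xml.etree.ElementTree`; the sample fragment is invented for illustration and does not follow any particular published schema.

```python
import xml.etree.ElementTree as ET

def tag_set(xml_text):
    """Return the sorted list of distinct element names used in an XML document."""
    root = ET.fromstring(xml_text)
    # iter() walks the root element and every descendant element.
    return sorted({element.tag for element in root.iter()})

# Invented sample fragment, not taken from any real tag set.
sample = """<article>
  <front><title>On Ice Cream</title></front>
  <body><sec><title>Flavours</title><p>Vanilla.</p></sec></body>
</article>"""

print(tag_set(sample))
# ['article', 'body', 'front', 'p', 'sec', 'title']
```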

Match the content you publish with the appropriate XML language. For example:

Simple Narrative

If your content doesn’t need specialised semantics (e.g. for legal or programming material), XHTML, the XML variant of HTML5 used in EPUB, is perfectly suitable for a lot of narrative (e.g. novels) and monograph material destined for EPUB, print and web outputs. It has the advantage of a smaller, simpler tag set, which makes it easier to work with for simpler content.

XML can help publishers tackle managerial as well as technical challenges. It provides ways to manage the workflow, the interaction between content and people, and the publishing processes, as well as the documents themselves. Its features make it possible to control and manage both information and its structure.

