What’s your flavour?: Selecting the right XML for your content

This is a guest post from Emily Gibson and Nic Gibson. They are both directors of Corbas Consulting Ltd and each has over 20 years’ publishing experience, mostly in editorial, print and digital production.

The other day we were contacted by a client who was really excited about the new digital publishing process they were putting into place, and who wanted some help getting things right. They had bought a database and needed to get their Word documents into the XML language that it required. However, the ‘flavour’ of XML that they had chosen wasn’t going to support the content that they were producing, which meant that they weren’t going to get the best results or full value from their workflow system.

You see, publishing with XML is not just a matter of deciding to ‘have an XML workflow’. (For a basic description aimed at editors, see, for example, the one in The Chicago Manual of Style.) There are many different ‘flavours’ of XML and you need to pick the one that fits your needs. These needs are defined by the type of content you are publishing and by your workflow.

A well-styled Word document, for example, can be transformed into a decent XML file. Once you have an XML file, you can apply scripts to it to create your output (PDF, HTML for your website, EPUB). If the Word document for your novel has Word Styles consistently applied, for example, you can simply run a program to get whatever output you need.
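As a rough illustration of that idea, here is a minimal Python sketch. It assumes the styled paragraphs have already been pulled out of the Word file as (style name, text) pairs; the sample paragraphs, the element names (book, title, heading, para) and both mapping tables are made up for this example and do not come from any real XML language.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (Word style name, text) pairs, as if already
# extracted from a consistently styled manuscript.
PARAGRAPHS = [
    ("Title", "A Short Novel"),
    ("Heading 1", "Chapter One"),
    ("Normal", "It was a dark and stormy night."),
]

# Assumed mapping from Word style names to elements in a made-up tag set.
STYLE_TO_ELEMENT = {"Title": "title", "Heading 1": "heading", "Normal": "para"}

# Assumed mapping from that made-up tag set to HTML elements.
ELEMENT_TO_HTML = {"title": "h1", "heading": "h2", "para": "p"}


def to_xml(paragraphs):
    """Build a simple XML tree from styled paragraphs."""
    book = ET.Element("book")
    for style, text in paragraphs:
        child = ET.SubElement(book, STYLE_TO_ELEMENT[style])
        child.text = text
    return book


def to_html(book):
    """Produce an HTML fragment from the XML tree."""
    body = ET.Element("body")
    for child in book:
        out = ET.SubElement(body, ELEMENT_TO_HTML[child.tag])
        out.text = child.text
    return ET.tostring(body, encoding="unicode")


book = to_xml(PARAGRAPHS)
xml_text = ET.tostring(book, encoding="unicode")
html_text = to_html(book)
print(xml_text)
print(html_text)
```

The point of the sketch is the dependency on styles: a paragraph whose style is missing from the mapping would raise a KeyError, which is why inconsistent styling in the manuscript causes trouble later.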

On the other hand, if you have a bunch of journal articles, you could save the files into an XML-aware database and apply those scripts to all your content at once to produce a collection. Which system you choose is partly driven by the degree of automation that fits your publishing needs. If you are publishing fifteen monographs a year, there’s not a lot of benefit to an all-singing and all-dancing XML database. If you are publishing several hundred articles a year, then there are some big benefits.
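To make the "all your content at once" idea concrete, here is a small Python sketch that runs one function over several articles and builds a contents list for a collection. The articles are held in memory as strings rather than in a database, and the article and title element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical articles; in practice these would come from files
# or an XML-aware database.
ARTICLES = [
    "<article><title>On Ice Cream</title><para>First text.</para></article>",
    "<article><title>On Sausages</title><para>Second text.</para></article>",
]


def build_contents(article_sources):
    """Return an HTML list of the titles of every article supplied."""
    contents = ET.Element("ul")
    for source in article_sources:
        article = ET.fromstring(source)
        item = ET.SubElement(contents, "li")
        item.text = article.findtext("title")
    return ET.tostring(contents, encoding="unicode")


contents_html = build_contents(ARTICLES)
print(contents_html)
```

The same function works unchanged for two articles or several hundred, which is where the benefit of automation starts to show.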

The first step is to decide how you are going to go from manuscript to XML and then you need to decide what systems you are going to use to manipulate and transform it.

You need to think about both the structure and the content of your manuscripts when you decide which flavour of XML to use. The flavours differ greatly in their structure and their expressiveness. The only thing you can be fairly sure of is that there is already one which will meet most of your needs (you don’t need to write it from scratch).

There are different tag sets (the set of elements in an XML language, a.k.a. what’s inside the pointy brackets) that suit different kinds of content, and there are different tools to suit them, too. In the same way that ice cream and sausages are both delicious, but you wouldn’t want them together, not every flavour of XML goes with every kind of content.
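One way to see what a tag set is in practice is to list the element names a document actually uses. The Python sketch below does that for two tiny fragments; both fragments and all their element names are invented for illustration and are not taken from any published XML language.

```python
import xml.etree.ElementTree as ET

# Two made-up fragments standing in for different kinds of content.
NARRATIVE = "<chapter><title>One</title><para>Text.</para></chapter>"
LEGAL = "<act><section><number>1</number><clause>Text.</clause></section></act>"


def tag_set(xml_text):
    """Return the set of element names used in an XML document."""
    root = ET.fromstring(xml_text)
    return {element.tag for element in root.iter()}


narrative_tags = tag_set(NARRATIVE)
legal_tags = tag_set(LEGAL)
print(sorted(narrative_tags))
print(sorted(legal_tags))
print(sorted(narrative_tags & legal_tags))  # element names the two share
```

Here the two fragments share no element names at all, so a tool written for one would have nothing to work with in the other.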

Match the content you publish with the appropriate XML language. Typical content types include:

Simple Narrative
Pedagogical
Legal
Encyclopedic
Journals

If your content doesn’t have specialised semantics (e.g. legal, programming), XHTML, the XML variant of HTML5 (as used in EPUB), is perfectly suitable for a lot of narrative (e.g. novels) and monograph material for EPUB, print and web outputs. It has the advantage of a comparatively small, simple tag set, which makes it easier to work with for simpler content.
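Because XHTML is XML, ordinary XML tools can read it. The Python sketch below parses a small, made-up XHTML chapter with the standard library's XML parser and pulls out its paragraphs; it also shows that sloppy HTML with unclosed tags is rejected. The sample chapter and the helper names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# The namespace that XHTML elements belong to.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A made-up chapter written as XHTML.
CHAPTER = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Chapter One</title></head>"
    "<body><h1>Chapter One</h1><p>It was a dark and stormy night.</p></body>"
    "</html>"
)


def is_well_formed(text):
    """Report whether the text can be parsed as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def paragraphs(text):
    """Return the text of every XHTML p element in the document."""
    root = ET.fromstring(text)
    return [p.text for p in root.iter(f"{{{XHTML_NS}}}p")]


print(is_well_formed(CHAPTER))
print(paragraphs(CHAPTER))
# Unclosed <p> tags are tolerated by browsers but are not well-formed XML.
print(is_well_formed("<body><p>One<p>Two</body>"))
```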

XML can help publishers tackle managerial as well as technical challenges. It provides ways to manage the workflow, the interaction between content and people, and the publishing processes, as well as the documents themselves. The features of XML ensure that information and its structure can be controlled and managed.

It can be a complex topic, but many publishing professionals find that knowing about XML – even if they don’t use it every day – is immensely useful. There are a number of places where you can learn about XML, but the XML Summer School, held each year in September, is the best and most comprehensive. It presents a range of XML techniques and applications in workflow, change management, QA, linked data, and document structure control to help publishers manage their content effectively.

The Hands-on Digital Publishing course provides hands-on material and helpful contacts in the world of publishing and XML. This course is chaired by Peter Flynn and taught by Nic Gibson, Norm Walsh, Tomos Hillman, and Tony Graham.

For more information, see: http://xmlsummerschool.com/curriculum-2016/xml-in-publishing-2016/
