XML basics … and I do mean basics!

This is a very basic introduction to XML (Extensible Markup Language). If you think that XML is exclusively for techie people, or you don’t really know what it is at all, this post is for you.

When I was very young, I had a digital watch (I am old enough to remember when digital watches were new and exciting).

One day, I was very bored and very curious. I decided to take it apart to see how it worked.

It was a disappointing experience. One look at the insides and I thought: there’s no way I’m ever going to know what’s going on here.

That’s pretty much what I thought the first time I looked at some computer code. I say ‘computer code’, although that doesn’t really mean much, because I really can’t remember what it was I was looking at. Just that it was computery. And meaningless.

So if you’ve ever opened an XML file, seen what looked like a random jumble of punctuation, interspersed with what may or may not have been English, and thought ‘I give up’, I know just how that feels.

To be fair, an XML file can look pretty complicated.

But the reality is that it’s ludicrously, insanely simple.

Actually, I think just knowing this is half the battle. It’s very easy to tell yourself ‘this is technical stuff, so it’s not for me, and I’ll never understand it’. I know that because it’s what I used to think.

In fact, you just need to know a little terminology and a couple of simple rules, and you’re away.

1. What does XML do?

The answer is: not much, really. XML is just a way of storing information. The useful bit is that it stores it in such a way as to make it easy for computer programs to use it for all kinds of purposes.

2. Elements

Elements are the building blocks of an XML file. You can think of an element as a box containing a piece of information.

An element (usually) comes in three parts.

i. A start tag. This is how you know that this element starts here! It looks something like this:

<someElement>

ii. Some content. This is the information you want to store.

iii. An end tag. It looks very much like the start tag, except that it has a forward slash after the <.

</someElement>

So, a complete element could look like this:

<someElement>A piece of information I want to store</someElement>

XML is case sensitive, so you need to make sure that the letters in your end tag match the ones in your start tag.
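If you’d like to see a program read an element, here is a minimal sketch in Python, using the xml.etree.ElementTree module from its standard library. The helper name is_well_formed is my own invention for illustration, not something built in. It reads the complete element from above, then shows that an end tag with different capitalisation is rejected.

```python
import xml.etree.ElementTree as ET

# A complete element: start tag, content, end tag.
element = ET.fromstring(
    "<someElement>A piece of information I want to store</someElement>"
)
print(element.tag)   # the element's name
print(element.text)  # the content between the tags


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The end tag must match the start tag exactly, capital letters included.
print(is_well_formed("<someElement>content</someElement>"))
print(is_well_formed("<someElement>content</someelement>"))
```

The second check fails only because of the lower-case ‘e’ in the end tag.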

3. Can I call my element anything I like?

Pretty much. You can call it <someElement> if you really want to, although you’ll probably want to make it a bit more descriptive!

The point is that there is no set list of XML elements.

However, there are some XML files, such as the ones included in an ePub file, that need to contain certain elements with certain names. This is so that the ereader software knows what to do with the contents of those files.

There are also some rules about naming your elements. For example, names must start with a letter or an underscore, they can’t contain spaces, and (without going into detail) most punctuation is best avoided. Oh, and don’t start the element name with the letters ‘xml’, which are reserved.
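One way to get a feel for the naming rules is to let a parser try a few names. This is only a sketch, again assuming Python’s standard xml.etree.ElementTree module; the element names and the helper name_is_accepted are made up for the example.

```python
import xml.etree.ElementTree as ET


def name_is_accepted(name):
    """Hypothetical helper: wrap some text in <name>...</name> and try to parse it."""
    try:
        ET.fromstring(f"<{name}>some text</{name}>")
        return True
    except ET.ParseError:
        return False


# Invented names, chosen to exercise the rules described above.
for name in ["book_title", "_title", "2nd_edition", "book title"]:
    verdict = "accepted" if name_is_accepted(name) else "rejected"
    print(f"{name!r}: {verdict}")
```

The first two names start with a letter or an underscore and are accepted. The third starts with a digit and the fourth contains a space, so the parser rejects both.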

4. Can I put one element inside another?

You certainly can!

In fact, you can nest lots of different elements inside other elements.

This example contains information about a book:

<book>
  <title>A Guide to Seamonsters</title>
  <author>
    <author_pen_name>Captain Ahab von Herman</author_pen_name>
    <author_real_name>J Bloggs</author_real_name>
  </author>
  <first_published>2010</first_published>
</book>

Every XML document needs one (and only one) root element, which contains all the other elements. In the above example, <book> could be the root element. But if we wanted to include more than one book, we’d have to create a new root.

For example:

<book_catalogue>
  <book>
    <title>A Guide to Seamonsters</title>
    <author>
      <author_pen_name>Captain Ahab von Herman</author_pen_name>
      <author_real_name>J Bloggs</author_real_name>
    </author>
    <first_published>2010</first_published>
  </book>
  <book>
    <title>North American Trains</title>
    <author>
      <author_pen_name>Bobby Steam</author_pen_name>
      <author_real_name>John Smith</author_real_name>
    </author>
    <first_published>2012</first_published>
  </book>
</book_catalogue>
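To see nesting and the single-root rule in action, here is a rough sketch of how a program might walk through the catalogue. It assumes Python and its standard xml.etree.ElementTree module; the variable names are my own.

```python
import xml.etree.ElementTree as ET

catalogue_xml = """<book_catalogue>
  <book>
    <title>A Guide to Seamonsters</title>
    <author>
      <author_pen_name>Captain Ahab von Herman</author_pen_name>
      <author_real_name>J Bloggs</author_real_name>
    </author>
    <first_published>2010</first_published>
  </book>
  <book>
    <title>North American Trains</title>
    <author>
      <author_pen_name>Bobby Steam</author_pen_name>
      <author_real_name>John Smith</author_real_name>
    </author>
    <first_published>2012</first_published>
  </book>
</book_catalogue>"""

root = ET.fromstring(catalogue_xml)
print(root.tag)  # the name of the root element

# Each <book> sits directly inside the root; the pen name is nested
# one level further down, inside <author>.
for book in root.findall("book"):
    title = book.findtext("title")
    pen_name = book.findtext("author/author_pen_name")
    print(f"{title} by {pen_name}")

# Two <book> elements side by side, with nothing wrapped around them,
# do not make a valid document: there must be exactly one root.
two_roots = "<book><title>A</title></book><book><title>B</title></book>"
try:
    ET.fromstring(two_roots)
    single_root_required = False
except ET.ParseError:
    single_root_required = True
print(single_root_required)
```

The path "author/author_pen_name" simply follows the nesting: look inside author, then inside that for author_pen_name.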

5. Attributes

An attribute contains a bit more information about an element. For example, say we wanted to include a price in our book example. We could add a price element:

<price>9.99</price>

However, we might also want to include information about the currency the price is quoted in. We could do this by adding an attribute named currency:

<price currency="GBP">9.99</price>

Attribute names, like element names, are user defined, although once again there are some rules about what you can and can’t call an attribute.
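Here is a small sketch of how a program could read both the content and the attribute of the price element, assuming Python’s standard xml.etree.ElementTree module. It also checks one thing that catches people out when copying XML from a word processor: attribute values need plain straight quotes, not typographic ‘curly’ ones.

```python
import xml.etree.ElementTree as ET

price = ET.fromstring('<price currency="GBP">9.99</price>')

amount = float(price.text)        # the element's content, as a number
currency = price.get("currency")  # the value of the currency attribute
print(amount, currency)
print(price.attrib)               # all attributes, as a dictionary

# The same element typed with curly quotes is not valid XML.
try:
    ET.fromstring("<price currency=”GBP”>9.99</price>")
    curly_quotes_accepted = True
except ET.ParseError:
    curly_quotes_accepted = False
print(curly_quotes_accepted)
```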

6. Just one more thing…

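We’ve nearly got enough to create an XML document. There’s just one more thing worth adding, and that’s an XML declaration, which identifies the file as an XML document and can include a couple of other bits of information, such as the version of XML and the character encoding. (Strictly speaking, the declaration is optional in XML 1.0, but it’s good practice to include it.)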

It will look something like this:

<?xml version="1.0" encoding="UTF-8"?>

This goes right at the very start of the document, with nothing before it (not even a blank line), and tells any computer program reading the file how to deal with it.
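As a rough illustration, the sketch below (Python, standard xml.etree.ElementTree module) reads a tiny document that starts with a declaration. The title ‘Café Culture’ is invented for the example; it is there because the accented letter only comes out correctly if the encoding is understood.

```python
import xml.etree.ElementTree as ET

# The document as it would be saved in a file: UTF-8 encoded bytes,
# with the declaration as the very first thing.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<title>Café Culture</title>"
).encode("utf-8")

title = ET.fromstring(document)
print(title.text)

# If anything comes before the declaration, even a blank line,
# the parser refuses the document.
try:
    ET.fromstring(b"\n" + document)
    declaration_can_move = True
except ET.ParseError:
    declaration_can_move = False
print(declaration_can_move)
```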

7. Why are we doing all this again..?

This brings me on to the last point. XML is useful because its strict, predictable structure makes it easy for computers to handle. Once you’ve labelled your information in an XML file, you can do all kinds of things with it. For example, you might want to create a searchable database or build an online product catalogue.
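To give a flavour of what ‘searchable’ might mean in practice, here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module. It uses a cut-down version of the book catalogue from earlier. The function names search_titles and published_since are hypothetical, and a real database or product catalogue would involve a good deal more than this.

```python
import xml.etree.ElementTree as ET

CATALOGUE_XML = """<book_catalogue>
  <book>
    <title>A Guide to Seamonsters</title>
    <first_published>2010</first_published>
  </book>
  <book>
    <title>North American Trains</title>
    <first_published>2012</first_published>
  </book>
</book_catalogue>"""


def search_titles(catalogue, keyword):
    """Return the titles that contain keyword, ignoring upper and lower case."""
    keyword = keyword.lower()
    return [
        book.findtext("title")
        for book in catalogue.findall("book")
        if keyword in book.findtext("title", default="").lower()
    ]


def published_since(catalogue, year):
    """Return the titles of books first published in or after the given year."""
    return [
        book.findtext("title")
        for book in catalogue.findall("book")
        if int(book.findtext("first_published")) >= year
    ]


catalogue = ET.fromstring(CATALOGUE_XML)
print(search_titles(catalogue, "trains"))
print(published_since(catalogue, 2011))
```

Because every book is labelled the same way, the same few lines work whether the catalogue holds two books or two thousand.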

This short guide is far from complete, but hopefully it will show you that XML isn’t half as scary as it first seems.

I think that’s important.

Others may disagree, but I think we’re living in a time when it’s useful for all of us to be just a little bit techie. I remember a time when I thought of anyone who could do anything with computers as having semi-mystical powers. They’re not called computer wizards for nothing.

And while not all of us can have dazzling coding skills, I believe that at least sharing a vocabulary with those who do will help us work better together.

 

Alex Painter writes and teaches for Editorial Training (www.edittrain.co.uk), and is a marketing and web development consultant for Monday Communications (www.mondaycommunications.co.uk).


Comments (5)

  • I have very limited technical knowledge, but do work in publishing, so this is very useful! However, I have a question – what is then the difference between HTML and XML? From my days on the website “myspace,” I was able to change my profile by doing things like and and I was able to change my background designs by copying across more complicated bits of code from other websites – but I was led to believe this was called “HTML.” Is it, in fact, all XML?!

    • From a very simplistic view you can think of XML as a way of structuring any data set, in this case books, but why not movies, restaurants, etc? All the syntax is defined in the standard such as elements, attributes, roots and so on.

      You can think of HTML as a defined subset of elements which describe webpages. You see the same principles being used with respect to elements adding structure to a page. But each element has a particular role and browsers understand how to handle these elements.

      If I added the element to a HTML document the browser doesn’t know what it is – it’s outside of the defined standard – so it doesn’t know how to display it even if it is all nicely structured. It does know about ‘h1’ for headers, ‘p’ for paragraphs, ‘img’ for images and so on.

      Having data in XML means it can be easily parsed by computer programs. But the computer won’t know what it all means and typically expects a particular format. That’s where XML schemas come in to play which define a specific set of elements and attributes (and often possible values) into a documented standard. So MathML defines a way of encoding mathematical equations in XML; SVG is a way of encoding vector graphics in XML, ONIX for Books is a way of storing book information in XML.

      So coming back to your original question HTML is XML-like. They very much share a similar heritage.
