Listen up! Audiobooks finally get a standard

[Image: a boy yelling into a professional microphone]

Audiobooks are cool. They are so cool that they don’t need no rules.

I have worked with ebooks for a long while, and there we have had the well-documented EPUB3 standard for nearly ten years now, so it surprised me to learn that there has been no equivalent standard for audiobooks.

I suspect the reason is that, technically speaking, audiobooks are not that complex. Whereas we rely on preflighting and PDF/X for print and on epubcheck for ebooks, we have so far been able to get away with putting some MP3s in a folder and calling it an audiobook.

From a production point of view, though, the actual audiobook files fuelling the booming audiobook industry are a loose cannon. Different publishers and retailers may have their own guidelines, but collectively they follow no specific structure. The file format, compression level, file naming, folder structure, metadata (or lack of it) such as author, narrator, book title and chapter titles, the table of contents and playing order, and the cover image's size, naming and graphic format are all up for grabs.

Just to get on with our work we do need some level of organization, so where there is no official standard, people will make up their own. Take naming and submission as an example:

The well-known audiobook retailer Audible, from Amazon, suggests naming tracks in the form 02_Chapter-01.mp3 to indicate their content and order, and uploading an audiobook chapter by chapter along with the cover graphic.

The direct ebook and audiobook sales solution Glassboxx, from Firsty, asks us to submit a folder of MP3 files named <ISBN-13>_nnn_MP3.mp3, where nnn is a three-digit sequence number for the file within the audiobook.

The book discovery and recommendation service NetGalley, from Firebrand, requires a zip file named solely with the ISBN, containing a collection of MP3 files with the listening order (including leading zeros) at the beginning of each file name. The cover image must not be in the folder.
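To see how far apart these conventions are, here is a minimal Python sketch that names the same track three ways. The helper functions are hypothetical and simply encode my reading of each guideline as summarised above; the two-digit padding in the Audible-style name and the NetGalley padding width (wide enough for the total number of tracks) are assumptions, and the ISBN is a placeholder.

```python
# Hypothetical helpers encoding three retailers' naming rules as summarised
# in this article. These are illustrations, not official retailer tools.


def audible_style_name(track: int, chapter: int) -> str:
    """Audible-style name such as 02_Chapter-01.mp3 (two-digit padding assumed)."""
    return f"{track:02d}_Chapter-{chapter:02d}.mp3"


def glassboxx_name(isbn13: str, seq: int) -> str:
    """Glassboxx: <ISBN-13>_nnn_MP3.mp3, nnn being a three-digit sequence number."""
    if len(isbn13) != 13 or not isbn13.isdigit():
        raise ValueError(f"expected a 13-digit ISBN, got {isbn13!r}")
    if not 0 <= seq <= 999:
        raise ValueError("sequence number must fit in three digits")
    return f"{isbn13}_{seq:03d}_MP3.mp3"


def netgalley_name(order: int, total_tracks: int, label: str) -> str:
    """NetGalley: listening order first, with leading zeros.

    Assumption: pad to the width of the total track count.
    """
    width = len(str(total_tracks))
    return f"{order:0{width}d}_{label}.mp3"


isbn = "9780000000002"  # placeholder ISBN
print(audible_style_name(2, 1))              # 02_Chapter-01.mp3
print(glassboxx_name(isbn, 2))               # 9780000000002_002_MP3.mp3
print(netgalley_name(2, 120, "Chapter-01"))  # 002_Chapter-01.mp3
```

Three retailers, three different names for exactly the same piece of audio, and none of them carries the narrator, the cover or a proper chapter title.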

Where am I?

The lack of an overall structure or any metadata in these different guidelines means that there is no proper way for us to consistently add covers, metadata and useful structure such as a publisher-supplied table of contents and running order. We should be able to do better than trying to cram clues to the chapter title and playing order into our audio file names.
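Here is a small Python illustration of why cramming the order into file names is fragile. The file names are made up. With no metadata, the only way to recover the playing order is to sort the names, and a plain text sort puts track 10 before track 2, which is presumably why guidelines such as NetGalley's insist on leading zeros.

```python
# Recovering playing order from file names alone (hypothetical names).
import re

files = [
    "1_Opening-Credits.mp3",
    "2_Chapter-01.mp3",
    "10_Chapter-09.mp3",
    "11_Chapter-10.mp3",
    "3_Chapter-02.mp3",
]

# A plain text sort compares character by character, so "10_" lands before "2_".
naive_order = sorted(files)


def order_prefix(name: str) -> int:
    """Parse the numeric prefix, failing loudly if a file breaks the convention."""
    match = re.match(r"(\d+)_", name)
    if match is None:
        raise ValueError(f"no order prefix in {name!r}")
    return int(match.group(1))


# Sorting on the parsed number restores the intended order...
numeric_order = sorted(files, key=order_prefix)

# ...but the chapter title still has to be guessed from the rest of the name.
titles = [
    name.split("_", 1)[1].removesuffix(".mp3").replace("-", " ")
    for name in numeric_order
]
print(naive_order)
print(titles)
```

Every player or retailer that receives these files has to make the same guesses, and nothing guarantees they will all guess the same way.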

And without a standard to check against, we have no method of validation. You can’t make an invalid audiobook but there is no way to make a valid one either.
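Without a standard, the best we can do is check against one retailer's own rules. The sketch below is a hypothetical checker for the NetGalley conventions described earlier, assuming a 13-digit ISBN, a flat list of file names inside the zip, and that "leading zeros" means every order prefix has the same width. Passing it tells us nothing about whether the audiobook would be acceptable anywhere else.

```python
# Hypothetical checker for one retailer's rules (NetGalley, as summarised above).
# Assumptions: 13-digit ISBN, a flat list of names inside the zip, and
# "leading zeros" meaning every order prefix has the same width.
import re

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def check_netgalley_submission(zip_name: str, members: list[str]) -> list[str]:
    """Return a list of problems; an empty list only means these rules passed."""
    problems = []
    if not re.fullmatch(r"\d{13}\.zip", zip_name):
        problems.append(f"zip should be named with the ISBN only: {zip_name}")

    prefixes = []
    for name in members:
        lower = name.lower()
        if lower.endswith(IMAGE_EXTENSIONS):
            problems.append(f"cover image must not be included: {name}")
        elif not lower.endswith(".mp3"):
            problems.append(f"not an MP3 file: {name}")
        elif (match := re.match(r"\d+", name)) is None:
            problems.append(f"missing listening-order prefix: {name}")
        else:
            prefixes.append(match.group())

    # Mixed prefix widths (e.g. "1" and "10") suggest missing leading zeros.
    if len({len(prefix) for prefix in prefixes}) > 1:
        problems.append("listening-order prefixes are not zero-padded to one width")
    return problems


print(check_netgalley_submission("My Book.zip", ["1_Intro.mp3", "10_Chapter-09.mp3", "cover.jpg"]))
```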

This can be a frustrating inconvenience for everyone, but from an accessibility point of view it matters far more: the RNIB, and similar bodies internationally, have to choose which books to spend time and money on, reformatting the content of 'regular' audiobooks into a more accessible format. Take a look at (and listen to) this clip from a great session at EDRLab's Digital Publishing Summit earlier this month, where Danny Faris from NNELS (National Network for Equitable Library Service) in Canada showed the confusion caused when tracks have no table of contents and play in the wrong order.

W3C Audiobooks

The W3C (World Wide Web Consortium) is the international technical body that decides how the web works. Its Publishing Working Group, which discusses and decides how web technologies may work for EPUBs, has recently been discussing audiobooks.

I say recently… Standards bodies move slowly. These things take time. Many proposals, discussions, meetings and debates take place before a standard is proposed and then recommended. If you work in publishing you may know the feeling…

But we are nearly there! The W3C Audiobooks specification, which fully describes a suitable standard for audiobook structure, metadata and packaging, became a 'Candidate Recommendation' in spring 2020, and it is widely expected to become an official mainstream format later this year.
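For packaging, the W3C drafts point to the Lightweight Packaging Format (LPF), which as I understand it is essentially a zip file with the manifest stored as publication.json at its root. The Python sketch below assumes exactly that and nothing more; the file names and contents are placeholders, so check the current specification before relying on the details.

```python
# Sketch: package an audiobook as a zip with publication.json at its root,
# following my reading of the W3C Lightweight Packaging Format (LPF).
# All names and contents below are placeholders.
import io
import json
import zipfile


def package_audiobook(manifest: dict, resources: dict[str, bytes]) -> bytes:
    """Return the bytes of a zip holding the manifest plus the given resources."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The manifest goes at the root of the package.
        zf.writestr("publication.json", json.dumps(manifest, indent=2))
        for path, data in resources.items():
            # MP3 and JPEG data is already compressed, so store it as-is.
            zf.writestr(path, data, compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()


manifest = {
    "type": "Audiobook",
    "name": "Example Title",
    "readingOrder": [{"url": "audio/01.mp3", "encodingFormat": "audio/mpeg"}],
}
package = package_audiobook(manifest, {"audio/01.mp3": b"...", "cover.jpg": b"..."})

with zipfile.ZipFile(io.BytesIO(package)) as zf:
    print(zf.namelist())  # ['publication.json', 'audio/01.mp3', 'cover.jpg']
```

The point is less the zip itself than the fact that the cover, the audio and the description of how they fit together travel as one predictable package.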

It sets out exactly how to declare the playing order and table of contents, plus metadata such as title, author, narrator, cover imagery and more, in an open and well-documented way using standard web code.
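To give a flavour of what that standard web code looks like, here is a Python sketch that builds a minimal manifest and prints it as JSON. The property names (readingOrder, readBy, resources with rel values of cover and contents, ISO 8601 durations) follow the examples in the W3C drafts as I understand them; the title, people, durations and file paths are placeholders, and the specification remains the authority on which properties are required.

```python
# Sketch of a minimal W3C Audiobooks manifest built as a Python dict.
# Property names follow the W3C draft examples as I understand them;
# every value here is a placeholder.
import json


def build_manifest(
    title: str,
    author: str,
    narrator: str,
    chapters: list[tuple[str, str, int]],
    cover_url: str,
    toc_url: str,
) -> dict:
    """chapters holds (url, chapter name, length in seconds) in playing order."""
    total_seconds = sum(seconds for _, _, seconds in chapters)
    return {
        "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
        "conformsTo": "https://www.w3.org/TR/audiobooks/",
        "type": "Audiobook",
        "name": title,
        "author": author,
        "readBy": narrator,
        "duration": f"PT{total_seconds}S",  # ISO 8601 duration
        # Playing order is the order of this list, not the file names.
        "readingOrder": [
            {"url": url, "encodingFormat": "audio/mpeg", "name": name, "duration": f"PT{seconds}S"}
            for url, name, seconds in chapters
        ],
        "resources": [
            {"url": cover_url, "rel": "cover", "encodingFormat": "image/jpeg"},
            {"url": toc_url, "rel": "contents", "encodingFormat": "text/html"},
        ],
    }


manifest = build_manifest(
    "Example Title",
    "A. Author",
    "N. Narrator",
    [("audio/01.mp3", "Opening Credits", 35), ("audio/02.mp3", "Chapter 1", 1820)],
    "cover.jpg",
    "toc.html",
)
print(json.dumps(manifest, indent=2))
```

Notice that the chapter titles, narrator and playing order now live in the manifest, so the audio files can be called anything at all.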

There are some real benefits to standardization. Interoperability is the geeky word for how products or systems that use a predictable, open standard can work with each other without restriction.

The modern ebook reading framework Colibrio Reader was quick to add support for W3C Audiobooks. This means I can already share an audiobook with you just by sending a URL.

Windows and Mac ebook reader Thorium Reader already supports playing W3C Audiobooks on the desktop.

Book sampling service Jellybooks already uses the new W3C Audiobooks standard for submissions (or will convert what they receive from publishers into this standard format).

Ebook and audiobook creation tool CircularFLO already allows InDesign users to make audiobooks to the W3C Audiobooks standard, free of charge.

Synchronised Narration

Along with the new, predictable and interoperable standard, the W3C is also looking to the future of audiobooks.

It is by no means a requirement, but the W3C Audiobooks standard also allows extra content to be declared and added inside the audiobook package. Perhaps optional extra imagery or animations could be displayed at certain times to enhance different scenes or chapters; maybe alternative voices, added as different audio tracks, could let listeners choose who reads the book to them.

With fixed-layout EPUB, and to a lesser extent reflowable EPUB3 depending on support from reading systems, we have long been able to embed recorded audio files and match them to the text of a book as read-aloud 'media overlays'.

In a similar way, the W3C Audiobooks standard allows us to include all the words of a book inside the audiobook package, along with the timing of their position in the audio. This makes it possible to blur the boundaries between an ebook and an audiobook.
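A player needs very little logic to show the passage that matches the current playback position. The Python sketch below assumes each passage of text comes with a start and end time in seconds, written here as W3C Media Fragments URIs such as chapter1.mp3#t=1.5,6.0; the cue data and helper names are hypothetical and not necessarily the format the W3C Audiobooks work uses.

```python
# Sketch: find the passage of text matching the current playback position.
# Cue data and helper names are hypothetical; time ranges are written as
# W3C Media Fragments URIs using plain seconds (e.g. chapter1.mp3#t=1.5,6.0).
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds into the audio file
    end: float
    text: str


def parse_media_fragment(uri: str) -> tuple[str, float, float]:
    """Split 'file.mp3#t=start,end' into (file, start, end); plain seconds only."""
    path, _, fragment = uri.partition("#t=")
    start, end = fragment.split(",")
    return path, float(start), float(end)


def current_text(cues: list[Cue], position: float) -> str | None:
    """Return the cue text playing at position, or None between cues.

    Assumes cues are sorted by start time and do not overlap.
    """
    starts = [cue.start for cue in cues]
    index = bisect.bisect_right(starts, position) - 1
    if index >= 0 and position < cues[index].end:
        return cues[index].text
    return None


cues = []
for uri, text in [
    ("chapter1.mp3#t=0,1.5", "Chapter One."),
    ("chapter1.mp3#t=1.5,6.0", "The first sentence of the chapter."),
    ("chapter1.mp3#t=6.0,9.25", "The second sentence."),
]:
    _, start, end = parse_media_fragment(uri)
    cues.append(Cue(start, end, text))

print(current_text(cues, 3.0))  # The first sentence of the chapter.
```

Using a binary search keeps the lookup quick even for a whole book's worth of cues, which helps if the display is refreshed many times a second.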

Wendy Reid’s and Marisa DeMegloi’s session for Tech Forum is a great introduction to the W3C Audiobooks format, and this part in particular shows a working proof of concept of how words can be displayed in sync with the audiobook, for smaller screens or for captioning.

This demo is functional enough to excite techies like me, but I expect Colibrio to come along and put a slick user interface on it soon, as Apple have done for music lyrics. Then, instead of seeing text highlighted on an ebook page, we can display the current passage of text in our audiobooks in whatever way we choose.

I can easily see how this could open up great possibilities for accessibility, education and language learning, as well as simply being rather cool, just like those cool audiobooks.

Ken Jones runs Circular Software. He was Technical Production Manager and Publishing Software Trainer for Penguin and Dorling Kindersley for many years and now offers software, training and advice to publishers such as Quarto Group, Bonnier Books and Pan Macmillan to help them get the best from their print and digital workflows.