Reading aloud: merging audio and text just got a lot easier

You may know that the modern EPUB3 standard can hold audio and video, but one of its most intriguing features, and one you may have overlooked, is ‘read-aloud’. This technique, formally called ‘Media Overlays’ in the specification, combines a spoken audio track with precise timing information, which is usually used to highlight words on the page in time with the narration.

It is important to note that this is NOT the same as synthetic text-to-speech (TTS). Read-aloud uses a pre-recorded audio track and allows for a more ‘immersive’ listening experience. TTS is important for accessibility, but it simply cannot interpret a text the way a good human reader can.

Read-aloud ebooks are currently most popular for illustrated children’s fiction and for language learning, but they may have other uses too. Advocates claim they aid comprehension and support struggling readers: studies have suggested that learners exposed to combined visual and aural stimulation retain more information than those who only read or only listen. Reading whilst listening has also been shown to help mitigate reading disabilities such as dyslexia.

Combining read-aloud with the beautifully illustrated, well-designed and edited pages of fixed-layout EPUB3 makes for an excellent reading experience. Background soundtracks, animations and other interactive elements can also be added in fixed-layout EPUB3 if desired. Shiny tablet devices seem to be catnip for most children (and some parents), and I would much rather my daughters spent their screen time inside a professionally published book than in most of the dubious apps aimed at them.

Read-aloud can also be used to play audio at the right moment without highlighting any words; we have used it to sync text and visuals with music. With our ebook creation software, CircularFLO, we can also highlight hand-drawn and scanned artwork. Award-winning children’s publisher Nosy Crow did just that for a series of highly illustrated children’s books and said: “Working with CircularFLO we’ve managed to produce beautiful, unique editions.”

Apple have long supported read-aloud in their free iBooks app, which is now available for Mac OS and comes pre-installed on every new iOS device, and the free Adobe Digital Editions (ADE) is catching up fast. Adobe’s desktop reader had been woeful in the past but is now showing much more promise: the latest version supports EPUB3 and an impressive number of EPUB3 features, including read-aloud. ADE came to the iPad late last year and was released for Android tablets just last week. Kobo, among others, also supports read-aloud in its reading apps, and more readers are coming through based on Readium, the excellent open-source, web-based EPUB3 reader.

How is a read-aloud EPUB made?

To get started you need an audio file. From there it is actually a fairly simple concept: you define a list of text fragments and a corresponding audio narration. Text can be chunked into words, sentences, paragraphs and so on, and four pieces of information are required for each fragment: (1) an id, (2) an audio file path, (3) the start time and (4) the end time.
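To make those four pieces of information concrete, here is a minimal Python sketch of one fragment record plus a sanity check over a list of them. The class, field and function names (`Fragment`, `clip_begin`, `validate` and so on) are hypothetical, chosen only for illustration, and times are assumed to be in seconds from the start of the audio file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One synchronised chunk of text: a word, sentence or paragraph (hypothetical model)."""
    fragment_id: str   # id of the element wrapping the text on the page
    audio_src: str     # path to the narration file inside the EPUB
    clip_begin: float  # start time in seconds
    clip_end: float    # end time in seconds


def validate(fragments: list[Fragment]) -> None:
    """Raise ValueError for duplicate ids, non-positive durations, or a fragment
    that starts before the previous fragment in the same audio file has ended."""
    seen_ids: set[str] = set()
    last_end: dict[str, float] = {}
    for frag in fragments:
        if frag.fragment_id in seen_ids:
            raise ValueError(f"duplicate id {frag.fragment_id!r}")
        seen_ids.add(frag.fragment_id)
        if frag.clip_end <= frag.clip_begin:
            raise ValueError(f"{frag.fragment_id}: end must be after start")
        if frag.clip_begin < last_end.get(frag.audio_src, 0.0):
            raise ValueError(f"{frag.fragment_id}: overlaps the previous fragment")
        last_end[frag.audio_src] = frag.clip_end
```

The overlap check assumes fragments are listed in reading order, which is the order a reading system plays them back in.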

All this information is placed inside a ‘SMIL’ file, pronounced ‘smile’ (Synchronized Multimedia Integration Language, seeing as you asked), and each id must then be placed around the corresponding words in the EPUB, along with some styling and structural information. This has been the major fly in the ointment. Gathering and applying accurate timing information, along with all the other hand coding needed, is laborious to put it lightly, or ‘a nearly Sisyphean task’ as ebook guru Laura Brady says. I looked that up. It means *really* hard. All this effort has understandably put some publishers off.
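To show what that hand coding involves, the Python sketch below does both halves for a single page: it wraps each word in a `<span>` with an id, and writes a minimal SMIL file whose `<par>` elements pair each text id with an audio clip. The function names and the `w1`, `w2` id scheme are hypothetical; the element and attribute names (`smil`, `body`, `par`, `text`, `audio`, `clipBegin`, `clipEnd`) follow the EPUB3 Media Overlays format. This is a sketch under those assumptions, not how any particular tool does it.

```python
import html
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SMIL_NS = "http://www.w3.org/ns/SMIL"
ET.register_namespace("", SMIL_NS)  # serialise SMIL as the default namespace


@dataclass(frozen=True)
class Fragment:
    fragment_id: str   # id of the <span> on the page
    audio_src: str     # audio path, relative to the SMIL file
    clip_begin: float  # seconds
    clip_end: float    # seconds


def wrap_words(words: list[str], prefix: str = "w") -> str:
    """Wrap each word in a <span> with a sequential id (w1, w2, ...)
    so the SMIL file has something to point at."""
    return " ".join(
        f'<span id="{prefix}{i}">{html.escape(word)}</span>'
        for i, word in enumerate(words, start=1)
    )


def build_smil(text_href: str, fragments: list[Fragment]) -> str:
    """Return a minimal Media Overlay document: one <par> per fragment,
    pairing a text reference with an audio clip."""
    root = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(root, f"{{{SMIL_NS}}}body")
    for n, frag in enumerate(fragments, start=1):
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{n}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text",
                      {"src": f"{text_href}#{frag.fragment_id}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}audio", {
            "src": frag.audio_src,
            "clipBegin": f"{frag.clip_begin:.3f}s",  # SMIL clock value in seconds
            "clipEnd": f"{frag.clip_end:.3f}s",
        })
    return ET.tostring(root, encoding="unicode")
```

A complete book needs more than this: the package document must link each page's manifest item to its overlay through the `media-overlay` attribute, declare durations with `media:duration` metadata, and can name the highlight CSS class with `media:active-class`.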

Completely automated read-aloud EPUBs

Using CircularFLO, InDesign users can export straight to fixed-layout read-aloud EPUB with no coding required. Since v5, CircularFLO has let InDesign users capture the pace of the audio simply by tapping the keyboard. But for really impressive and reliable results, we have found an automated timestamping service from Sinkronigo to be the best. In CircularFLO 7 we have built tools to use these automatic timings directly from within InDesign. It works a bit like speech recognition, but because the words are already known, the effort goes into producing accurate word timings.
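As a rough illustration of the tap-timing idea, here is a hypothetical Python sketch (not CircularFLO's actual implementation) that turns one keypress per word into start and end times. It assumes each tap marks the moment a word begins, that a word ends when the next one begins, and that the last word runs to the end of the audio. Human taps drift by a fraction of a second, which is one reason automated timestamping tends to give tighter results.

```python
def taps_to_timings(
    words: list[str], taps: list[float], audio_duration: float, prefix: str = "w"
) -> list[tuple[str, str, float, float]]:
    """Convert one keyboard tap per word into (id, word, start, end) tuples.

    Hypothetical scheme: each tap (seconds from the start of the audio) marks
    where a word starts; each word ends where the next starts, and the final
    word ends at audio_duration.
    """
    if len(taps) != len(words):
        raise ValueError("need exactly one tap per word")
    if any(later <= earlier for earlier, later in zip(taps, taps[1:])):
        raise ValueError("taps must be strictly increasing")
    if taps and taps[-1] >= audio_duration:
        raise ValueError("last tap must fall before the end of the audio")
    ends = list(taps[1:]) + [audio_duration]
    return [
        (f"{prefix}{i}", word, start, end)
        for i, (word, start, end) in enumerate(zip(words, taps, ends), start=1)
    ]
```

For example, `taps_to_timings(["Once", "upon"], [0.1, 0.5], 0.9)` gives `[("w1", "Once", 0.1, 0.5), ("w2", "upon", 0.5, 0.9)]`, ready to be written into a SMIL file.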

And it really does work, as this YouTube video shows: in under 4 minutes, a page is made, audio is recorded, read-aloud text highlighting is added and an EPUB is created.

Click here for a free trial of CircularFLO.

This is a guest post by Ken Jones. Ken worked for Pearson and Penguin Group UK for over 10 years before he founded Circular Software, and now specialises in publishing software development, training, demonstrations and consultancy for a range of UK publishing customers. This post was originally published by What’s New in Publishing.