EPUB is the open standard that defines exactly what an ebook can be, and EPUB 3.2 – the first real update to the EPUB format since EPUB 3 in 2012 – has just been approved by the W3C Community Group and Business Group (the clever people that decide on such things).
If you are the sort of person who enjoys reading specification documentation then you can get up to speed with the full list of changes here. Otherwise, let me explain the main points of what has changed…
This is a guest post by Anna Cunnane. Anna is Systems and Data Manager at Abrams & Chronicle Books. Anna was a winner at the Trailblazer Awards 2018, is part of BookMachine Team Unplugged, and was Chair of the Society of Young Publishers (2015-16).
Abbie Headon runs Abbie Headon Publishing Services, and offers a range of skills including writing, editing and commissioning, alongside social media, website development and publishing management. She champions fresh approaches to solving the industry’s challenges and can be found mingling at most publishing events. Abbie’s also BookMachine’s Commissioning Editor and sits on the BookMachine Editorial Board.
Francesca Zunino Harper is a linguist, translator, and publishing professional. She spent several years in British and international academia, researching comparative literature, translation, and women’s and environmental humanities. She now works in the Humanities and Social Sciences area of publishing. You can follow her @ZuninoFrancesca.
Sara O’Connor worked in children’s publishing for 13 years and is now a full-time web developer. Frustrated with the cost and creative limitations of outsourcing good digital ideas (check out this article), she decided to retrain as a programmer. She now works with Emma Barnes at Consonance (the new name for Bibliocloud), helping to build the software she wished she had when she worked in publishing. Sara will be joining our panel at BookMachine Unplugged 2018: Talking Tech.
Janneke Niessen is a serial entrepreneur, angel investor, board member and mentor for startups. She has started and sold two international tech companies and is currently working on her third: Berlage. She is co-initiator of InspiringFifty, an initiative that aims to increase diversity in tech by making female role models more visible. As part of the InspiringFifty initiative, Janneke has published Project Prep, a novel for young girls, in conjunction with an award-winning children’s book author. Janneke will be joining our panel at BookMachine Unplugged 2018: Talking Tech.
Nick Barreto works where books and technology intersect. He’s managed and built apps, and is an expert on ebook formats, metadata and workflows. He is committed to automating all the repetitive tasks to free up more time for the work that matters. Nick is one of Canelo’s co-founders and the Technology Director. @nickbarreto
This article is by Ken Jones of Circular Software. Ken is running the Understanding eBooks day on 25th April 2018.
I’ve been involved in making beautiful and interactive fixed-layout ebooks since before there was a standard for such things. But trust me, this one is different… It is truly the finest example of interactive children’s storytelling I have ever seen: it contains custom movies on every spread, background audio, professional narration with read-aloud text highlighting, placed web code, personalisation, interactive animations and puzzles!
We’re fortunate to have so many book discovery tools and techniques available to us, but leveraging them effectively can be challenging. In this post I’ll share some insights on working strategies, drawn from experience building search and recommendation engines, and from helping publishers connect with readers through keywords.
John Chelsom started the XML Summer School in 2000, and continues as a board member and lecturer at this annual event. Since 2010 he has been the lead architect of the open source cityEHR product – an XML-based electronic health records system which combines clinical data with medical knowledge bases and is currently used in a number of hospitals in England.
2016 saw the unearthing of the oldest written document yet found in the British Isles – on a wooden Roman tablet from about 50 AD. We put huge effort into creating the digital text published today, but how much of it will still be around to read 2000 years from now?
At the XML Summer School over ten years ago, someone asked our expert panel to give guidance on the best way to archive digital publications. The most creative answer, from Robin Cover (a renowned digital archivist!) was that we should carve our text on tablets of stone, marked up in XML. I can still remember laughing heartily at his suggestion, but now I’m beginning to think he was right.
Robin’s argument was that text alone is not good enough for representing the information we want to preserve – we also need some representation of structure and metadata. For that he proposed XML – its encoding is just plain text and, given a sufficiently large sample, its logic can be decoded without having the original specifications. His proof was to go back to digital assets of the 1960s – how many of us would have software available that could read documents created way back then? Well, if those documents were marked up in GML, the Generalized Markup Language invented by Charles Goldfarb at IBM in 1969, we would find they could still be read by any software that handles XML, the Extensible Markup Language descended from GML. Such software is all around us and much of it is free, including any web browser or plain text editor. Try doing the same with a proprietary word processing, desktop publishing or typesetting format whose original application ceased to exist as little as ten years ago.
As for the tablets of stone, if we found a 50-year-old GML file, what are the chances we’d be able to read the media it was stored on? Preservation of our digital assets depends on the technology used to store them, and even in fifty years we have seen many technologies come and go; paper tape streamers, tape drives and floppy disk drives will no doubt be followed into the dustbin of technology by DVDs and USB sticks over the next fifty years. So in thousands of years’ time, when the electricity has been switched off and archaeologists are picking over the debris of our silicon age, it’s still more likely they will find text written on stone than on tapes, disks or chips. You may think this sounds a little crazy, but the Memory of Mankind project in Austria is aiming to do exactly that – preserving contemporary human knowledge on stone, buried deep in a salt mine for future generations to find.
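Here is a minimal sketch of Robin's point that uses only Python's standard-library xml.etree.ElementTree. The element names (document, meta, title, date, para) are invented for illustration. What matters is that the text, its structure and its metadata are all stored as plain characters. Any generic XML parser, or a person with a text editor, can recover them without the software that created them.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical document: the markup is just plain text, yet it
# records metadata (title, date) and structure (ordered paragraphs).
SOURCE = """<document>
  <meta>
    <title>An example archived text</title>
    <date>2016</date>
  </meta>
  <body>
    <para>First paragraph of the text.</para>
    <para>Second paragraph of the text.</para>
  </body>
</document>"""


def read_document(xml_text):
    """Recover metadata and paragraphs using only a generic XML parser."""
    root = ET.fromstring(xml_text)
    # Each child of <meta> becomes a name/value pair of metadata.
    metadata = {child.tag: child.text for child in root.find("meta")}
    # Paragraph order is preserved by the document structure itself.
    paragraphs = [para.text for para in root.iter("para")]
    return metadata, paragraphs


if __name__ == "__main__":
    meta, paras = read_document(SOURCE)
    print(meta)
    print(paras)
```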
I sometimes tell people that publishing should be viewed as an investment in digital assets – the more value we can create in those assets and the more we can reuse them, the greater will be the return on our investment. Our most cherished digital assets deserve to be preserved for the future, if only to protect our investment in producing them. And though many of us won’t be ready just yet to carve our documents in stone, we should at least be thinking about the first step of representing those documents in XML.
Take a look at their website for more information on the XML Summer School and their events.
Emma Barnes taught herself to code after founding her own independent publisher, Snowbooks. She went on to build Bibliocloud, the next-generation publishing system. Now she’s on a mission to promote tech skills within the publishing industry and beyond. Emma is also on the newly-formed BookMachine Editorial Board.
6.50am Wake up, wonder what day it is and remember – great! It’s the one day this week that I can dedicate to programming. I’m the MD of the indie publisher Snowbooks, and I’m CEO of Bibliocloud, responsible for sales, finance, and customer success, so each day is very different. But I reserve at least one day a week for slipping the needle in and luxuriating in single-minded programming. It so happens that it’s a Saturday, but that’s when the emails stop… context switching is my biggest foe.
8am First coffee, and a read through the opening chapters of the new Sandi Metz book about object-oriented programming in Ruby. It’s great when you find a book that directly addresses the real-world problems you’re facing. I click through to a podcast that she’s on to hear more.
11am Tests. Yesterday I discussed with my colleague, Andy, a piece of code that needs some attention. The code is a method which returns a collection of external URLs that get displayed in Bibliocloud. The URLs take you to a book’s Amazon.co.uk page, or Amazon.com page, or Wordery page, or British Library page, and so on: a handy and quick way to check what data is out there in the wild. The method doesn’t have automatic test coverage yet, so I’m going to start by documenting current behaviour. I do this using an integration test which mirrors what a user would do. We use Cucumber, which gives us a common language between non-technical team members and programmers. I start by creating a new branch of the code based on our master branch, and create a Cucumber feature which literally reads “When I visit the ‘Autodrome’ page in Bibliocloud, and I click on the Amazon.com link, then I should be taken to the ‘Autodrome’ page on Amazon.com”. I then write some code to translate that into automatic test steps.
1pm The grand refactor. The Sandi Metz book has given me a couple more clues as to how this method could be improved, and I’m trying to hold all the concepts in my head so I can look at the problem squarely. Sandi Metz talks about finding the right level of abstraction, so I’m trying to think about which objects this problem is actually concerned with. Is it the validity of the ISBN that is key? Or the destinations themselves? Or the structure of the URLs? Some are built using the ISBN10, others with the ISBN13. Will there be a future case where the URL is built using an ISSN, or a DOI, or an ASIN, or an ISTC, or an ISNI, or an ORCiD iD? If a book belongs to a series, can we say that the book has an ISSN? If its authors have ORCiD IDs, can we use those to create external links for the book? What about linking to the client’s own website?
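One small, well-defined part of this problem is that some destinations want an ISBN-10 and others an ISBN-13. The sketch below is in Python for illustration, because the post doesn't show Bibliocloud's Ruby code, and its function names are hypothetical. It uses the standard check-digit rules to validate an ISBN-13 and derive its ISBN-10. Only 978-prefixed ISBN-13s have an ISBN-10 equivalent; 979-prefixed ones do not. Edge cases like that one help decide where this logic should live.

```python
def is_valid_isbn13(isbn13):
    """Check an ISBN-13 using its alternating 1/3 weighted check digit."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn13_to_isbn10(isbn13):
    """Return the ISBN-10 for a valid 978-prefixed ISBN-13, otherwise None."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if not is_valid_isbn13(digits) or not digits.startswith("978"):
        return None  # invalid input, or a 979 ISBN with no ISBN-10 form
    core = digits[3:12]  # drop the 978 prefix and the ISBN-13 check digit
    # ISBN-10 weights run 10, 9, ..., 2 over the nine core digits.
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))
```

For example, isbn13_to_isbn10("978-0-306-40615-7") gives "0306406152". A 979-prefixed or mistyped ISBN gives None.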
Or is this a case of YAGNI (‘you ain’t gonna need it’)? All this matters because I want to put the code in the right place, named properly, so that we can find it and change it easily later. Maintainability, in a large, active system such as Bibliocloud, is probably the most important thing. I start by working with David to sketch out the problem (see the picture), then create a new Ruby class by adding a text file called external_links.rb to my local code repository.
Like the common language provided by Cucumber, the challenge so far has been approached not with code, but with language, reading, grammar, discussion, and story. I reflect — not for the first time — on how relevant publishers’ skills are for programming.
2pm Lunch and back to the other Sandi Metz book I’m reading: Practical Object-Oriented Design in Ruby. There’s a good bit on page 93 where she talks about duck typing, which I wonder might be relevant. The idea of duck typing is that “if it looks like a duck and quacks like a duck, it’s a duck”. So my ExternalLinks class doesn’t need to be handed an actual Book object in order to build the URL. It only expects to get an answer when it asks “what’s your ISBN?” (even if it’s “nope, I don’t have one”). I could similarly give ExternalLinks a display spinner, or a CD, or a cassette audiobook, just so long as it can say what its ISBN is. I’m going to use this idea to write ExternalLinks so that it’s not tightly coupled to the Book class itself – though I’m a bit worried that this is another case of YAGNI. I commit this code to my local branch, glad that I’ve named it “spike/external_url_refactor” so that I can discuss this approach with my colleagues before considering it for a merge into our production system.
3pm Iteration. I run the test that was passing earlier and it fails. Huh. I abandon the integration test and start unit testing at a deeper level of the code. I realise that there’s a requirement I hadn’t understood: some of the destinations are dependent on format, as well as ISBN type. Writing the tests illuminates some of the nuances of the domain, and I jump between revising the tests and revising the code (avoiding doing both at the same time, which is a recipe for misery).
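Here is a minimal Python sketch of the duck-typing idea. Bibliocloud itself is written in Ruby, and its real ExternalLinks class isn't shown in the post, so everything below is hypothetical. The destination names and example.com/example.org URL patterns are placeholders. The only contract is that whatever gets passed in has an isbn attribute, which may be None.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical destinations and placeholder URL patterns, for illustration only.
URL_TEMPLATES = {
    "Example Shop": "https://shop.example.com/isbn/{isbn}",
    "Example Library": "https://library.example.org/search?isbn={isbn}",
}


class ExternalLinks:
    """Build external links for anything that can answer "what's your ISBN?".

    Duck typing: nothing checks that item is a Book. It only needs an
    isbn attribute, which may be None ("nope, I don't have one").
    """

    def __init__(self, item):
        self.item = item

    def links(self) -> Dict[str, str]:
        isbn = self.item.isbn  # the only question we ask of the item
        if not isbn:
            return {}
        return {name: template.format(isbn=isbn)
                for name, template in URL_TEMPLATES.items()}


@dataclass
class Book:
    title: str
    isbn: Optional[str]


@dataclass
class DisplaySpinner:
    # Not a Book at all, but it "quacks" the same way: it has an isbn.
    name: str
    isbn: Optional[str]


if __name__ == "__main__":
    print(ExternalLinks(Book("An Example Book", "9780306406157")).links())
    print(ExternalLinks(DisplaySpinner("Front-of-shop spinner", None)).links())
```

A DisplaySpinner with an ISBN gets the same links as a Book would. An object with no isbn attribute at all raises AttributeError. That makes a broken duck-typing contract visible instead of silently ignoring it.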
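Unit tests are a good way to pin down a rule like "destinations depend on format as well as ISBN type". Below is a hypothetical Python sketch using the standard unittest module. The destinations, formats and rules are invented for illustration and are not Bibliocloud's actual data. They show each destination depending on both the product format and which ISBN types are available.

```python
import unittest

# Hypothetical rules, invented for illustration: each destination lists the
# product formats it covers and the ISBN type its URLs are built from.
DESTINATIONS = {
    "Example Print Shop": {"formats": {"hardback", "paperback"}, "isbn_type": "isbn10"},
    "Example Ebook Store": {"formats": {"ebook"}, "isbn_type": "isbn13"},
    "Example Library": {"formats": {"hardback", "paperback", "ebook"}, "isbn_type": "isbn13"},
}


def destinations_for(product_format, available_isbn_types):
    """Sorted names of destinations matching the format and an available ISBN type."""
    return sorted(
        name
        for name, rule in DESTINATIONS.items()
        if product_format in rule["formats"]
        and rule["isbn_type"] in available_isbn_types
    )


class DestinationsForTest(unittest.TestCase):
    def test_paperback_with_both_isbn_types(self):
        self.assertEqual(destinations_for("paperback", {"isbn10", "isbn13"}),
                         ["Example Library", "Example Print Shop"])

    def test_ebook_skips_print_only_destination(self):
        self.assertEqual(destinations_for("ebook", {"isbn13"}),
                         ["Example Ebook Store", "Example Library"])

    def test_hardback_without_isbn10_loses_isbn10_destination(self):
        # e.g. a 979-prefixed ISBN-13, which has no ISBN-10 form
        self.assertEqual(destinations_for("hardback", {"isbn13"}),
                         ["Example Library"])


if __name__ == "__main__":
    unittest.main(exit=False)
```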
4pm Leave to pick up my son, as I do every day of the week. Programming allows for flexible hours. It’s the sort of job that benefits from a bit of percolation, and fitting it around family makes me happy that I can experience life and motherhood as it happens, rather than only working hard for some imaginary future.
8pm Share today’s programming. Bedtime is done, and I look at the code again, but I think I’ve got as far as my brain will take me today, so I push the code to a branch on Bitbucket, our remote code repository, and raise a pull request with my colleagues. I’ll look forward to discussing this approach with them on Monday and seeing if they notice any glaring or subtle errors, and suggest better ways to structure the code. [Postscript from the future: on Monday, we found no errors as such, but we improved the test suite and I got a lot of clarity about separation of concerns from my code review with Andy.]
10pm Bit more of that Sandi Metz book. It really is very moreish.
For a long time now, I have wanted to code. Like, seriously code. Yet I’ve been continually procrastinating or chickening out or never “finding the time”, as though time were that bit of loose change you find in the pocket of your winter coat when you dust it off again in mid-October. I am drawn to languages you never have to actually speak: their structural logic, minus the performance anxiety of actually speaking them, is the reason I studied Latin through to university. Zero performance anxiety.
I have been wanting to attend classes, learn the whims of the different languages, manipulate data, write my own code and programs, and do all sorts of clever things that would make me a more flexible and diverse publisher, not to mention a better human. I looked around, admittedly exhaustively, and spoke to a few friends and friends of friends, and a quick bash of key terms into a search engine turned up a few groups on Meetup, Women Who Code for example. I cannot big this group up enough: they host events in association with, and in the offices of, Twitter, ASOS, Sky etc., and have built a community of such strength that their events book up ridiculously quickly, with sizeable waiting lists of would-be attendees hoping that some early bird drops out at the last minute. But many of these women work in tech industries or use programming languages and deal with data and other such analytics-y things (can you tell I don’t yet?), and glancing over the summaries for each event as it pings into my inbox (you can opt in for email updates, don’t cha know), there is often a scary amount of jargon for my layman brain to handle.
I am probably hooked up and plugged in to the internet for more hours in a day than I would care to mention, lest any prospective employers are reading this, and through my work with independent publisher, DodoInk, I have had the privilege of working with the wonderfully savvy people at PigeonHole. I read the posts from FutureBook as they land in my inbox and my mind expands and broadens as I take in all the innovative and creative ways people are redefining what it means to be a publisher and how we share, access, and experience books. But like a puckered old balloon, after the excitable expansion, an inevitable deflation ultimately sets in.
If any of you reading this are getting sick of my overuse of the present perfect: NO LONGER. Emma Barnes has supplied me with the training wheels to make coding far less intimidating. In her intensive workshop we spent the afternoon getting to grips with four basic(ish) web languages: HTML, CSS, JavaScript and, finally, Ruby. The way Barnes evangelises about Ruby, I think this may be her favourite of the bunch. Just a hunch.
I learnt the gist of when and where to use the different languages, and what each was responsible for in the make-up of a finished, functioning and, hopefully, stylish webpage.
HTML, or HyperText Markup Language if we’re being formal, is basically your bread and butter. From my limited understanding it seems to be the main structural element of the four, expressing the information you want on your page. And what’s a webpage with no information? Pretty useless, that’s what.
Probably among the most fun and easiest to play around with was CSS which essentially controls the look of your page allowing you to customise and make it feel even more like your own. Tinkering with this you can alter the font, font colour and size, background colours, etc. Once you find a Hex code database your palette becomes pretty endless.
JavaScript seems like the most complex of the bunch, and also the most dynamic, bringing static pages to life with smoothly collapsing dropdowns and far more interactive, moving elements. JavaScript also runs within the page in your browser, rather than sending a request off to a server.
Finally: Ruby. To quote Barnes, Ruby is “like poetry”: beautiful, elegant, spare and an extremely coder-friendly language. It also lets you dynamically fetch the information you want from a wealth of open data sources through APIs (Application Programming Interfaces).
As we progressed we applied each stage of our new-found knowledge to build a functioning website that could mine data from the Google Books API to create our own user-friendly search engine (not too dissimilar from Barnes’ own Bibliocloud). The project-based ‘learning by doing’ suited me down to the ground, and I didn’t quite realise how quickly I was accruing my new skills (though, don’t get me wrong, I still have a ways to go!).
Barnes was backed by an immensely helpful team who were on hand to answer any question, no matter how silly, and to help us spot the seemingly undiscoverable errors in our script within seconds of glancing at our screens. They are an impressive bunch, and hugely supportive. In fact, the whole day lacked the often-parodied frustration of rage-bashed keyboards and technological tantrums, and was buoyed along by Barnes’ clear enthusiasm and passion for her work, finding joy in the possibilities that code offers, and giving us a glimpse of her curiosity-driven mind.
Barnes’ eyes would flit over passages of code that were, to me, largely unintelligible, and she would exclaim: ‘oh now THAT’S cool’, or ‘Hmm, THAT’S interesting! I wonder what else I could make that do’. At the end of the day, thanks to Emma Barnes and her team, BookMachine, and Faber Academy, I think we all left wondering a similar thing: That was cool. Now, I wonder what else I can do…
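The workshop built this in Ruby. As a rough Python sketch of the same two steps, the code below builds a search URL for the public Google Books API (the volumes endpoint with its q and maxResults parameters). It then pulls titles and authors out of a response. To stay self-contained, it parses a made-up sample shaped like the API's JSON (an items list, each entry with a volumeInfo object) instead of making a network request.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://www.googleapis.com/books/v1/volumes"


def search_url(query, max_results=10):
    """Build a Google Books API volumes-search URL (query is URL-encoded)."""
    return API_ROOT + "?" + urlencode({"q": query, "maxResults": max_results})


def summarise(response_json):
    """Pull (title, authors) pairs out of a volumes-search response."""
    data = json.loads(response_json)
    results = []
    for item in data.get("items", []):  # "items" is absent when nothing matches
        info = item.get("volumeInfo", {})
        results.append((info.get("title", "Untitled"),
                        ", ".join(info.get("authors", []))))
    return results


# A trimmed, made-up response in the shape the API returns; a real app would
# fetch search_url(...) with urllib.request.urlopen or an HTTP library.
SAMPLE = json.dumps({
    "totalItems": 1,
    "items": [{"volumeInfo": {"title": "An Example Book",
                              "authors": ["A. N. Author"]}}],
})

if __name__ == "__main__":
    print(search_url("intitle:example"))
    print(summarise(SAMPLE))
```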
If you would like to take part in the next Coding for publishers course, sign up to the BookMachine mailing list and we will let you know when tickets are available.
This is a guest post from Emily Gibson and Nic Gibson. They are both directors of Corbas Consulting Ltd and each has over 20 years’ publishing experience, mostly in editorial, print and digital production.
The other day we were contacted by a client who was really excited about the new digital publishing process they were putting into place, and they wanted some help getting things right. They had bought a database and needed to get their Word documents into the XML language that their database needed. However, the ‘flavour’ of XML that they had chosen wasn’t going to support the content that they were producing. That means that they aren’t going to get the best results and full value from their workflow system.
You see, publishing with XML is not just a matter of deciding to ‘have an XML workflow’. (For a basic description for editors, see, for example, this one in The Chicago Manual of Style.) There are many different ‘flavours’ of XML and you need to pick the one that fits your needs. These needs are defined by the type of content you are publishing and your workflow.
A well styled Word document, for example, can be transformed into a decent XML file. Once you have an XML file, you could simply apply scripts to it to create your output (PDF, HTML for your website, EPUB) – if you have a Word document for your novel, for example, that had Word Styles consistently applied, you can simply run a program to get whatever output you need.
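Here is a minimal Python sketch of that pipeline. It assumes the manuscript's paragraphs have already been extracted as (style name, text) pairs; reading a .docx directly would need a separate library, which is left out. The style names, XML element names and XHTML mapping are all hypothetical. The sketch refuses unmapped styles, which is why consistent styling matters: a stray, unstyled paragraph stops the conversion instead of slipping through.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph style names to XML elements.
STYLE_TO_ELEMENT = {"Heading 1": "title", "Normal": "para", "Quote": "quote"}

# Hypothetical mapping from those XML elements to XHTML output.
ELEMENT_TO_XHTML = {"title": "h1", "para": "p", "quote": "blockquote"}


def styled_paragraphs_to_xml(paragraphs):
    """Turn (style, text) pairs into an XML chapter; unknown styles are an error."""
    chapter = ET.Element("chapter")
    for style, text in paragraphs:
        if style not in STYLE_TO_ELEMENT:
            raise ValueError(f"Unmapped Word style: {style!r}")
        ET.SubElement(chapter, STYLE_TO_ELEMENT[style]).text = text
    return chapter


def xml_to_xhtml(chapter):
    """One output from the same XML: an XHTML <section> for web or EPUB."""
    section = ET.Element("section")
    for child in chapter:
        ET.SubElement(section, ELEMENT_TO_XHTML[child.tag]).text = child.text
    return ET.tostring(section, encoding="unicode")


if __name__ == "__main__":
    manuscript = [("Heading 1", "Chapter One"), ("Normal", "It was a dark night.")]
    print(xml_to_xhtml(styled_paragraphs_to_xml(manuscript)))
```

Other outputs (PDF, EPUB packaging) would be further scripts reading the same XML.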
On the other hand, if you have a bunch of journal articles, you could save the files into an XML-aware database and apply those scripts to all your content at once to produce a collection. Whichever system you choose is partly driven by the degree of automation that fits your publishing needs. If you are publishing fifteen monographs a year, there’s not a lot of benefit to an all-singing and all-dancing XML database. If you are publishing several hundred articles a year then there are some big benefits.
The first step is to decide how you are going to go from manuscript to XML and then you need to decide what systems you are going to use to manipulate and transform it.
You need to think about both the structure and the content of your manuscripts when you decide on which flavour of XML you are going to choose. The different flavours of XML are very different in their structure and their expressiveness. The only thing you can be fairly sure of is that there is already one which will meet most of your needs (you don’t need to write it from scratch).
There are different tag sets (the set of elements in an XML language, a.k.a. what’s inside the pointy brackets) that suit different kinds of content, and there are different tools to suit them, too. In the same way that ice cream and sausages are both delicious, but you wouldn’t want them together, not every flavour of XML goes with every kind of content.
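One rough way to test whether content and a tag set fit is to list the elements your content needs that the tag set doesn't define. The Python sketch below uses an invented, deliberately tiny narrative tag set. Real tag sets such as XHTML, DocBook or JATS are defined by schemas that also constrain nesting and attributes, so validating against the actual schema is the proper check. This sketch only compares element names.

```python
import xml.etree.ElementTree as ET

# A hypothetical, deliberately tiny tag set for narrative content.
NARRATIVE_TAG_SET = {"book", "chapter", "title", "para", "emph"}


def elements_outside_tag_set(xml_text, tag_set):
    """Return the sorted element names used in xml_text that tag_set lacks."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}  # includes the root itself
    return sorted(used - tag_set)


if __name__ == "__main__":
    legal_text = ("<book><chapter><title>Terms</title>"
                  "<clause><para>Text</para></clause></chapter></book>")
    print(elements_outside_tag_set(legal_text, NARRATIVE_TAG_SET))
```

Running it on the legal-style fragment reports ['clause']. That suggests this narrative tag set isn't a good match for legal content.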
Match the content you publish with the appropriate XML language. For example:
If your content doesn’t have specialised semantics (e.g. legal, programming), XHTML, the XML variant of HTML5 used in EPUB, is perfectly suitable for a lot of narrative (e.g. novels) and monograph material for EPUB, print and web outputs. XHTML has the advantage of a smaller, simpler tag set, which makes it easier to work with for simpler content.
XML can help publishers tackle managerial as well as technical challenges. It provides ways to manage the workflow, the interaction between content and people, and the publishing processes, as well as the documents themselves. The features of XML ensure that information and its structure can be controlled and managed.
It can be a complex topic, but many publishing professionals find that knowing about XML – even if they don’t use it every day – is immensely useful. There are a number of places that you can learn about XML, but the XML Summer School, held each year in September, is the best and most comprehensive. It presents a range of XML techniques and applications in workflow, change management, QA, linked data, and document structure control to help publishers manage their content effectively.
You work in publishing, right? You think you’re creative? What do you spend most of your time at work doing? If you’re in a low-level position, Snowbooks and Bibliocloud founder Emma Barnes suggests your answer is likely to comprise a list of decidedly uncreative tasks.
In a witty, insightful and inspiring talk at the Galley Club this week, Barnes bemoaned the publishing industry habit of hiring bright, creative people and then getting them to do dull and repetitive administrative jobs. As she wrote in the Bookseller last year, if you joined publishing because you wanted to create wonderful books, you’re left wondering “why copying and pasting between Excel, Word and InDesign feature so heavily in this ostensibly creative process”. But Barnes isn’t just a bemoaner, she’s a problem solver. Her solution to this problem? Get coding. Because if you can code, you can automate some of these dull jobs and free up time to do the fun – and important – creative things publishers should be doing.
For many publishers, the word “coding” is almost as terrifying as “maths”. Certainly, when I used to introduce a 2-hour MA Publishing session on coding at Kingston University, the response was usually a mixture of fear, disinterest and irritation that I was forcing students to spend time doing such a dull and difficult thing. If these are your natural responses, try replacing “coding” in your head with words like “carpentry” or “poetry”. Writing code, says Barnes, is “the modern equivalent of being a carpenter”. It’s an opportunity not just to come up with ideas, or to shape them (as editors do with authors’ work), but to make ideas happen yourself.
Coding can definitely be challenging (Barnes uses the phrase “mind-bendingly awful”), but you don’t need to be a maths geek to be good at it. Barnes has publicly admitted to being “useless” at arithmetic. Instead, she brings a love of patterns, words, symmetry and brevity to the activity. “Writing code trips many of the pleasure centres in my brain,” she says. “And I love feeling that I’m doing something meaningful, creating something out of nothing. Plus, the inherent difficulty of writing good code is a source of huge pride”. My former students, when they made things appear on screen in a browser in a matter of minutes, often shared this pride – evidence that if you can get over the fear or disinterest factor, this undeniably creative pursuit can be extremely satisfying. What’s more, publishers’ natural tendency to grammatical “correctness” makes them excellent coders.
Our love of pedantry feels like a perfect match for a pursuit based on strict rules and zero tolerance for inaccuracy. And our love of words reinforces the idea that publishers are natural coders. As Barnes points out, code is written, just like poetry and books are. Code is even created in narrative arcs, which Barnes likens to “reading the best novel ever” if done well. If you appreciate “the careful crafting of a narrative flow, or the finely-edited end-result of a piece of prose, honed and whittled and buffed to perfection,” says Barnes on Digital Book World, then you’ll appreciate good coding too.
For me, Barnes’s almost evangelical talk of creating things and sending them out into the world was enough to get me thinking I should sharpen up my own coding skills. But the value of coding isn’t just this personal satisfaction. If you can code, you can start to automate those decidedly uncreative tasks that fill up your day. And then you can find time to get down to the collaborative, creative business of publishing. What more encouragement do you need? If that’s truly not enough, then the doomsday scenario suggests school-leavers in a decade or so will all be able to code and your old-fashioned spreadsheet skills will look positively archaic on your CV.
So, how to get coding? Here are some of the ways Barnes suggests you “start to dabble with code”:
Spend just 15 minutes on Try Ruby – a quick and basic online programming tutorial.
Try Scratch – a free coding app, designed for kids, that helps you think systematically and build simple games, animations and interactive stories.
Read Michael Hartl’s Rails Tutorial – Barnes claims that if you work through it for half an hour every day for three months, you’ll have built Twitter.
Sign up to a Codecademy or Code School course – free online courses on programming languages. Barnes recommends learning Ruby.
Join a coding community like Rails Girls or Code Bar – offering help, advice and events (including free workshops). Barnes and the Bibliocloud team volunteer at Rails Girls.
As a result of attending Barnes’s Galley Club talk, I’ve applied to participate in the free Rails Girls London workshop in June. If I get accepted, I’m expecting a day packed with creativity, challenge, making and – I hope – a sense of pride and achievement. Oh, and a newly enhanced CV too. See you there?
Anna Faherty is a writer, publisher and teacher. She collaborates with global publishers and museums on digital, print, exhibition and training projects and has taught on publishing programmes at Kingston University, Oxford Brookes University and University College London. Anna blogs at http://strategiccontent.co.uk/blog and tweets as @mafunyane.