Given the rising tide of electronic books, for a number of months I’ve been pondering how to make our NMC publications available in an ebook format. With the push of the iThing, it looked like ePub was the format to aim for. It is, after all, a standard (or is it a guideline?).
My experience suggests it is a muddy place, where much depends on the devices that access the content (oi, the browser wars of the 1990s), but this is a stream of what I’ve figured out so far. I will preface this by saying that I have almost no expertise in this; it’s just what I figured out by head-banging attempts to produce an ePub.
I’ll drop an early hint that I am excited about the just-released Anthologize tool for generating electronic texts, but it’s too early to tell on that one.
First, I tried a number of the various tools that offer to convert a PDF to an ePub, e.g. ePubBud. I can say that you get “something” you can view at the end, but it’s really not optimal on format or layout, and you don’t get much in the way of options to customize, so it’s a crapshoot whether it does a decent job.
That is because, under the hood, an ePub is not really a single document at all, but a zipped container of files, many of them XML, and all the “content” portions of your ePub are structured HTML, or XHTML. So anything that attempts to “convert” your PDF must make guesses as to what are headers, where the breaks are, etc., and who knows what it does with things like lists and links.
I learned the most from the excellent tutorial “How to Create an ePub By Hand” which clearly illustrates many of the moving parts in an ePub, and provides a template to start with. Harrison Ainsworth’s Epub Format Guide is another great reference (and is also available as an ePub).
So what you end up doing is a lot of hand coding of XML and XHTML files, packaging them up with a few other key files in a zip, and then just changing the file extension from “.zip” to “.epub”. If you are leaping ahead like I did and think you can take a DRM-sprinkled ePub, swap its file extension to .zip, and pry it open to peek at the structure, good luck. You get a *.cpgz file which, when you uncompress it, gives you another version of the original zip, in an endless circle (well, not exactly true, I just found the unix command line “ditto -xk source.zip
While embarking down this manual path, I also asked our publication designer, who uses InDesign to generate our documents, to experiment with its ability to export ePub. I’ve not heard much, but he was unable to even get a simple test file going, and as is, this would require recasting his templates and styles to get something ePub-able.
I also got connected (thanks, Phil Long) with someone who does this as a business, and found out that on that end they do a ton of work the manual way to get a template that works, and only then get to the point of more automation in generating the content.
I thought this would be pretty easy to do for our NMC Horizon Reports, since I already re-publish them in WordPress format (see http://wp.nmc.org/) and so already have the content in HTML. It took a little bit of tidying to get clean XHTML: changing extensions to .xhtml, closing some tags properly, adjusting local links, and changing the HTML headers to:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
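Before packaging anything, it helps to confirm that each tidied file actually parses as XML, since one unclosed tag is enough to trip up a reader. Here is a minimal sketch using Python’s standard library. The function name check_xhtml and the sample markup are my own inventions for illustration, the sample assumes the usual XHTML 1.1 system identifier in the DOCTYPE, and this only checks well-formedness, not validity against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_xhtml(text):
    """Return None if text is well-formed XML with an XHTML <html> root,
    otherwise return a short error message."""
    try:
        # Parse bytes so the encoding named in the XML declaration is honored.
        root = ET.fromstring(text.encode("utf-8"))
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    if root.tag != f"{{{XHTML_NS}}}html":
        return f"unexpected root element: {root.tag}"
    return None


# Hypothetical sample page using the headers shown above.
GOOD = """<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Test</title></head>
<body><p>Line one<br/>line two</p></body>
</html>"""

# The same page with an HTML-style unclosed <br>, a typical leftover.
BAD = GOOD.replace("<br/>", "<br>")

good_result = check_xhtml(GOOD)
bad_result = check_xhtml(BAD)
```

Running this over a folder of .xhtml files before zipping should catch the “closing some tags properly” class of problems early, though a real validator will still find things this does not.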
The tricky part was packaging the files up. The instructions indicate that the special mimetype file should not be compressed, and must be the “first added to the zip”. I had no luck getting this to work on a Mac, and even on a PC using WinZip, with everything as stated, I could not get the file to validate using the ThreePress validator; it kept saying the first file in the zip was not 8 characters long (meaning it was not finding “mimetype” first).
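For what it’s worth, the packaging rule can also be scripted instead of fought with in a zip utility. Below is a minimal sketch in Python, not something I used at the time. It assumes a folder laid out like an unzipped ePub; the names build_epub and first_entry, the folder layout, and the placeholder file contents are all hypothetical. It writes the mimetype entry first and stored (uncompressed), then deflates everything else.

```python
import tempfile
import zipfile
from pathlib import Path

MIMETYPE = "application/epub+zip"


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with an uncompressed mimetype entry first."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # A bare ZipInfo adds no extra fields; ZIP_STORED means no compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            rel = path.relative_to(source).as_posix()
            # Skip folders, the mimetype file itself, and hidden files
            # such as .DS_Store.
            if not path.is_file() or rel == "mimetype" or path.name.startswith("."):
                continue
            zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)


def first_entry(epub_path):
    """Return (name, compression method) of the first entry in the archive."""
    with zipfile.ZipFile(epub_path) as zf:
        info = zf.infolist()[0]
        return info.filename, info.compress_type


# Demo on a throwaway folder with placeholder content (not a valid book).
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book"
    (book / "META-INF").mkdir(parents=True)
    (book / "OEBPS").mkdir()
    (book / "mimetype").write_text(MIMETYPE)
    (book / "META-INF" / "container.xml").write_text("<container/>")
    (book / "OEBPS" / "chapter1.xhtml").write_text("<html/>")
    out = Path(tmp) / "test.epub"
    build_epub(book, out)
    demo_first = first_entry(out)
    with zipfile.ZipFile(out) as zf:
        demo_names = zf.namelist()
    # Raw start of the file: 30-byte zip header, then the entry name.
    demo_header = out.read_bytes()[:58]
```

The point of writing mimetype separately is that the 8-character name and the media type then sit at the very start of the file in plain bytes, which appears to be what the validator was complaining about not finding.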
Crap. I was in a corner.
I looked at other apps. Calibre is very handy for converting between eBook formats, and allows some modification of the various settings (setting a cover image, editing the metadata), but what I really sought was something that was more of a full-fledged ePub editor.
And then I found maybe not the Holy Grail, but for me, what turned out to be pretty Grail-ish: eCub by Julian Smart. It is cross-platform and free!
eCub is a cross-platform tool for creating EPUB and MobiPocket books. EPUB is becoming a popular e-book standard and is open and free for all to implement. EPUB files can be read by MobiPocket, Adobe Digital Editions, FBReader, Stanza, the Sony Reader, and many other readers and applications. MobiPocket books can be read on desktop platforms, mobile platforms and the Amazon Kindle e-book reader.
eCub offers a convenient way to import text and XHTML files and create all the necessary components of an EPUB file. It makes it easy to view and edit files, and check the generated EPUB, using external tools. It can also generate audio files from your book content using eSpeak and other text-to-speech software.
A wizard allows you to create a new project in seconds, with options for generating a table of contents, a cover page, and a title page. You can create a simple cover design image using templates and a simple design tool. Then you can compile, check and try out the EPUB at the click of a button.
With eCub, I simply made a new project, and was able to import my directory of XHTML files. The file tool allows me to change the order, and even to edit them if needed.
There are a number of settings panels that pretty much take care of the grunt work of generating the content.opf and other XML files the ePub needs, plus it adds the metadata.
It has templates you can use to generate a cover, but I just went with the simple option of using the same cover image from our publications.
I went through a few rounds of edits (mostly tweaking the XHTML for some formatting errors) with the ThreePress ePub validator to get all but one error cleared there (it seems to be saying the format for my publication date is invalid, but I cannot see any issue there).
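To give a sense of the grunt work being taken care of, here is a rough Python sketch of what a bare-bones content.opf looks like when generated by hand. This is not what eCub does internally (I have no idea how it builds the file). The function make_opf, the identifier, and the chapter file names are made up for illustration, and the manifest assumes a toc.ncx file that you would still have to write separately.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_opf(title, identifier, chapters, language="en"):
    """Return a minimal OPF 2.0 package document as a string.

    chapters is a list of XHTML file names in reading order.
    """
    items = ['    <item id="ncx" href="toc.ncx" '
             'media-type="application/x-dtbncx+xml"/>']
    refs = []
    for n, href in enumerate(chapters, start=1):
        # The manifest lists every file; the spine sets the reading order.
        items.append(f'    <item id="ch{n}" href={quoteattr(href)} '
                     'media-type="application/xhtml+xml"/>')
        refs.append(f'    <itemref idref="ch{n}"/>')
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        f'<package xmlns="{OPF_NS}" version="2.0" unique-identifier="BookId">',
        f'  <metadata xmlns:dc="{DC_NS}">',
        f'    <dc:title>{escape(title)}</dc:title>',
        f'    <dc:language>{escape(language)}</dc:language>',
        f'    <dc:identifier id="BookId">{escape(identifier)}</dc:identifier>',
        '  </metadata>',
        '  <manifest>', *items, '  </manifest>',
        '  <spine toc="ncx">', *refs, '  </spine>',
        '</package>',
    ]
    return "\n".join(lines)


# Hypothetical values; the placeholder UUID is not a real book identifier.
demo_opf = make_opf(
    "Reports & Notes",
    "urn:uuid:00000000-0000-0000-0000-000000000000",
    ["intro.xhtml", "chapter1.xhtml"],
)

# Parse it back to confirm it is well-formed and the order survived.
ns = {"opf": OPF_NS, "dc": DC_NS}
root = ET.fromstring(demo_opf.encode("utf-8"))
demo_title = root.find("opf:metadata/dc:title", ns).text
demo_hrefs = [i.get("href") for i in root.findall("opf:manifest/opf:item", ns)][1:]
demo_spine = [r.get("idref") for r in root.findall("opf:spine/opf:itemref", ns)]
```

Even this stripped-down version shows why a tool with settings panels is welcome: every file has to be listed once in the manifest and again in the spine, and the IDs have to match.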
So here is 2010-Horizon-Report.epub (236k), a test version of an ePub equivalent of the 2010 NMC Horizon Report (web version). This is a draft version, yadda yadda, small type legal mumbo jumbo, batteries not included, your mileage will vary…
But the test was in seeing how it worked. The desktop version of Stanza was sad, as it seemed to ignore all formatting, and produced a river of text.
I have to say the iBooks app on the iPad looks and acts the best so far. I like how my own publication sits on the shelf.
The downside is, of course, the jump rope of having to get stuff there via iTunes sync. Also, mysteriously enough, with ePub files sent by mail or even when accessed in DropBox, iBooks is never offered as a helper app for opening ePub files.
But in iBooks, this version works in a lovely manner: it displays using the simple styles I made for headers, the hyperlinks to internal and external URLs all work, and the table of contents and other bits work great.
Stanza too displayed the content reasonably; it ignored my own style sheet, but rendered all of the lists, bold, and italics as it should.
However, the hyperlinks in the generated Table of Contents, as well as internal ones, went nowhere. In some searching (I cannot locate the exact location now, but thought it was here)… I found out that to get Stanza hyperlinks to work, you actually have to press and hold the link for at least 2 seconds. I think that is the “fix is in progress” statement here.
So I have an ePub proof done, and tested it on a small scale. I don’t have access to other readers, and assume I might have to run it through Calibre to generate versions that will work with other eReaders (which has me scratching my head over the concept of “standard”). With eCub and my content already in formatted HTML, I should be able to convert a number of our other documents more easily.
In the end, creating an ePub is far from easy. I would think there is a ton of room for someone to create a better kind of application to generate ePub files: not just the quick-and-slap conversions, but a full-fledged editor.
And most ideally, I am hopeful the just-off-the-code-press Anthologize will be a viable option, which might be the best since our content already exists in WordPress. It does require a little bit of server-side tweaking (re-compiling PHP to include the ZIP extension).
And that’s all I know for now!