Why are PDF to EPUB conversions so bad?


PDF to EPUB conversion is a popular search term. Many of you have probably searched for it and found various free and paid converters. If your experience is like most people’s, you were probably terribly disappointed by the results.

Some common issues:

  • Unexpected or missing line-breaks and paragraphs.
  • Part or all of the text getting garbled.
  • Headers and footers becoming part of the text.
  • Columns getting misinterpreted.
  • Images and highlight boxes ending up in the wrong place and getting mixed up with the main text.

Why does it happen?

One basic difference is the format itself. PDF is a fixed layout format, while EPUB is reflowable. What this means is that whatever content you see on a PDF page is hardcoded to appear exactly there.

Sometimes this means that a PDF does not store enough semantic information. All the PDF needs to record is how each point on the page is expected to look. When a sentence wraps onto a new line, nothing in the PDF necessarily says that the two lines belong to the same sentence. The same can be said about paragraphs. A PDF also doesn’t distinguish headers and footers from the rest of the text. It only knows that the piece of text we recognize as a header comes before the piece of text we recognize as the continuation of the chapter. In a two-column layout, two unrelated pieces of text that sit on the same line (in two different columns) may be misread by a converter as a single sentence, which would be rather meaningless and baffling for a reader!
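
To make the two-column problem concrete, here is a minimal sketch in Python. It assumes a converter has already extracted text fragments along with their page coordinates. The `Fragment` type, the coordinates, and the `column_split_x` parameter are all made up for illustration and do not come from any particular PDF library. A real converter would also have to work out where the columns are, which is the hard part.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A piece of text plus where it sits on the page (hypothetical model)."""
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page
    text: str


def naive_order(fragments):
    """Read strictly top to bottom, then left to right, ignoring columns."""
    ordered = sorted(fragments, key=lambda f: (f.y, f.x))
    return " ".join(f.text for f in ordered)


def column_aware_order(fragments, column_split_x):
    """Read the whole left column first, then the whole right column.

    column_split_x is an assumed, known boundary between the two columns.
    """
    left = sorted((f for f in fragments if f.x < column_split_x), key=lambda f: f.y)
    right = sorted((f for f in fragments if f.x >= column_split_x), key=lambda f: f.y)
    return " ".join(f.text for f in left + right)


# A tiny two-column page: two lines in each column.
page = [
    Fragment(50, 100, "The cat sat"),
    Fragment(320, 100, "Meanwhile, the dog"),
    Fragment(50, 115, "on the mat."),
    Fragment(320, 115, "slept outside."),
]

print(naive_order(page))
print(column_aware_order(page, column_split_x=300))
```

The first call stitches text from both columns into one line of nonsense, while the second recovers the intended sentences, but only because the column boundary was handed to it.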


At other times, a PDF might store information that cannot be translated to the EPUB format, e.g. the fixed location of images. Since content is reflowable in EPUB, if your PDF book relies on an image appearing right next to a particular paragraph, that won’t translate well to EPUB. As a relatively new standard, EPUB also has limitations in the kinds of formatting styles it can support. The presence of multiple e-readers, which may have different levels of support for what the standard prescribes, complicates matters further.

PDFs also come in different varieties. Some of them store more information relevant to EPUB and are simpler in layout. They auto-convert better than ones with complex layouts and less semantic information.

What to do?

None of this means that technology cannot do a better job of conversion than it currently does. But given the myriad varieties of PDF, created by different software at different points in the evolution of the format, it is unlikely that a click-and-go solution will emerge any time soon. If a PDF is all you have left of your book, be prepared to spend money or effort on a round of proofreading, and possibly on rethinking layout and content (“Refer to picture on the next page” doesn’t work in EPUB, which doesn’t have anything like a fixed page).
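
Part of that proofreading pass can be assisted by a script. As a rough sketch, the Python snippet below flags phrases that depend on a fixed page, assuming the book text is already available as a list of paragraph strings. The pattern list and the `find_page_references` helper are illustrative assumptions, not an exhaustive check, and every hit still needs a human to decide how to reword it.

```python
import re

# Illustrative patterns only: relative page references ("next page")
# and explicit page numbers ("page 12").
PAGE_REFERENCE = re.compile(
    r"\b(?:(?:next|previous|facing|opposite|following)\s+page|page\s+\d+)\b",
    re.IGNORECASE,
)


def find_page_references(paragraphs):
    """Return (paragraph_index, matched_phrase) for each page-dependent phrase."""
    hits = []
    for index, paragraph in enumerate(paragraphs):
        for match in PAGE_REFERENCE.finditer(paragraph):
            hits.append((index, match.group(0)))
    return hits


sample = [
    "Refer to picture on the next page.",
    "Nothing positional here.",
    "See the table on page 12.",
]

for index, phrase in find_page_references(sample):
    print(f"paragraph {index}: {phrase!r}")
```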

For some books, one might argue that an imperfect EPUB auto-converted from PDF is better than having none at all. For the most part, though, if you plan to make an EPUB commercially available in the general market, one converted from PDF without further work is unlikely to give readers a satisfying experience.


