Saturday, August 13, 2011

The Kindle Needs a Flat NCX File


I think that I am close to wrapping up epub to mobi conversion in git-scribe. Last night, I was able to add a table of contents to the mobi that the Kindle could read. But the little markers in the progress bar are not showing up.

The Kindle pulls the markers from an NCX (Navigation Control file for XML) file. There is already an NCX file in the epub source, but the Kindle cannot read it. The Kindle has the sad limitation of not handling NCX files in which <navPoint> elements are nested inside each other. Unfortunately, nesting is the natural way to describe a book (chapters go inside the book, sections go inside chapters):
...
<ncx:navMap>
  <ncx:navPoint id="id427044" playOrder="1">
    <ncx:navLabel>
      <ncx:text>The SPDY Book</ncx:text>
    </ncx:navLabel>
    <ncx:content src="index.html"/>
    <ncx:navPoint id="id426427" playOrder="2">
      <ncx:navLabel>
        <ncx:text>Copyright</ncx:text>
      </ncx:navLabel>
      <ncx:content src="pr01.html"/>
    </ncx:navPoint>
    <ncx:navPoint id="id457046" playOrder="3">
...
But no matter. I can unravel the nesting without any ill effect on the Kindle presentation because the NCX is only used for markers on the Kindle's progress meter. So first up, I add a call to a new flatten_ncx method in the decorate_epub_for_mobi method:
def decorate_epub_for_mobi
  add_epub_etype
  add_epub_toc
  flatten_ncx
  zip_epub_for_mobi
end
In the flatten_ncx method, I first need to slurp in the existing navigation points:
def flatten_ncx
  nav_points = ncx_nav_points

  # ...
end
The ncx_nav_points method, in turn, scans the NCX file for <ncx:navPoint> entries:
def ncx_nav_points
  nav_points = []

  Dir.chdir('book.epub.d/OEBPS') do
    nav_points = File.read('toc.ncx').
      scan(%r{<ncx:navPoint.+?<ncx:content src=.+?/>}m)
  end

  nav_points.
    flatten.
    map { |x| x + "\n</ncx:navPoint>" }
end
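The scan regular expression exploits the fact that every entry starts with an opening <ncx:navPoint> tag and lists its own label and <ncx:content src="..."/> tag before any children. A non-greedy match from the opening tag through the first <ncx:content/> tag therefore captures exactly one entry's own data. Since each match stops short of the closing tag, the map at the end of the method appends the missing </ncx:navPoint>. The pattern holds for elements that include children: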
<ncx:navPoint id="id427044" playOrder="1">
  <ncx:navLabel>
    <ncx:text>The SPDY Book</ncx:text>
  </ncx:navLabel>
  <ncx:content src="index.html"/>
  <ncx:navPoint id="id426427" playOrder="2">
    <ncx:navLabel>
      <ncx:text>Copyright</ncx:text>
    </ncx:navLabel>
...
This is also true for elements with no children:
<ncx:navPoint id="id426427" playOrder="2">
  <ncx:navLabel>
    <ncx:text>Copyright</ncx:text>
  </ncx:navLabel>
  <ncx:content src="pr01.html"/>
</ncx:navPoint>
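As a quick sanity check of that pattern, here is a minimal sketch that runs the same regular expression against an in-memory string rather than toc.ncx. The sample_ncx and extract_nav_points names are made up for this illustration and are not part of git-scribe; the sample assumes the same shape as the NCX shown above.

```ruby
# Hypothetical sample: a small nested <ncx:navMap>, shaped like the epub's NCX.
def sample_ncx
  <<~XML
    <ncx:navMap>
      <ncx:navPoint id="id427044" playOrder="1">
        <ncx:navLabel>
          <ncx:text>The SPDY Book</ncx:text>
        </ncx:navLabel>
        <ncx:content src="index.html"/>
        <ncx:navPoint id="id426427" playOrder="2">
          <ncx:navLabel>
            <ncx:text>Copyright</ncx:text>
          </ncx:navLabel>
          <ncx:content src="pr01.html"/>
        </ncx:navPoint>
      </ncx:navPoint>
    </ncx:navMap>
  XML
end

# Hypothetical helper: same scan + map as ncx_nav_points, minus the file I/O.
def extract_nav_points(ncx)
  # Non-greedy: each match runs from an opening navPoint tag to the
  # first <ncx:content/> tag that follows it, so children are left out.
  ncx.
    scan(%r{<ncx:navPoint.+?<ncx:content src=.+?/>}m).
    map { |x| x + "\n</ncx:navPoint>" }
end

points = extract_nav_points(sample_ncx)
puts "Found #{points.length} nav points"
puts points.first
```

Each extracted string should contain exactly one opening tag and one closing tag, which is what makes it safe to join them back together later.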
With the nav points extracted from the existing NCX file, I can then rebuild the <navMap> with a flat XML structure, stripping the leading whitespace from each line along the way:
def flatten_ncx
  nav_points = ncx_nav_points.map { |x| x.gsub(/^\s+/, '') }

  Dir.chdir('book.epub.d/OEBPS') do
    ncx = File.read('toc.ncx')

    File.open("toc.ncx", 'w') do |f|
      f.write ncx.sub(
        /<ncx:navMap>.+<\/ncx:navMap>/m,
        "<ncx:navMap>\n#{nav_points.join("\n")}\n</ncx:navMap>"
      )
    end
  end
end
With that, I get my nice, flat <navMap>:
<ncx:navMap>
<ncx:navPoint id="id427044" playOrder="1">
<ncx:navLabel>
<ncx:text>The SPDY Book</ncx:text>
</ncx:navLabel>
<ncx:content src="index.html"/>
</ncx:navPoint>
<ncx:navPoint id="id426427" playOrder="2">
<ncx:navLabel>
<ncx:text>Copyright</ncx:text>
</ncx:navLabel>
<ncx:content src="pr01.html"/>
...
More importantly, I have my navigation markers back in place on the Kindle.
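Since the Kindle's limitation is about nesting, one cheap check for the result is to measure how deeply the <ncx:navPoint> elements nest. The following is a rough sketch rather than a real XML parse; max_nav_point_depth is a hypothetical helper, and it assumes nav points are never written as self-closing tags.

```ruby
# Hypothetical helper: deepest nesting level of <ncx:navPoint> elements.
# Counts opening and closing tags in document order; no XML parsing.
def max_nav_point_depth(ncx)
  depth = 0
  max = 0
  ncx.scan(%r{<(/?)ncx:navPoint\b}) do |(slash)|
    if slash.empty?
      # Opening tag: one level deeper
      depth += 1
      max = depth if depth > max
    else
      # Closing tag: back up one level
      depth -= 1
    end
  end
  max
end

nested = '<ncx:navPoint id="a"><ncx:navPoint id="b"></ncx:navPoint></ncx:navPoint>'
flat   = '<ncx:navPoint id="a"></ncx:navPoint><ncx:navPoint id="b"></ncx:navPoint>'

puts max_nav_point_depth(nested) # prints 2
puts max_nav_point_depth(flat)   # prints 1
```

A flattened toc.ncx should report a depth of 1, so something like this could serve as a starting point for an automated test.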

That is a good stopping point for tonight. Tomorrow, I hope to try a little end-to-end testing and maybe add some automated tests for all of this.


Day #111
