5 Jan 2010

NCX navigation in epub books

We saw in an earlier post how the <spine> element in the package document is used to provide a linear reading order for the content documents of an epub publication. An ebook reader could use the spine data to retrieve and chain together the book's content documents and present them to you, starting at the first screen of the first document.

But suppose you're looking at Victor Hugo's Les Misérables and you want to find the passage where Jean Valjean is carrying the wounded Marius through the sewers of Paris. How do you find it? You don't want to scroll forwards through the book from the beginning (my Penguin Classics version has 1200 pages and I'd be looking for page 1083). What you need is a Table of Contents that guides you to Part Five, Book III, Chapter III. The epub specifications make this possible through the use of an XML document that follows yet another open standard.

The DAISY Consortium and NCX
The Digital Accessible Information System consortium is an organisation aiming "to lead the worldwide transition from analog to Digital Talking Books." As such, they have been heavily involved in developing and maintaining information standards. The DAISY/NISO standard entitled DAISY Specifications for the Digital Talking Book is the basis of the approach taken by the International Digital Publishing Forum (IDPF) to providing a Table of Contents in an epub publication.

The standard has an acronym. It's called NCX. The Open Packaging Format specification offers two meanings for the acronym:
  • Navigation Center eXtended
  • Navigation Control for XML applications
Take your pick; it doesn't really matter. The important thing is that the standard is followed. To give you an immediate idea of how NCX can be used to present a Table of Contents, take a look at Figure 1.

Figure 1. NCX for The Curious Case of Benjamin Button

The screenshot in Figure 1 shows the NCX document shipped with the epubBooks version of The Curious Case of Benjamin Button. In this view, I've collapsed the <head>, <docTitle>, and <docAuthor> elements to concentrate on the <navMap> information.
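
For reference, the collapsed elements sit around the <navMap> roughly as in the sketch below. It follows the general NCX 2005-1 layout rather than reproducing the epubBooks file: the dtb:uid value is a placeholder, and the other <meta> values are my assumptions about what a simple, unpaginated book would typically carry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
   "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
   <head>
      <!-- Placeholder identifier; it should match the identifier in the package document -->
      <meta name="dtb:uid" content="urn:uuid:00000000-0000-0000-0000-000000000000"/>
      <!-- Assumed values: one level of navPoints, no page markers -->
      <meta name="dtb:depth" content="1"/>
      <meta name="dtb:totalPageCount" content="0"/>
      <meta name="dtb:maxPageNumber" content="0"/>
   </head>
   <docTitle>
      <text>The Curious Case of Benjamin Button</text>
   </docTitle>
   <docAuthor>
      <text>F. Scott Fitzgerald</text>
   </docAuthor>
   <navMap>
      <!-- navPoint elements go here -->
   </navMap>
</ncx>
```

The comment inside <navMap> stands in for the navPoints; a real NCX needs at least one there.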

A simple NCX example
A book can be said to have a hierarchical structure: it may have Parts made up of Chapters, which in turn may be divided into Sections and Sub-sections. This hierarchy is expressed in the <navMap> section of the NCX document. Let's start with a simple example that has no hierarchy to speak of; take a look at Figure 2, below.
<navMap>
   <navPoint id="navpoint-1" playOrder="1">
      <navLabel>
         <text>Title Page</text>
      </navLabel>
      <content src="title.xml"/>
   </navPoint>
   <navPoint id="navpoint-2" playOrder="2">
      <navLabel>
         <text>epubBooks Information</text>
      </navLabel>
      <content src="epubbooksinfo.xml"/>
   </navPoint>
   <navPoint id="navpoint-3" playOrder="3">
      <navLabel>
         <text>1</text>
      </navLabel>
      <content src="chapter-001.xml"/>
   </navPoint>
   <!-- ...remaining navPoints omitted... -->
</navMap>
Figure 2. A simple navMap

The <navMap> element contains one or more <navPoint> elements. Each <navPoint> identifies a significant subdivision of the book to which the reader may navigate directly, for instance the start of a given chapter. The <navPoint> element has the attributes 'id' and 'playOrder'. 'id' is a unique identifier, and 'playOrder' is a number, starting from 1, that gives the navPoint's position in the reading order of the publication.

A <navPoint> element contains a <navLabel> element and a <content> element. The <navLabel> has a <text> element holding the text that will be displayed in the Table of Contents. The <content> element has a 'src' attribute that tells the reading software which content document to display.
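
The 'src' value is a URI, so it can also carry a fragment identifier that points at a particular place inside a content document rather than at its start. Here is a minimal sketch; the label and the "section-2" anchor are made up for illustration and assume that chapter-001.xml contains an element with that id.

```xml
<navPoint id="navpoint-4" playOrder="4">
   <navLabel>
      <text>A section inside chapter 1</text>
   </navLabel>
   <!-- Hypothetical anchor: assumes chapter-001.xml has an element with id="section-2" -->
   <content src="chapter-001.xml#section-2"/>
</navPoint>
```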

Figure 3 shows The Curious Case of Benjamin Button opened in Adobe Digital Editions®. You can see how the <text> values from each <navPoint> are displayed in the Table of Contents.

Figure 3. NCX Table of Contents in Adobe Digital Editions®

When the reader clicks on chapter 1, represented by the imaginatively selected "1" in the Table of Contents, the reading software looks up the <content> element of that <navPoint> and fetches 'chapter-001.xml' for display.
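
One thing this leaves implicit is how the reading software finds the NCX document in the first place. In an EPUB 2 package document this is done through the manifest and the 'toc' attribute of the <spine> element, roughly as sketched below. The file name toc.ncx and the id values are assumptions of mine for illustration; the media type is the one the Open Packaging Format specification uses for NCX files.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
   <!-- <metadata> omitted for brevity; a real package document requires it -->
   <manifest>
      <!-- Hypothetical id and file name for the NCX document -->
      <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
      <item id="chapter-001" href="chapter-001.xml" media-type="application/xhtml+xml"/>
   </manifest>
   <!-- toc="ncx" refers to the manifest item whose id is "ncx" -->
   <spine toc="ncx">
      <itemref idref="chapter-001"/>
   </spine>
</package>
```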

An NCX with two levels
A <navPoint> may itself contain further <navPoint> elements. That is how the hierarchical structure of the book is expressed. In the simple example above the nesting is only one level deep, but take a look at Figure 4.
<navPoint id="navpoint-3" playOrder="3">
   <navLabel>
      <text>PART ONE</text>
   </navLabel>
   <content src="part-01.xml"/>
   <navPoint id="navpoint-4" playOrder="4">
      <navLabel>
         <text>CHAPTER I</text>
      </navLabel>
      <content src="chapter-001.xml"/>
   </navPoint>
   <navPoint id="navpoint-5" playOrder="5">
      <navLabel>
         <text>CHAPTER II</text>
      </navLabel>
      <content src="chapter-002.xml"/>
   </navPoint>

   <!-- ...intervening navPoints in Part One... -->
</navPoint> <!-- end of navpoint-3 -->
Figure 4. navPoints nested to two levels

The NCX extract in Figure 4 is taken from Jules Verne's 20,000 Leagues Under the Sea, also downloaded from epubBooks. I've changed the colour coding of the elements to show the hierarchical relationship between the navPoints.

navpoint-3 introduces Part One of the book. As before, it has a navLabel (PART ONE), and its content document is called 'part-01.xml'. Next comes another navPoint inside the Part One navPoint, i.e. before its closing </navPoint> tag. This indicates that the new navPoint, and every other navPoint up to the </navPoint> tag that closes navpoint-3, belongs to Part One of the book. Figure 5 shows how this looks in Adobe Digital Editions®.

Figure 5. Two-level NCX

Notice how the chapter titles are indented within PART ONE. The standard does not impose any limit on the depth of nesting, so if you're a corporate lawyer drawing up an eContract you can create sub-sections to your heart's content. Also, NCX doesn't impose names like part, chapter, and section on the structure of a book. It's left entirely to the publisher to name these subdivisions; the NCX document holds whatever you choose in the navLabel/text entries.

Ebooks are nothing without the writing
To finish off this topic, I went and found my all-time favourite book, Les Misérables, and opened it in Adobe Digital Editions® to locate the passage I mentioned earlier. Figure 6 shows how epubBooks built the NCX for this book. They use three levels of nesting, dividing the work into Volumes, Books, and Chapters.

Figure 6. Jean Valjean carrying Marius
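
In NCX terms, three levels simply means navPoints nested inside navPoints nested inside navPoints. The fragment below is my own sketch of that shape, not a copy of the epubBooks file: the ids, playOrder values, labels and file names are all invented for illustration.

```xml
<!-- Hypothetical sketch: ids, playOrder values, labels and file names are invented -->
<navPoint id="volume-5" playOrder="1">
   <navLabel>
      <text>VOLUME V</text>
   </navLabel>
   <content src="volume-05.xml"/>
   <!-- Second level: a Book inside the Volume -->
   <navPoint id="volume-5-book-3" playOrder="2">
      <navLabel>
         <text>BOOK III</text>
      </navLabel>
      <content src="volume-05-book-03.xml"/>
      <!-- Third level: a Chapter inside the Book -->
      <navPoint id="volume-5-book-3-chapter-3" playOrder="3">
         <navLabel>
            <text>CHAPTER III</text>
         </navLabel>
         <content src="volume-05-book-03-chapter-03.xml"/>
      </navPoint>
   </navPoint>
</navPoint>
```

In a real file the playOrder values would carry on from the navPoints that come earlier in the book rather than starting again at 1.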