An earlier post - How the standards work together - showed that the Open Publication Structure (OPS) specifies the XHTML tags, grouped into modules, that should be used to create an epub publication. These modules constitute what is called a Preferred Vocabulary. In other words, the OPS specifies the tags that should be used to define the structure of a work - a <div> here, a <table> there, and so on.
All conforming reading systems must recognise and be able to render documents written using the preferred vocabulary. The term 'baseline' reading system indicates this minimal ability.
Beyond the Preferred Vocabulary
You can achieve a great deal using only the preferred vocabulary. A very high proportion of existing printed matter, including its illustrations, could be transferred to epub format using only the standard OPS modules - with layout and formatting support from CSS. However, that adjective 'preferred' hints that epub productions do not have to be written exclusively in that vocabulary. OPS recognises that there are other tags in XHTML 1.1 that an author might want to use and that there are other XML vocabularies that a publisher might need to include.
XML is a widely used technology. If you enter into Google the search term: 'XML vocabulary for', and follow this with any topic in which you have an interest, there's a good chance someone has designed, or is actively designing, an XML vocabulary for that topic. I found the following; you will find many more:
- ceXML - civil engineering XML.
- genXML - for the exchange of genealogical data.
- MathML - XML for sharing mathematical expressions.
- MusicXML - XML for capturing musical notation.
The Open Publication Structure offers an approach that allows non-preferred content to be included in a publication while ensuring that the content is available, in some form, to all consumers. The trick is to allow reading systems that have been designed to handle the non-standard content to exploit it to the full while insisting that the publisher provide the information in a form that is accessible to baseline readers.
XML Islands
If I were to embed snippets of foreign languages into this post by suggesting per se that it is de rigueur to introduce de novo some concept of the deus ex machina, you would rightly accuse me of poor writing style. However, those French and Latin phrases are examples of content taken from another language, or non-preferred vocabulary, embedded in a stream of preferred vocabulary; in this case it's embedded in English but it could be Greek embedded in Spanish or Mandarin embedded in Tagalog.
It is poor writing style to sprinkle foreign phrases about like this because the reader whose preferred vocabulary is English will not necessarily understand Latin. Likewise, a chunk of XML written using a non-preferred vocabulary will make no sense to a baseline reading system because it is not required to process such content. I'm not saying it's a bad idea to insert 'foreign' XML into a content document, simply that special handling is required when it is used.
Chunks of XML, written in a foreign language and embedded in a stream of preferred vocabulary, are called Inline XML Islands. Islands in the stream, that is what they are.
It is possible for entire content documents to be written in a non-preferred vocabulary. In this case, and because they are not embedded in a stream of preferred vocabulary, these documents are called Out-Of-Line XML Islands, though they are more like continents than islands - entirely inaccessible and incomprehensible to a baseline reading system.
The Open Publication Structure and Open Packaging Format standards define between them what XML Islands are and then state the requirements to be met by publishers when creating them and the guidelines to be followed by reading systems when encountering them in a publication.
Publisher's Responsibilities: Out-Of-Line XML Islands
If a publisher wants to include Out-Of-Line XML Islands in a work, they must meet the following requirements.
- The XML Island must be a complete XML document that conforms to its own schema (the schema defines the vocabulary).
- The manifest item for an Out-Of-Line XML Island must identify the namespace of the document using the required-namespace attribute.
- For each Out-Of-Line XML Island, the publisher must provide a fallback document which can be processed directly. The manifest must include fallback documents as well as the XML Islands they support.
- The manifest item for the XML Island must include a fallback attribute, and that attribute should give the id of the fallback document.
- If necessary, a fallback item may itself have a specified fallback, creating a fallback chain.
- Fallback chains must not form a loop.
- As an alternative to a fallback item, the publisher may provide a stylesheet which can be used for the presentation of the non-standard content. In this case the fallback-style attribute should be specified and the target stylesheet should be identified.
- An Out-Of-Line XML Island may specify both a fallback item and a fallback-style.
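As a rough sketch, the requirements above might translate into a package fragment like the one below. The ids, file names and the island's media type are hypothetical, chosen only for illustration; the namespace is the one used for MusicXML later in this post. The fragment assumes that fallback and fallback-style each take the id of another manifest item.

```xml
<!-- Hypothetical fragment of an OPF package file; metadata omitted -->
<manifest>
  <!-- The Out-Of-Line XML Island: names its vocabulary and points at its fallbacks -->
  <item id="score"
        href="score.xml"
        media-type="application/xml"
        required-namespace="http://www.recordare.com/"
        fallback="score-fallback"
        fallback-style="score-style"/>

  <!-- Fallback document written in the preferred vocabulary, ending the chain -->
  <item id="score-fallback"
        href="score.xhtml"
        media-type="application/xhtml+xml"/>

  <!-- Optional stylesheet for presenting the island's own markup -->
  <item id="score-style"
        href="score.css"
        media-type="text/css"/>
</manifest>
<spine>
  <!-- The spine refers to the island; a reading system works back from here -->
  <itemref idref="score"/>
</spine>
```

Here the chain is only one step long. A longer chain would simply give score-fallback a fallback attribute of its own, provided the last item is something a baseline reading system can render and no item points back to an earlier one.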
A reading system is some combination of hardware and software. In an open market, reading systems will have a range of abilities, including the ability to handle one or more content types that fall outside the OPS preferred vocabulary.
When a reading system processes an item in the manifest which specifies a fallback item it should follow these guidelines:
- Starting from an initial content document, identified in the spine or NCX, the reading system must follow the fallback chain until it finds a document it knows how to display. At the end of every fallback chain the reading system should find a document that it can render.
- A reading system may display any item that it is capable of processing; it doesn't have to be the first one it finds.
- If an Out-Of-Line XML Island specifies both a fallback item and a fallback stylesheet, a reading system may choose which one to use.
- When a reading system is designed to have special capabilities, it may do more than the minimum with the content of an XML Island.
When a fragment of 'foreign' XML is to be embedded in a stream of content which is written using the preferred vocabulary, the publisher should provide an inline mechanism for handling it.
We saw that fallback documents were used for Out-Of-Line XML Islands. The equivalent inline technique is the switch element, which presents zero or more case elements, each of which wraps XML markup and names its vocabulary in a required-namespace attribute, followed by a default element. The syntax takes the form:
<ops:switch id="switch_id">
  <ops:case required-namespace="namespace">
    ... XML content in the named vocabulary
  </ops:case>
  <ops:default>
    ... fallback OPS-compliant content
  </ops:default>
</ops:switch>
A reading system should examine the required-namespace of each case element and determine whether it can handle that namespace. It should process the first such case that it finds, although it doesn't have to. If the reading system either cannot or chooses not to process any of the cases, it must process the default element. The default must always contain content that would be valid in any OPS content document.
The example below shows how a fragment of MusicXML might be presented in a content document.
<ops:switch id="musicXML_Example">
  <ops:case required-namespace="http://www.recordare.com/">
    <score-partwise version="2.0">
      <part-list>
        <score-part id="P1">
          <part-name>Music</part-name>
        </score-part>
      </part-list>
      <part id="P1">
        <measure number="1">
          <attributes>
            <divisions>1</divisions>
            <key>
              <fifths>0</fifths>
            </key>
            <time>
              <beats>4</beats>
              <beat-type>4</beat-type>
            </time>
            <clef>
              <sign>G</sign>
              <line>2</line>
            </clef>
          </attributes>
          <note>
            <pitch>
              <step>C</step>
              <octave>4</octave>
            </pitch>
            <duration>4</duration>
            <type>whole</type>
          </note>
        </measure>
      </part>
    </score-partwise>
  </ops:case>
  <ops:default>
    <img src="images/Cnatural.png" alt="Whole note C on a treble staff" />
  </ops:default>
</ops:switch>
A reading system that understands MusicXML would probably choose to process the XML contained within the case element. A baseline reader would process the default element and render the fallback image, images/Cnatural.png.
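The same pattern should carry over to the other vocabularies listed earlier. The sketch below uses MathML and also shows one way the ops prefix might be declared, assuming the OPS namespace http://www.idpf.org/2007/ops and the standard MathML namespace. The id, the expression and the fallback markup are invented for illustration.

```xml
<!-- Hypothetical inline island: x squared in MathML, with an XHTML fallback -->
<ops:switch id="mathML_Example"
            xmlns:ops="http://www.idpf.org/2007/ops">
  <ops:case required-namespace="http://www.w3.org/1998/Math/MathML">
    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
      <msup>
        <mi>x</mi>
        <mn>2</mn>
      </msup>
    </math>
  </ops:case>
  <ops:default>
    <!-- Preferred-vocabulary content that any baseline reader can render -->
    <p>x<sup>2</sup></p>
  </ops:default>
</ops:switch>
```

In a real content document the prefix would more likely be declared once on the root element rather than on each switch. A text fallback like this one is sometimes preferable to an image because it reflows and scales with the surrounding text.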