Let’s open up the hood and see what’s inside an ePub.
The first thing to know is that an ePub file is actually a compressed collection of files, just like a .zip file. In fact, if you make a copy of an ePub file and change the copy’s file extension from .epub to .zip, you get the following .zip file, which can be unzipped so that we can view its contents:
We can now unzip the .zip file and view its contents. After unzipping, we see that this .epub file consists of two folders (the OEBPS folder and the META-INF folder) and one file (the mimetype file):
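Renaming and unzipping is not strictly necessary if you only want to look at the list of files. As a minimal sketch, assuming Python's standard zipfile module and two hypothetical helper names (list_epub_entries and top_level_names, neither of which comes from any ePub tool), the archive can be inspected directly:

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the names of all entries stored inside the ePub archive."""
    # An .epub is a zip archive, so zipfile can open it without renaming.
    with zipfile.ZipFile(epub_path) as archive:
        return archive.namelist()


def top_level_names(entries):
    """Collapse entry paths to their first path component (folder or file)."""
    return sorted({name.split("/", 1)[0] for name in entries})
```

For an ePub laid out like the one described above, top_level_names(list_epub_entries("book.epub")) would be expected to report the OEBPS folder, the META-INF folder, and the mimetype file; "book.epub" is a placeholder path.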
If we open the META-INF folder, we can see that it has one file (the container.xml file) as follows:
The container.xml file provides the location of the content.opf file as shown in the following image. The content.opf file contains important information such as the epub’s metadata (author name, published date, etc.), manifest (a list of every item in the epub file), and the spine (the order in which items are viewed as the reader scrolls through the epub). The content.opf file will be discussed shortly.
If encryption or digital rights management has been added to the ePub file, the META-INF folder will contain additional files, such as encryption.xml, alongside container.xml. The container.xml file has been opened up below in the text editor Notepad++, which works well on a PC. If you are on a Mac, you might use a text editor such as TextWrangler.
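To make the role of container.xml concrete, here is a small sketch that reads the location of the content.opf file out of it. It assumes the standard container namespace and the full-path attribute on the rootfile element; the function name find_opf_path is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml):
    """Return the path of the package (.opf) file named in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        return None  # no rootfile entry found
    # full-path is relative to the root of the unzipped ePub.
    return rootfile.get("full-path")
```

The argument is the text of container.xml, for example as read from the unzipped META-INF folder.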
Below is the mimetype file opened in Notepad++. The sole purpose of the mimetype file is to indicate that this is an ePub file. It contains a single line of text: application/epub+zip.
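Because the mimetype file is so simple, checking it is an easy way to confirm that a file really is an ePub. The sketch below assumes the entry is named mimetype and holds the text application/epub+zip; has_epub_mimetype is a hypothetical helper name.

```python
import zipfile

EPUB_MIMETYPE = "application/epub+zip"


def has_epub_mimetype(epub_path):
    """Return True if the archive has a mimetype entry declaring an ePub."""
    with zipfile.ZipFile(epub_path) as archive:
        try:
            content = archive.read("mimetype")
        except KeyError:
            return False  # the archive has no mimetype entry at all
    # Tolerate stray whitespace or a trailing newline when comparing.
    return content.decode("ascii", errors="replace").strip() == EPUB_MIMETYPE
```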
Clicking on the OEBPS (Open eBook Publication Structure) folder reveals the following three folders (the Images folder, the Styles folder, and the Text folder) and two files (the content.opf file and the toc.ncx file):
Opening the content.opf file in Notepad++ reveals the three main parts of this file. The first part of the content.opf file, shown below, contains all of the metadata (author name, publication date, etc.) for the ePub file. The second part of the content.opf file is the manifest for the entire ePub file. Every item in the ePub file is listed in the manifest.
The third part of the content.opf file is the spine. The spine, shown below, provides the order in which the parts of the ePub file will be viewed as the reader scrolls through the ePub eBook.
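The three parts of content.opf can also be pulled apart in code. The following is a rough sketch, assuming the usual OPF and Dublin Core namespaces; parse_opf and the keys of the dictionary it returns are invented for this example and are not part of any ePub library.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_opf(opf_xml):
    """Split content.opf into its metadata, manifest, and spine."""
    root = ET.fromstring(opf_xml)
    # Metadata: only two common fields are read in this sketch.
    metadata = {
        "title": root.findtext("opf:metadata/dc:title", None, NS),
        "creator": root.findtext("opf:metadata/dc:creator", None, NS),
    }
    # Manifest: maps each item's id to its file path (href).
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    # Spine: an ordered list of manifest ids.
    spine = [ref.get("idref") for ref in root.findall("opf:spine/opf:itemref", NS)]
    # Resolve the spine ids to file paths to get the reading order.
    reading_order = [manifest[i] for i in spine if i in manifest]
    return {
        "metadata": metadata,
        "manifest": manifest,
        "spine": spine,
        "reading_order": reading_order,
    }
```

Note that the spine refers to items by their manifest id rather than by file name, which is why the sketch looks each id up in the manifest to produce the reading order.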
If we open up the toc.ncx file in Notepad++, we can view the contents of the ePub’s built-in navigational table of contents as follows:
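The same approach works for the table of contents. This sketch assumes the standard NCX namespace and the usual navPoint, navLabel, and content elements; read_toc is a hypothetical name, and nested entries are flattened into a single list for simplicity.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def read_toc(ncx_xml):
    """Return (label, target file) pairs for every entry in toc.ncx."""
    root = ET.fromstring(ncx_xml)
    entries = []
    # ".//" visits nested navPoint elements too, in document order.
    for point in root.iterfind(".//ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", "", NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src") if content is not None else None
        entries.append((label.strip(), src))
    return entries
```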
Clicking on the Text folder reveals the collection of XHTML files that are the contents of the ePub eBook. Each XHTML file is a single section of the eBook.
Opening up one of these XHTML files (New_Manuals.xhtml) shows the XHTML code. This is the same kind of markup that appears on web pages; an ePub file is just like a mini web site. One line of code contains a hyperlink and the last line links to an image, just like the HTML on a web page.
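Since the sections are ordinary XHTML, the hyperlinks and image references can be collected with a general-purpose XML parser. This is only a sketch: it assumes the file is well-formed XHTML in the standard XHTML namespace and uses no named entities (such as &nbsp;) that a plain XML parser would reject. The function name find_links_and_images is invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def find_links_and_images(xhtml):
    """Return (hyperlink targets, image sources) found in one XHTML section."""
    root = ET.fromstring(xhtml)
    # <a href="..."> elements are hyperlinks, exactly as on a web page.
    links = [a.get("href") for a in root.iterfind(".//x:a[@href]", XHTML_NS)]
    # <img src="..."> elements point at image files elsewhere in the ePub.
    images = [img.get("src") for img in root.iterfind(".//x:img[@src]", XHTML_NS)]
    return links, images
```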
If we open up any of the XHTML files in a web browser, it will open up just like a web page. We will open the above file (New_Manuals.xhtml) in the web browser Firefox and we’ll see that it renders just like a web page, as shown below. This demonstrates how similar an ePub file is to a web site. In fact, in my experience the best tool for creating an ePub is an HTML editor of the kind used to build web sites, such as Dreamweaver or Microsoft Expression Web (my favorite).
Clicking on the Styles folder shows a CSS style sheet (stylesheet.css). The Styles folder will typically contain at least one CSS style sheet, and there can be more than one. Opening stylesheet.css in Notepad++ shows the CSS styles in this style sheet, which control the formatting and styling of the XHTML pages.
The Images folder contains all of the images (JPEGs, GIFs, or PNGs) in the ePub document, as shown below:
Now you see how it all fits together and how an ePub document is very similar to a mini web site.