[Monday Dev Heaven] How to add an HTML preview for iWork Pages files

Mon 09 April 2012 By Laurent Doguin

Last week I got slightly frustrated when I received a .pages file that I could not open with any software on my Ubuntu system. Nor could I preview it in Nuxeo, because this format is not currently supported. For those who have never heard of ".pages" files, they are documents created with Apple's iWork Pages application, which uses its own format. With the growing number of Mac users, this will surely happen again. But fear not: today I will show you how to create your own file converters in a Nuxeo application so that you can, for instance, add an HTML preview for iWork files without having to install anything on your laptop.

What's your file's mime-type?

A first and mandatory step when dealing with files in Nuxeo is to make sure you get the right mime-type. I first tried to upload a Pages file to our intranet, but it was detected as a zip file. A quick look using the mimetype command confirmed I was dealing with a zip file:

[~]$ mimetype hello.pages
hello.pages: application/zip
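If you're curious why content sniffing says zip, here is a small plain-Java sketch (not Nuxeo code; ZipSignatureCheck is a made-up name) that checks for the ZIP local file header signature, the four bytes "PK\003\004" a zip archive normally starts with. A Pages file is a zip archive, so it starts with the same bytes, and a detector that only looks at content cannot tell the two apart. It assumes Java 11 or later for InputStream.readNBytes(int).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, for illustration only: detects the ZIP signature.
final class ZipSignatureCheck {

    // Local file header signature 0x04034b50, stored little-endian: "PK\003\004".
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    private ZipSignatureCheck() {
    }

    // Reads (at most) the first four bytes and compares them to the signature.
    // A shorter stream yields a shorter array, so Arrays.equals returns false.
    static boolean looksLikeZip(InputStream in) throws IOException {
        byte[] header = in.readNBytes(ZIP_MAGIC.length);
        return Arrays.equals(header, ZIP_MAGIC);
    }
}
```

Feeding it the bytes of hello.pages should return true, exactly as it would for any ordinary .zip file.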

So I had to add a small contribution to the MimetypeRegistryService. As usual, details are on explorer.nuxeo.org.

<fileExtension name="pages" mimetype="application/vnd.apple.pages" ambiguous="false" />

This code will make sure that each time a Pages file is uploaded in Nuxeo, its mime-type will be set to 'application/vnd.apple.pages'. Now that I know what kind of file I'm dealing with, I can start using converters.
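To make the effect of ambiguous="false" concrete, here is a toy resolver in which an explicitly mapped extension takes precedence over whatever content sniffing returned. It only illustrates the idea as I understand it; it is not how MimetypeRegistryService is implemented, and MimeTypeResolver and resolve are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;

// Toy model, not Nuxeo's MimetypeRegistryService.
final class MimeTypeResolver {

    // Stand-in for the fileExtension contribution: extension -> mime-type.
    private static final Map<String, String> NON_AMBIGUOUS_EXTENSIONS =
            Map.of("pages", "application/vnd.apple.pages");

    private MimeTypeResolver() {
    }

    // A mapped, non-ambiguous extension wins; otherwise keep the sniffed type.
    static String resolve(String filename, String sniffedMimeType) {
        int dot = filename.lastIndexOf('.');
        if (dot >= 0 && dot < filename.length() - 1) {
            String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
            String mapped = NON_AMBIGUOUS_EXTENSIONS.get(extension);
            if (mapped != null) {
                return mapped;
            }
        }
        return sniffedMimeType;
    }
}
```

With this model, resolve("hello.pages", "application/zip") gives 'application/vnd.apple.pages', while a real .zip file keeps its sniffed type.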

An introduction to converters

A Pages file is basically a zip file containing all of its assets, such as images and some XML files for the content. The good thing is that the zip sometimes also contains a QuickLook folder holding a pdf preview of the document (QuickLook/Preview.pdf). This is just perfect for our needs, so I will create a "pages2pdf" converter.
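Since the preview is not always there, it can be handy to check a given file first. This sketch uses only java.util.zip from the JDK and assumes the preview lives at QuickLook/Preview.pdf, the entry name the converter below relies on; PagesPreviewProbe is a hypothetical helper, not part of Nuxeo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper built on the JDK only; not Nuxeo's ZipUtils.
final class PagesPreviewProbe {

    // Entry name the converter in this post relies on.
    static final String PREVIEW_ENTRY = "QuickLook/Preview.pdf";

    private PagesPreviewProbe() {
    }

    // Scans the archive entry by entry; the stream is closed when done.
    static boolean hasEntry(InputStream in, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entryName.equals(entry.getName())) {
                    return true;
                }
            }
            return false;
        }
    }

    static boolean hasPdfPreview(InputStream pagesFile) throws IOException {
        return hasEntry(pagesFile, PREVIEW_ENTRY);
    }
}
```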

A converter can be added as usual through an extension point.

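<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
    point="converter">
  <converter name="pages2pdf"
      class="org.nuxeo.ecm.platform.convert.plugins.Pages2PDFConverter">
    <sourceMimeType>application/vnd.apple.pages</sourceMimeType>
    <destinationMimeType>application/pdf</destinationMimeType>
  </converter>
</extension>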


It basically comes down to three things: a sourceMimeType, a destinationMimeType and a Java class to handle the conversion. We can have as many sourceMimeTypes as we want; a classic example is the any2pdf converter, which can take XML, HTML, plain text, RTF or any other LibreOffice-supported format. Here we choose the Pages mime-type 'application/vnd.apple.pages', and we want only one destinationMimeType: 'application/pdf'. Our Java class takes a Pages file as input and returns a pdf file. It must implement the Converter interface, which means two methods to implement: init and convert.
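To picture what the source and destination mime-types are for, here is a toy registry in plain Java: each entry lists the mime-types it accepts and the one it produces, and a lookup picks a converter for a given pair. This is a simplified model for illustration only; ConverterRegistrySketch, ConverterEntry and findConverter are hypothetical names, not Nuxeo's ConversionService API. It assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model of converter declarations; names are hypothetical, not Nuxeo API.
final class ConverterRegistrySketch {

    // One converter: a name, the mime-types it accepts, the mime-type it produces.
    record ConverterEntry(String name, Set<String> sourceMimeTypes, String destinationMimeType) {
    }

    private final List<ConverterEntry> converters = new ArrayList<>();

    void register(ConverterEntry entry) {
        converters.add(entry);
    }

    // Returns the first registered converter able to go from source to destination.
    Optional<String> findConverter(String sourceMimeType, String destinationMimeType) {
        return converters.stream()
                .filter(c -> c.sourceMimeTypes().contains(sourceMimeType))
                .filter(c -> c.destinationMimeType().equals(destinationMimeType))
                .map(ConverterEntry::name)
                .findFirst();
    }
}
```

An any2pdf-style entry simply has a larger sourceMimeTypes set, while our pages2pdf entry has exactly one.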

As stated in the documentation, init can be used to retrieve configuration information from the XMap descriptor. We don't need any options here, so my init implementation will be empty.

The convert method gives us a BlobHolder and a parameter Map and must return another BlobHolder. So all we have to do is look for the pdf preview in the Pages file, extract it and return it as a BlobHolder. Here's what I did:

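package org.nuxeo.ecm.platform.convert.plugins;

import java.io.InputStream;
import java.io.Serializable;
import java.util.Map;

import org.nuxeo.common.utils.ZipUtils;
import org.nuxeo.ecm.core.api.Blob;
import org.nuxeo.ecm.core.api.blobholder.BlobHolder;
import org.nuxeo.ecm.core.api.impl.blob.FileBlob;
import org.nuxeo.ecm.core.convert.api.ConversionException;
import org.nuxeo.ecm.core.convert.cache.SimpleCachableBlobHolder;
import org.nuxeo.ecm.core.convert.extension.Converter;
import org.nuxeo.ecm.core.convert.extension.ConverterDescriptor;

public class Pages2PDFConverter implements Converter {

    private static final String PAGES_MIME_TYPE = "application/vnd.apple.pages";

    // path of the pdf preview inside the Pages zip archive
    private static final String PAGES_PREVIEW_FILE = "QuickLook/Preview.pdf";

    @Override
    public BlobHolder convert(BlobHolder blobHolder,
            Map<String, Serializable> parameters) throws ConversionException {
        try {
            // retrieve the blob and verify its mimeType
            Blob blob = blobHolder.getBlob();
            if (!PAGES_MIME_TYPE.equals(blob.getMimeType())) {
                throw new ConversionException("not a pages file");
            }
            // look for the pdf file
            if (ZipUtils.hasEntry(blob.getStream(), PAGES_PREVIEW_FILE)) {
                // pdf file exists, let's extract it and return it as a
                // BlobHolder.
                InputStream previewPDFFile = ZipUtils.getEntryContentAsStream(
                        blob.getStream(), PAGES_PREVIEW_FILE);
                Blob previewBlob = new FileBlob(previewPDFFile);
                previewBlob.setMimeType("application/pdf");
                return new SimpleCachableBlobHolder(previewBlob);
            } else {
                // Pdf file does not exist, conversion cannot be done.
                throw new ConversionException(
                        "Pages file does not contain a pdf preview.");
            }
        } catch (ConversionException e) {
            // keep the specific error messages raised above
            throw e;
        } catch (Exception e) {
            throw new ConversionException(
                    "Could not find the pdf preview in the pages file", e);
        }
    }

    @Override
    public void init(ConverterDescriptor descriptor) {
        // no configuration needed
    }
}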

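To test the extraction logic without a running Nuxeo instance, the same steps (check the mime-type, look for the preview entry, return its bytes) can be written against the JDK alone. This is a stand-alone counterpart of convert, assuming the whole Pages file fits in memory and Java 9 or later; PreviewExtractor is a hypothetical name, and it throws plain JDK exceptions instead of ConversionException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical JDK-only counterpart of Pages2PDFConverter.convert.
final class PreviewExtractor {

    static final String PAGES_MIME_TYPE = "application/vnd.apple.pages";
    static final String PAGES_PREVIEW_FILE = "QuickLook/Preview.pdf";

    private PreviewExtractor() {
    }

    // Validates the mime-type, then returns the bytes of the preview entry.
    static byte[] extractPreview(byte[] pagesFile, String mimeType) throws IOException {
        if (!PAGES_MIME_TYPE.equals(mimeType)) {
            throw new IllegalArgumentException("not a pages file");
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pagesFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (PAGES_PREVIEW_FILE.equals(entry.getName())) {
                    // reads until the end of the current entry only
                    return zip.readAllBytes();
                }
            }
        }
        throw new IllegalStateException("Pages file does not contain a pdf preview.");
    }
}
```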

Now we can convert a Pages file to a pdf. But to be able to preview it, it must be in HTML. So the next logical step is to make a 'pages2html' converter.

A pages2html converter for the preview service

A very cool feature of converters is that they can be chained. If I want a 'pages2html' converter, I can get it by first converting the Pages file to a pdf, and then converting the pdf to HTML. The contribution looks like this:

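<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
    point="converter">
  <converter name="pages2html">
    <sourceMimeType>application/vnd.apple.pages</sourceMimeType>
    <destinationMimeType>text/html</destinationMimeType>
    <conversionSteps>
      <subconverter>pages2pdf</subconverter>
      <subconverter>pdf2html</subconverter>
    </conversionSteps>
  </converter>
</extension>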


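Conceptually, a chain behaves like function composition with a compatibility check between steps: each step's destination mime-type must match the next step's source mime-type, and the chain as a whole goes from the first source to the last destination. The toy model below illustrates that idea with strings standing in for documents. It is not how Nuxeo implements chaining; ConverterChain, Step and chain are hypothetical names, and it assumes Java 16 or later for records.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of a chained converter; names are hypothetical, not Nuxeo API.
final class ConverterChain {

    // One conversion step with its declared input and output mime-types.
    record Step(String name, String sourceMimeType, String destinationMimeType,
            UnaryOperator<String> convert) {
    }

    private ConverterChain() {
    }

    // Builds a step that runs the given steps in order, after checking that
    // each step produces what the next one expects.
    static Step chain(String name, List<Step> steps) {
        if (steps.isEmpty()) {
            throw new IllegalArgumentException("a chain needs at least one step");
        }
        for (int i = 1; i < steps.size(); i++) {
            Step previous = steps.get(i - 1);
            Step next = steps.get(i);
            if (!previous.destinationMimeType().equals(next.sourceMimeType())) {
                throw new IllegalArgumentException(previous.name() + " produces "
                        + previous.destinationMimeType() + " but " + next.name()
                        + " expects " + next.sourceMimeType());
            }
        }
        List<Step> ordered = List.copyOf(steps);
        return new Step(name,
                ordered.get(0).sourceMimeType(),
                ordered.get(ordered.size() - 1).destinationMimeType(),
                input -> {
                    String current = input;
                    for (Step step : ordered) {
                        current = step.convert().apply(current);
                    }
                    return current;
                });
    }
}
```

Chaining a Pages-to-pdf step with a pdf-to-HTML step yields a step declared as 'application/vnd.apple.pages' to 'text/html'; putting them in the wrong order is rejected up front.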
We now have our 'pages2html' converter, which means Pages files will be previewable in Nuxeo. That's it for today. If you have any questions, use the comments section or go to answers.nuxeo.com. See you on Friday!

Category: Product & Development
Tagged: Java, Monday Dev Heaven