Nuxeo 5.2 Milestone 4 Feature: Conversions and Previews


Fri 06 February 2009 By ismith

Exploiting OpenOffice

I decided to start off with the good
stuff for this post. I couldn't really call this article "How you really
don't need Microsoft Office anymore because the open source tools are good
enough" because I was afraid of Bill Gates. Or, to be more specific, I was
afraid of his hired goons; I would guess that he has the best goons
money can buy, and lots of 'em! Anyway, if you haven't been following the
progress of OpenOffice, you should catch up on things; the product has come
a long way since the ... cough, ahem ... "rough" builds of the early days.
It runs really smoothly now, and being open source, it's designed to allow
other programs to leverage the great work that the Open Office (O-O)
developers have done. (I am sure there is a climate-controlled cave
somewhere for the poor slaves that toil away in anonymity on MS Office. I'm
sorry, to those folks, for their working conditions, but the good news is
that their company is becoming more like other companies now.)

The "leveraging the work of
others" part is where this blog post connects with Nuxeo EP 5. Open Office
(starting around version 2) exposes a service that other programs can
connect to in order to use the imaging capabilities of O-O. The Nuxeo
engineering folks did a number of nice things that use this capability--and
we could only do this legally because we are open source too! Woo-hoo!
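
For the curious, here is a minimal sketch of what "running O-O as a service" can look like from the command line. Treat everything in it as an assumption on my part: the binary name `soffice`, the port number 8100, and the helper name `ooo_listener_command` are illustrative, and the single-dash flags are the spelling used by the OpenOffice 2.x/3.x builds (newer office suites spell them with two dashes). It only builds the command; it does not claim to be how Nuxeo starts O-O.

```python
import shlex


def ooo_listener_command(host="localhost", port=8100, binary="soffice"):
    """Build the argument list that asks OpenOffice to run without a UI
    and listen for client connections on a TCP socket.

    The binary name, host, and port are assumptions for illustration;
    adjust them to match your own installation.
    """
    # The accept string tells the office process where to listen and
    # which protocol (urp) clients will speak.
    accept = f"socket,host={host},port={port};urp;"
    return [binary, "-headless", "-nofirststartwizard", f"-accept={accept}"]


if __name__ == "__main__":
    # Print a copy-and-paste friendly version of the command.
    print(shlex.join(ooo_listener_command()))
```

If you run the printed command in a terminal, O-O should sit quietly in the background waiting for a client to connect, which is the "start it yourself" step used in the rest of this post.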

Seeing It In Action

I have mentioned the "Preview" tab that's now available in
Nuxeo 5.2 milestone 4 in a previous blog post. One of the cool things that
happens when you have O-O running on your system is that the MS Office
formats "just work" in the Preview tab. O-O has to be running for this to
work, so you should start it yourself if you want to play along with this
post at home. So, to demonstrate, I've created a workspace with a few
documents in it about cloud computing:

???

These are
three files I found with Google as demo material... I won't vouch for the
quality of their information! These are, from top to bottom, a PDF file, a
PowerPoint presentation, and an MS Word file. Let's see what happens if we
click on the Word file and then switch to the Preview tab:

???

So, what has happened here is
that the MS Word document has been sent to O-O for conversion to HTML, then
Nuxeo has rendered that converted version in the Preview tab. If you are a
regular user of O-O, you are probably saying "Ho, hum, I've been doing that
for years." Well, maybe you should revisit my previous post on annotating
documents, huh? That little eye and the annotation service work for MS Word
documents just as they do for images:

???

As you would expect, the same holds
true for the PowerPoint presentation--you can see it in the Preview tab.
Somewhat cooler is that you get some sensible controls for actually reading
the presentation as well. Here is a snapshot from the presentation:

???

Note the extra controls at the
top like "Continue" (perhaps should be "Next Slide") and "Last Page." This
is a pretty nice preview, considering that you are not only not running
MS Office, but not paying for it at all!

PDF Magic

So, of course, I'm going to
show you a screen snap of a preview of a PDF document. Before I do that, I
should mention that the PDF imaging is actually not being provided by the
O-O system that I have been raving about but by another Linux tool called
pdftohtml that is part of the (very impressive) Poppler PDF rendering
project. Ok, here it is, sans any gratuitous annotations:

???
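
If you want to poke at pdftohtml yourself, here is a small sketch of driving it from a script. The helper names (`pdftohtml_command`, `convert`) and the file paths are made up for illustration, and this is not a description of how Nuxeo invokes the tool; I am only assuming that `pdftohtml` is on your PATH and using its `-noframes` option, which asks for a single HTML page instead of a frameset.

```python
import subprocess
from pathlib import Path


def pdftohtml_command(pdf_path, out_dir):
    """Build the pdftohtml argument list for one PDF file.

    The output name is derived from the input name; the directory layout
    is an assumption for illustration only.
    """
    pdf = Path(pdf_path)
    html = Path(out_dir) / pdf.with_suffix(".html").name
    # -noframes: produce one HTML document rather than a frameset.
    return ["pdftohtml", "-noframes", str(pdf), str(html)]


def convert(pdf_path, out_dir):
    """Run pdftohtml; raises CalledProcessError if the tool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftohtml_command(pdf_path, out_dir), check=True)
```

Calling `convert("cloud.pdf", "preview")` should leave the generated HTML under the `preview` directory; check the tool's own help output for the exact file naming on your version.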

To return to my previous ranting
about how cool O-O is, O-O does have a story to tell about PDF, but in the
other direction. To generate a preview of a PDF, as was done above, you need
to render PDF and then figure out how to best display that in HTML. O-O is
good at going to PDF from other formats, like those associated with
MS Office. Thus, when you have OpenOffice available, you get a slightly
different Summary tab as I am showing here:

???

The "Generate PDF" link will give you a PDF
version of the document by rendering it with O-O and then sending it to your
browser, without even needing a copy of MS Office! Sweet!
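
To summarize who does what in this post, here is a toy routing table. It is only a sketch of the division of labor described above (O-O for the MS Office previews and for generating PDF, pdftohtml for PDF previews); the function names and the extension list are my own assumptions, and the real Nuxeo conversion service is certainly more involved than a dictionary lookup on file extensions.

```python
from pathlib import Path

# Only the two MS Office formats demonstrated in this post; other
# formats are deliberately left out rather than guessed at.
OFFICE_EXTENSIONS = {".doc", ".ppt"}


def preview_tool(filename):
    """Return the tool that would produce the HTML preview, or None."""
    ext = Path(filename).suffix.lower()
    if ext in OFFICE_EXTENSIONS:
        return "openoffice"
    if ext == ".pdf":
        return "pdftohtml"
    return None


def can_generate_pdf(filename):
    """True when O-O could render the file to PDF ("Generate PDF" link)."""
    return Path(filename).suffix.lower() in OFFICE_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.doc", "slides.ppt", "paper.pdf"):
        print(name, preview_tool(name), can_generate_pdf(name))
```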

Internals Note

If you thought it was a little weird that you had to start up
OpenOffice yourself to get these features, well, you are right. That was
just to make it a bit easier to explain. In fact [extra coolness points
here] there is a new part of Nuxeo 5.2 milestone 4 that manages a copy
of Open Office for you, behind the scenes, if you configure your server to
turn this on. Since Open Office has Java bindings and Nuxeo is based on
Java--and all three are open source--we actually use the OpenOffice code
directly (rather than through a Unix pipe or something), which should give
us much better reliability as we move toward the 5.2 GA release. There are
a number of options for how you would like this to work, such as how many
resources you are willing to dedicate to this slave version of O-O; they
live in the file ooo-config.xml in the configuration directory.

I hope this gives
you some hope that you don't have to keep paying the tax to run Office
applications in the context of your content management system! If you have
questions or comments about this article or anything else related to ECM or
Nuxeo, drop them to me at ismith [at] nuxeo [point] com.


Post Scriptum

Secretly, over a period of about a year, I
worked on a daily basis with folks at a Fortune 500 company that was MS
Office-only. They never knew I was not running Office... it's not a
drop-in replacement or 100% compatible, but it is good enough, now.


Category: Product & Development