This month, meet Florent Guillaume, head of Nuxeo’s R&D department and one of the minds who helps you navigate the information superhighway. Take a moment to cruise through the different layers of the Nuxeo Platform and learn how enhancements to VCS have been made to help keep you on track, maximize efficiency, and enable rapid application development.
Tell us about your role at Nuxeo and how long you’ve been with the company.
I’m the head of R&D at Nuxeo, and I’ve been here, I guess, nearly from the beginning. I started about 10 years ago, in mid-2001, and was maybe employee number 2 or 3? I can’t remember for sure; it was a long time ago… Prior to that, I worked with the French Ministry of Education in a general technology role, providing IT services for the entire organization.
You have been one of the main architects, if not the main architect, of the Visible Content Store (VCS) technology in the Nuxeo content repository. Can you tell us in plain terms what makes VCS and the Nuxeo Content Repository unique?
Well, this level of the software stack is about storing the actual bytes (data) somewhere: holding them in a database and organizing them so that access is efficient and the information can be indexed. It also determines how well scaling and clustering work. Things such as the user interface and presentation aren’t part of it. This might not matter much to users, but to developers and sysadmins it makes a huge difference; they can certainly tell whether the software is well architected or not.
The idea behind VCS is that when we take data from the user model and put it in a database for storage, we don’t want to hide the structure of the information. We aim to keep the data stored in its natural form. This is very important because anyone who wants to develop the application or back up the database can look at the data and understand it, since it retains its original structure. It’s a bit of a WYSIWYG for data, which is quite different from the system we used before (Jackrabbit). When we decided to move to this new model, it took close to six months to make the changes to the code and then almost a year to fine-tune.
But it was worth the effort, as it has had a positive impact on the relationship between Nuxeo Studio, our customization environment, and the back-end repository. Studio lets you create new document types and schemas, which then tell the database to create new tables or reorganize data. In other words, Studio lets you decide how the data in your repository is organized within this storage framework.
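To make the idea of storing data “in its natural form” concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. It assumes a deliberately simplified rule, one table per schema with one readable column per field, keyed by document id. The function `schema_to_ddl`, the type map, and the `invoice` schema are hypothetical illustrations, not the SQL that VCS actually generates.

```python
import sqlite3

# Hypothetical mapping from simple schema field types to SQL column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "date": "TEXT"}


def schema_to_ddl(schema_name: str, fields: dict) -> str:
    """Build a CREATE TABLE statement with one readable column per schema field."""
    columns = ", ".join(f'"{name}" {SQL_TYPES[ftype]}' for name, ftype in fields.items())
    # Every row is keyed by the id of the document it belongs to.
    return f'CREATE TABLE "{schema_name}" (id TEXT PRIMARY KEY, {columns})'


# A made-up "invoice" schema, as someone designing a document type might define it.
invoice_fields = {"number": "string", "amount": "integer", "due": "date"}

conn = sqlite3.connect(":memory:")
conn.execute(schema_to_ddl("invoice", invoice_fields))
conn.execute(
    'INSERT INTO "invoice" VALUES (?, ?, ?, ?)',
    ("doc-1", "INV-42", 1200, "2012-01-31"),
)

# Anyone with a SQL client can query the data directly, with no decoding step.
rows = conn.execute('SELECT number, amount FROM "invoice" WHERE amount > 1000').fetchall()
print(rows)
```

The point of the sketch is the contrast with an opaque store: a new schema becomes a new table whose columns carry the field names, so a DBA or a backup tool sees the same structure the application does.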
How have customers reacted to these changes?
We’ve received quite a bit of feedback, and all of it has been positive. The transition (for those who made it) was sometimes tedious, that’s for sure, but worth the effort.
Can you tell us a little more about what’s unique about the Nuxeo repository?
It has a cluster-enabled design, so from the start users can run in cluster mode. In this model, more than one machine accesses the database, which is very good for scaling and allows for high availability.
Another aspect is that we handle binaries separately from the metadata stored in VCS (the database) itself. As I was saying before, the way we store binaries is very flexible. Binaries make up a large share of the stored data, especially once you get into PDFs, video, and images. At that level of complexity, you don’t want to store them in a SQL database; you want to store them in a pluggable back end. For example, something like Amazon S3 enables this even if you don’t need anything more robust, with the added bonus of being in the cloud.
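The separation described here, metadata in the database and binaries in a pluggable back end, can be sketched as a small interface. This is a minimal Python illustration under stated assumptions: the `BinaryStore` interface, the content-addressed SHA-256 keys, and the `content_key` field are hypothetical choices made for the example, not Nuxeo’s actual binary storage API. A cloud back end such as Amazon S3 would simply be another implementation of the same two methods.

```python
import hashlib
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BinaryStore(ABC):
    """Hypothetical pluggable interface: the metadata side only ever sees a key."""

    @abstractmethod
    def put(self, data: bytes) -> str: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class FileSystemBinaryStore(BinaryStore):
    """Stores each blob as a file named after its SHA-256 digest."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        path = self.root / key
        if not path.exists():  # identical content is stored only once
            path.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryBinaryStore(BinaryStore):
    """A second back end behind the same interface, e.g. for tests."""

    def __init__(self):
        self.blobs: dict = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


def attach_file(metadata: dict, store: BinaryStore, content: bytes) -> dict:
    # The metadata record (what would live in the database) keeps only a key and a size.
    return {**metadata, "content_key": store.put(content), "content_length": len(content)}


with tempfile.TemporaryDirectory() as tmp:
    store = FileSystemBinaryStore(Path(tmp) / "binaries")
    doc = attach_file({"title": "Quarterly report"}, store, b"example PDF bytes")
    assert store.get(doc["content_key"]) == b"example PDF bytes"
    print(doc["title"], doc["content_length"])
```

Because `attach_file` depends only on the abstract interface, swapping the file-system store for another back end changes no metadata code, which is the flexibility the answer above points to.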
So describe how you picture the Nuxeo software architecture in your own head.
I see storage at the bottom, with an interface that functions as a bridge between the application and the existing world (disk files, file system, database). The VCS layer at the bottom has to know how to write bytes to disk, and retrieval involves two critical aspects: knowing what you want to retrieve and then being able to fetch that data. Generally, there’s a search aspect built into this, such as knowing general properties of your content, specific document IDs, etc.
Then, on top of the Nuxeo core (core API), there are a number of services that extend the platform, like the document management core and conversion services, as well as query services and event services. The very last level is the actual UI and the high-level services that interface with external systems.
You are involved in quite a few open source projects, especially some around standards such as CMIS. Almost a year after version 1.0, we have the impression that CMIS is finally delivering on its full promise by facilitating and speeding up technology integrations in the content technology space. Would you agree? How do you see CMIS’s progress: still early stage, midway, almost done…?
A number of people had great expectations, hoping to foster adoption and interoperability between many vendors, and that’s been happening, but not as quickly as people expected. Big vendors often mean big politics. They use CMIS for some features, but the goal was never for it to be the be-all and end-all of content management; it was supposed to be a common layer between vendors. And this common layer is working quite well. What’s going on now is that the working group keeps adding new features, so we’ll see 1.1 come out around the end of the year and then 2.0, with new features, a year later.
How involved are you in these projects?
I participate in conference calls every two weeks, and I’m active on the mailing list and in the Jira ticketing system.
I was previously involved with JCR 2.0 (the Java Content Repository standard), but the spec is completed now, which has freed up some of my time to focus on other areas. There’s also a new OASIS effort, WEMI, created a couple of weeks ago. I’ll be part of it as well; it will focus on web experience management interoperability.
If you want to differentiate between the two, I’d put it like this: CMIS is more about document management and content management and how they should be implemented, whereas WEMI focuses on how to get content to the web and aggregate information. But it’s still early in that project…
What has changed between when you first started at Nuxeo and now, in terms of core technology for content management?
I think the biggest changes are these: 1) ten years ago there was no name for the cloud, and storing your data somewhere out of your reach wasn’t a big thing, but now it is;
2) another thing that changed: if you look specifically at the technology layer that existed 10 years ago, NoSQL didn’t exist yet. People now see it as a legitimate way of storing data, but simpler scaling has its price. Going non-relational means simplifying the data model itself, which basically gives you key-value stores instead of complex relational stores.
As for CMS itself, it wasn’t seen as a need in its own right that required APIs or separate thinking in the storage layers. Since then, it’s been recognized as something worth thinking about on its own. Ten years ago you didn’t think much about CMS; you just thought: “I have data and I want to store it.”
What’s your prediction for 2012?
There will be a greater push to enable deployment of Nuxeo solutions in the cloud… nothing groundbreaking. It’s more a push for mobile access to your data on the road than anything else.
And your thoughts on open source software?
My experience is that open source development has proven it works. It’s not a commercial aspect, it’s a development aspect… something users don’t have to think about, and buyers very little (except that they won’t be tied to one vendor). Plus, open source is now an established methodology, so market penetration is really what will determine the growth rate.
What’s your take on the recent Nuxeo Platform launch? Any personal thoughts about this new release?
I think it’s much simpler to understand from the user’s perspective, and by users I mean technology consumers: devs and sysadmins. Now they have only one stack with modules, so it’s simpler to build different platforms on the same technological base. Before, development was more difficult than it is today; the platform becomes more flexible with each release.
Another big advantage is that being on the same platform lets users have different interfaces and different ways to manage their data… they can view data in different lights, so to speak, and still be inside the same application. For example, they can use the DAM module and see the data as assets, in a view designed for the asset management world, while also looking at the same data from a document management point of view. For some users, it’s very important to have different ways of seeing their data.
Interaction with your customers is a large part of designing the content management system architecture behind the product. You have to deal with real-world use cases and sometimes rely on anecdotes that tell you what’s important and what isn’t. You have to be able to see when solving a particular problem for one customer could help many other users do things better.
Any particular ideas that you want to incorporate into the platform for future releases?
[laughs] Uh, you want to see my to-do list…?
Well, one thing I’d like to see in the future is for us to unify the way we store file metadata. Right now, we handle various types of files differently, and not all of them are stored the same way. Unifying this will mostly improve the speed of some operations, such as specific searches.
On my wish list, let’s see… well, centralizing and making visible the background tasks that Nuxeo is working on. Many actions you initiate trigger background work you don’t see: things like full-text indexing, image metadata extraction, content transformation, video indexing, conversion between document types, or synchronization between Nuxeo and external platforms. Often when you click on something in the UI, these happen in the background. We need a centralized admin screen where we can see all these tasks as they happen, and prioritize or cancel them. Basically, letting sysadmins see what’s behind the curtain.
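The admin screen described here implies one central place that knows about all queued background work and can reorder or drop it. Below is a minimal Python sketch of such a registry: a priority queue with lazy cancellation, built on the standard heapq module. The class, its methods, the task names, and the priority scheme are all hypothetical and say nothing about how Nuxeo actually schedules its work.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Task:
    priority: int  # lower number = runs sooner
    seq: int       # tie-breaker keeps FIFO order within one priority
    name: str = field(compare=False)
    cancelled: bool = field(default=False, compare=False)


class BackgroundWorkRegistry:
    """Hypothetical central registry an admin screen could read from.

    Assumes task names are unique while a task is pending.
    """

    def __init__(self):
        self._heap: list = []
        self._by_name: dict = {}
        self._counter = itertools.count()

    def submit(self, name: str, priority: int = 10) -> None:
        task = Task(priority, next(self._counter), name)
        self._by_name[name] = task
        heapq.heappush(self._heap, task)

    def cancel(self, name: str) -> None:
        # Lazy deletion: mark the entry and let next_task() skip it later.
        self._by_name.pop(name).cancelled = True

    def reprioritize(self, name: str, priority: int) -> None:
        self.cancel(name)
        self.submit(name, priority)

    def pending(self) -> list:
        """What an admin screen would list: live tasks in run order."""
        return [t.name for t in sorted(self._heap) if not t.cancelled]

    def next_task(self) -> Optional[str]:
        while self._heap:
            task = heapq.heappop(self._heap)
            if not task.cancelled:
                del self._by_name[task.name]
                return task.name
        return None


registry = BackgroundWorkRegistry()
registry.submit("fulltext-index doc-1")
registry.submit("video-conversion doc-2", priority=20)
registry.submit("image-metadata doc-3")
registry.reprioritize("video-conversion doc-2", priority=1)
registry.cancel("image-metadata doc-3")
print(registry.pending())
```

Marking a task as cancelled instead of removing it from the heap avoids a linear search and re-heapify on every cancel; the cost is that dead entries stay in the heap until `next_task()` pops and discards them.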
What do you like about working at Nuxeo?
That it’s friendly and full of dedicated, intelligent people. On the dev side, you can trust people to know what they’re doing and be good at it.
What do you do in your spare time?
I tend to keep myself busy with a bit of sports… rock climbing, skiing, and this year I want to start diving. I try to spend time with friends and not always immerse myself in technology, even though I fail half the time [smiles].