Nuxeo & MongoDB: An Insight into our Benchmarks, MongoDB Days, and MongoDB 3.2
A MongoDB developer conference recently took place in Paris at a nice venue on the Champs-Élysées. It was a great opportunity to meet not only many MongoDB fans but also people interested in discovering what NoSQL and MongoDB can provide. The nine-hour conference offered a lot of great content to a packed house, and people stayed attentive through the entire day!
Here are a few takeaways from the MongoDB Day in Paris.
One of the talks focused on the WiredTiger storage engine, which is still optional in MongoDB 3.0 but is scheduled to become the default in version 3.2. The key advantage of WiredTiger over the MMAPv1 storage engine appears to be finer-grained concurrency control (at the document level, whereas MMAPv1 in 3.0 locks at the collection level), which allows for better performance of write operations.
Recently, we ran some benchmarks using the Nuxeo Platform and MongoDB.
As part of these benchmarks, we also ran some tests to compare the two storage engines, MMAPv1 and WiredTiger.
The results are:
- Read performance is about the same.
- For write operations, WiredTiger can be up to 2.5 times faster than MMAPv1 (and 5 times faster than SQL).
Most MongoDB configuration and administration tasks go through the command-line shell. As a developer, I find it very handy, and I expect DevOps engineers would like it too. However, when it comes to managing a large cluster, this can quickly become a real pain.
It looks like Ops Manager is a great solution for getting a web administration UI and avoiding having to run hundreds of commands against 10 different shells in the right order. The presentation did a good job of explaining how Ops Manager can help when you want to run a rolling upgrade of the nodes inside a cluster.
In Nuxeo, we have a similar concern: a lot can be done using nuxeoctl and the Nuxeo Shell, but when it comes to managing a cluster, we are missing a central administration web UI to coordinate the different nodes.
The design principles of Ops Manager are simple but nice, so we may consider using a similar architecture for providing a Nuxeo cluster manager.
Nuxeo & MongoDB
We gave a short presentation about why we decided to integrate MongoDB into the Nuxeo Platform, how we achieved the integration, and what benefits we get from it. The slides are available online. We will give a shorter version of this presentation at other MongoDB Days:
- Mike Obrebski, Solutions Architect at Nuxeo, will be in Munich on October 20th
- Alain Escaffre, Director of Product Management at Nuxeo, will be in London on November 5th
- I will be in San Jose on December 2nd
About MongoDB 3.2
It looks like a lot of new features will be introduced with the MongoDB 3.2 release. Here’s what I think are the most interesting points:
Document Validation
I know that a lot of people enjoy the schemaless nature of MongoDB, and while I can understand the joy of freedom, my concern is that Ops people are not that happy with it.
The fact that the enterprise toolbox provides tools, such as MongoDB Compass, to basically reverse engineer the schemas is proof that having real schema definitions at the application level does make sense, at least from an operations perspective. Once you have "figured out the schema", it also really makes sense to be able to define some constraints at the storage level, and it looks like this is what "Document Validation" will be.
For people using MongoDB via the Nuxeo Platform, it won't have much impact, since the repository layer on top of MongoDB already handles the validation logic, which can cleanly bubble up to the UI. However, from the Nuxeo DBS internal point of view, it may be something we want in order to better manage schemas in a schemaless database.
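To make the idea of storage-level constraints more concrete, here is a minimal sketch of what a validator could look like, assuming the query-operator style of validation rules announced for MongoDB 3.2. The collection name and the field names (ecm:primaryType, dc:title) are hypothetical choices for illustration, not the actual rules Nuxeo would ship. The function only builds the command document, so it runs without a server.

```python
from typing import Any, Dict


def build_validation_command(collection: str) -> Dict[str, Any]:
    """Build a collMod command document that attaches a validator.

    The field names below are assumptions made for this sketch.
    """
    validator = {
        "$and": [
            # The document type must be stored as a string.
            {"ecm:primaryType": {"$type": "string"}},
            # Every document must carry a title field.
            {"dc:title": {"$exists": True}},
        ]
    }
    return {
        # The command name must be the first key of the document.
        "collMod": collection,
        "validator": validator,
        # "moderate" skips existing documents that are already invalid.
        "validationLevel": "moderate",
        # "warn" logs violations instead of rejecting the write.
        "validationAction": "warn",
    }


if __name__ == "__main__":
    print(build_validation_command("default"))
```

With a driver such as PyMongo, the resulting document could be passed to the database's command method. Starting with the "warn" action is one way to observe violations before enforcing them.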
In-Memory Storage Engine
Having a real in-memory storage engine is very likely to provide high-end performance. However, without data durability, the use cases are limited. Fortunately, the beauty of storage engine pluggability is that it is possible to have different storage engines inside the same replica set. This means that the primary can use "in-memory" storage to deliver blazingly fast performance while the other members of the replica set can use WiredTiger to provide an "eventual persistency".
In a sense, this is close to what we do with Redis: we use in-memory Redis for storing the caches and processing queues, but rely on Redis persistence to "backup what we have in memory".
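The mixed replica set described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the host names, data paths, and replica set name are hypothetical, and the inMemory engine name is assumed to be the one exposed by the enterprise build. The storage engine is chosen per mongod process, while the replica set configuration only expresses which member is preferred as primary.

```python
from typing import Any, Dict, List

# Hypothetical members: one in-memory node preferred as primary,
# two WiredTiger nodes that persist the data to disk.
MEMBERS: List[Dict[str, Any]] = [
    {"host": "node1:27017", "engine": "inMemory", "priority": 2},
    {"host": "node2:27017", "engine": "wiredTiger", "priority": 1},
    {"host": "node3:27017", "engine": "wiredTiger", "priority": 1},
]


def mongod_args(member: Dict[str, Any], repl_set: str = "rs0") -> List[str]:
    """Build the mongod command line for one member (paths are made up)."""
    port = member["host"].split(":")[1]
    return [
        "mongod",
        "--replSet", repl_set,
        "--port", port,
        "--storageEngine", member["engine"],
        "--dbpath", f"/data/{repl_set}-{port}",
    ]


def replica_set_config(members: List[Dict[str, Any]],
                       repl_set: str = "rs0") -> Dict[str, Any]:
    """Build the replica set configuration document for initiation."""
    return {
        "_id": repl_set,
        "members": [
            {"_id": i, "host": m["host"], "priority": m["priority"]}
            for i, m in enumerate(members)
        ],
    }


if __name__ == "__main__":
    for m in MEMBERS:
        print(" ".join(mongod_args(m)))
    print(replica_set_config(MEMBERS))
```

Note that a higher priority only makes the in-memory node the preferred primary. After a failover, a WiredTiger member could be elected instead, so the performance benefit is not guaranteed at all times.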
BI Connectors & Lookup
As already visible in the development documentation, MongoDB 3.2 adds a new $lookup aggregation stage. It performs a simple join (a left outer join) between collections, and it is likely to be the cornerstone of the BI Connector, which should translate SQL into MongoDB queries and aggregation pipelines.
On the Nuxeo side, since all data for one repository is stored in a single collection, we don't really need this join. But:
- Having SQL access to Nuxeo data stored in MongoDB really does make sense
- Having simple joins between collections would allow the Nuxeo Platform to store Audit data in MongoDB while still maintaining the possibility to run analytics based on repository and audit data.
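As an illustration of the second point, here is a minimal sketch of an aggregation pipeline that joins repository documents with audit entries through $lookup. The collection name (audit) and the field names (ecm:id, ecm:primaryType, docUUID, dc:title) are assumptions made for the example and may not match an actual Nuxeo deployment. The function only builds the pipeline, so it can be inspected without a running server.

```python
from typing import Any, Dict, List


def audit_join_pipeline(audit_collection: str = "audit") -> List[Dict[str, Any]]:
    """Build a pipeline counting audit entries per repository document.

    All collection and field names are hypothetical.
    """
    return [
        # Keep only one type of repository document.
        {"$match": {"ecm:primaryType": "File"}},
        # Left outer join: attach the matching audit entries as an array.
        {
            "$lookup": {
                "from": audit_collection,
                "localField": "ecm:id",
                "foreignField": "docUUID",
                "as": "auditEntries",
            }
        },
        # Report the title and the number of audit events per document.
        {
            "$project": {
                "dc:title": 1,
                "eventCount": {"$size": "$auditEntries"},
            }
        },
    ]


if __name__ == "__main__":
    for stage in audit_join_pipeline():
        print(stage)
```

With a driver such as PyMongo, this list could be passed to the aggregate method of the repository collection. Documents without any audit entry would still appear, with an eventCount of 0, because the join is a left outer join.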
So, it looks like we will have to quickly start testing MongoDB 3.2 with the Nuxeo Platform!
Category: Industry Insight, Nuxeo Updates