We could give you many reasons why we chose to integrate MongoDB into the Nuxeo Platform: scalability, high performance and availability, and support for massive volumes and many types of content. They would all be true. But the ultimate goal for Nuxeo is to provide developers with a flexible, agile platform that can help them build pretty much any kind of content-centric application they can think of. This is a goal that we share with MongoDB.
On the heels of our recent announcement, we had the opportunity to meet up with Matt Asay, VP of Marketing and Corporate Strategy with MongoDB, to talk about how the database industry is changing, how MongoDB is helping change it and the Big Data market in general.
Prior to joining MongoDB, Asay worked for a start-up doing real-time Hadoop. The company was trying to commercialize Storm, an open source real-time data processing project. During his time there, he got a taste of how big the shift in the market was becoming: he saw a massive change in how companies were going to engage with their data. From his perspective (and that of many others), technologies like Hadoop and MongoDB are a huge part of that shifting landscape.
Asay told us that he joined MongoDB because he saw an opportunity to be a part of one of the key technologies enabling companies to move away from a spreadsheet view of their data (which is essentially how you look at data when you use a relational database) to a document view.
This new perspective, he said, is much more expressive and flexible, allowing developers to code their applications in a way that’s more natural to them. This new approach means that developers don’t have to be a “slave to their schema, but rather let the schema flow from whatever app they are trying to do”. This gives them far greater ability to be creative and innovative with their data.
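To make the contrast between the spreadsheet view and the document view concrete, here is a minimal sketch in plain Python. It is not MongoDB code and does not come from the interview; the customer, the policy fields and the `customer_as_document` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical data: names and fields are illustrative, not from the interview.

# "Spreadsheet view": flat rows in separate tables, linked by keys.
customers = [(1, "Ada Lovelace")]            # (customer_id, name)
policies = [                                  # (policy_id, customer_id, type, coverage)
    (10, 1, "life", 250000),
    (11, 1, "auto", 40000),
]


def customer_as_document(customer_id):
    """Assemble the flat rows into a single nested document."""
    name = next(n for cid, n in customers if cid == customer_id)
    return {
        "_id": customer_id,
        "name": name,
        "policies": [
            {"type": kind, "coverage": amount}
            for _pid, cid, kind, amount in policies
            if cid == customer_id
        ],
    }


doc = customer_as_document(1)

# "Document view": a new nested field is added to this one document
# without changing any table definition.
doc["preferred_contact"] = {"channel": "email", "address": "ada@example.com"}
```

In the document form, everything the application needs about the customer travels together, and the shape can grow with the application rather than being fixed up front.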
The Market for MongoDB
When asked about the competition for MongoDB, Asay’s response was very interesting. He said that if you asked analysts and the media, the answer would be other NoSQL databases, like Cassandra, Couchbase and the hundreds of others available, but MongoDB doesn’t see it that way. Others, he said, would point to the relational database vendors, like IBM and Oracle. While there is some truth to that, it’s also not really MongoDB’s focus.
Asay said that MongoDB is enabling a new generation of applications that were impossible or extremely difficult to write with a traditional database. He said MongoDB is trying to enable this new world of developer productivity.
He gave the example of Uber to help explain. In a recent article, the CEO of Uber said that people see the company as competition for taxis, but Uber sees itself competing for all the pent-up demand for rides that isn’t being captured by a limo or taxi service. Asay said that MongoDB is doing something similar: it is competing for the pent-up demand to build apps that were impossible to build before it came along.
Need some examples? Asay offered three:
- MetLife: MetLife had over 70 different data systems. Every time a customer called, they would be put on hold while the customer service rep scrambled between all these systems to find out who the customer was and which products they had bought. MetLife wanted to put all this data in one place so that CS reps could work from a single location. They tried for years to build this single view with a relational database and couldn’t solve the problem. Then they tried MongoDB. With it, MetLife had a pilot running in two weeks and was in full production within three months. The application they built handles 45 million agreements with 140 million transactions.
- A large software company wanted to move its business completely online (historically it sold licensed, packaged software). Using a traditional database, it was finding it difficult to move its data and transition its application from desktop software to the cloud. With MongoDB, the move was much easier.
- Mailbox: Mailbox is a new way to do mobile email on your phone. They needed to be able to scale from zero users to a million users in six weeks. Now, they have millions of users. MongoDB allows them to scale horizontally, adding servers and capabilities with no downtime. This was something they couldn’t do with other technologies.
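The single-view idea in the MetLife example can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not MetLife’s implementation: the three source systems, their field names and the `build_single_view` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems, each with its own shape
# and its own name for the customer key.
life_system = [{"customer_id": "c1", "policy": "term-life", "premium": 120}]
auto_system = [{"cust": "c1", "vehicle": "sedan"}, {"cust": "c2", "vehicle": "truck"}]
claims_system = [{"customerId": "c1", "claim_no": 7, "status": "open"}]


def build_single_view(sources):
    """Group records from every source under one document per customer.

    `sources` maps a source name to a (records, key_field) pair.
    """
    view = defaultdict(lambda: {"records": defaultdict(list)})
    for source_name, (records, key_field) in sources.items():
        for record in records:
            customer_id = record[key_field]
            # Keep every field except the key, whatever shape the record has.
            details = {k: v for k, v in record.items() if k != key_field}
            view[customer_id]["_id"] = customer_id
            view[customer_id]["records"][source_name].append(details)
    return view


single_view = build_single_view({
    "life": (life_system, "customer_id"),
    "auto": (auto_system, "cust"),
    "claims": (claims_system, "customerId"),
})

# One lookup now returns what previously lived in three systems.
print(sorted(single_view["c1"]["records"]))  # ['auto', 'claims', 'life']
```

Because each source keeps its own fields inside the combined document, adding a further system means adding one more entry to `sources` rather than redesigning a shared schema.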
Asay said that NoSQL databases could realistically replace 80% of the databases used today. But NoSQL will not completely replace the relational database; there are scenarios where the relational model still fits (like ERP and CRM). And, as Asay pointed out, MongoDB’s goal is not to replace existing databases; rather it’s to expand the market by 100-200%, taking it from a $30 billion operational database market to a $60-100 billion market.
Working with Big Data - It’s About Variety
The big question many want answered is how to handle Big Data. Is the market evolving? Asay said the answer is yes and no. He talked about a survey Gartner recently did with IT directors. Gartner asked how many had a Big Data project - the overwhelming majority said yes, with projects in various stages of deployment. A later question in the survey asked what these directors saw as the biggest inhibitors to success. The number one answer was “we don’t know what we are doing”. Another common response was that they couldn’t find anyone who could help.
All the answers, Asay said, pointed to the need for help. Companies know Big Data is a big deal, but they don’t know how to get the most from it.
Asay noted that many think that Big Data means big volume. But the reality is that for most companies volume is not the hard part to solve. In fact, most companies don’t have huge volumes of data (petabytes), at least not for individual projects.
The big problem is taming the variety of data. Variety is one of Gartner’s three V’s of Big Data: Volume, Variety, and Velocity. Variety, he said, is the big challenge, and it’s the area that MongoDB and similar NoSQL databases handle with ease - it’s what they were born to do.
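As a rough illustration of what variety means in practice, the sketch below keeps differently shaped records side by side and queries them by whatever fields they happen to share. It uses plain Python lists and dictionaries rather than a real database; the records and the `find` and `with_field` helpers are hypothetical.

```python
# Hypothetical records: three kinds of content with different fields.
collection = [
    {"kind": "tweet", "text": "Pothole on Main St", "geo": [41.88, -87.63]},
    {"kind": "sensor", "reading": 71.5, "unit": "F", "geo": [41.90, -87.65]},
    {"kind": "permit", "address": "12 Lake Ave", "issued": "2013-05-01"},
]

_MISSING = object()  # sentinel so that an absent field never matches a value


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        d for d in docs
        if all(d.get(field, _MISSING) == value for field, value in criteria.items())
    ]


def with_field(docs, field):
    """Return documents that carry the given field, whatever their kind."""
    return [d for d in docs if field in d]


located = with_field(collection, "geo")      # the tweet and the sensor reading
permits = find(collection, kind="permit")    # only the permit record
```

No record had to be padded with empty columns to sit next to the others, which is the property the document model offers when the data is varied rather than merely large.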
The trick, Asay said, is not to get bogged down by starting big, but to begin with small pilot projects. One example he gave to demonstrate this approach is the WindyGrid project in Chicago. That project started on the laptop of the city’s then chief data officer, Brett Goldstein - he downloaded MongoDB and just started building the app. Now, it’s running on dozens of servers.
The best way is to start small and iterate. Asay said the big challenge with Big Data, and the secret, is asking the right questions of it. You almost always ask the wrong questions to start, so it’s a process of iterating towards the right questions and working out which data you should be querying. You only get to where you need to go by assuming the questions you start with will be wrong and moving forward from there.
Essential Software is Open Source
For essential software, whether it’s a database or operating system, Asay says it’s a losing proposition to go into an enterprise with commercial software. A few companies, like Splunk, have managed it, but for the most part open source is the only way to develop and distribute infrastructure software (Asay commented that this isn’t necessarily true at the application software level).
When asked why he thought this was the case, Asay said it’s because the core buyer is a developer. It’s someone who wants to download it and try it out, not get into discussions with legal or purchasing just to try the software. Developers in the market have been conditioned to expect that the best software, the stuff they will depend on, is open source.
There are few counterexamples to this theory. The Big Data landscape is all open source, from Hadoop, to MongoDB and other NoSQL databases, to Storm for real-time data processing.
Asay believes that part of this is because a lot of the Big Data technology is emerging from companies that aren’t software vendors. Google, Facebook and LinkedIn are giving the software away. They have no financial interest in selling it; they are simply trying to attract developers.
This, Asay said, is where Big Data is born.