In January, we announced the availability of our most recent, long-term-supported product release, which is affectionately known as LTS 2019. This release is significant for two reasons. First, it ushers in a new era for Nuxeo where, in 2019, we will move to a continuous innovation model and will deploy new features and capabilities to our Cloud Content Management system offering as they become available (more on this later this month). And, second, LTS 2019 features a greatly enhanced set of capabilities for artificial intelligence (AI) and machine learning (ML).
Now, there is a lot of hype and noise in the market about AI/ML and, in this case, I believe that it’s absolutely warranted. Without a doubt, we are in the midst of a revolution in information management. For years, we have drawn an unfortunate and artificial line between content and data, unstructured and structured information and – in many cases – have relegated content to a secondary status, more of a complex problem to be solved than an opportunity to be exploited.
Think about it. By most estimates, content – or unstructured information – represents more than 80% of all information. More importantly, almost all human-generated information is content. Content literally is how we work. It’s how we communicate and collaborate with one another. It informs our decisions. It allows us to analyze large amounts of data. And, with the explosion of rich media, it’s also how we engage with and delight our customers. Simply put, content is absolutely critical to the modern enterprise – and yet most organizations are primarily concerned with how to create it, how to store it, how to deliver it or even how to “manage” it.
Very few organizations look at content as a true information asset.
The reason why is simple. Content is complex and it is difficult to manage. And, typically, it takes a human to interpret content and its relative value. But, with the advent of AI/ML, we are beginning to tear down the walls between content and data. We now have automated tools that allow us to better understand unstructured information, to extract valuable data, to provide insight into its substance and import, and to even interpret the sentiment of its author. In short, we now have unprecedented ability to enrich content, making it easier to quickly find, to readily reuse, to accurately deliver and even to intelligently mine to provide valuable new sources of competitive advantage and business transformation for today’s enterprises.
Artificial Intelligence: Where are We Today?
Now that we’ve talked a little about the promise of AI/ML technologies, let’s talk about where Nuxeo is today and also what’s included in LTS 2019.
Last year, we announced an AI framework that enabled broad integration with, and support for, various third-party AI engines like Google Vision, Amazon Rekognition and even Amazon Comprehend. Like some other content and digital asset management (DAM) vendors, we made use of these tools to provide general tagging for images, auto-classification for content and sentiment analysis for documents and communications. We even provided support for tagging video content.
But what we realized is that, while these tools made for great demos, they didn’t deliver much business value to our customers. Yes, it was impressive to see a machine apply tags to an image or a video. But what could these tools actually tell you, and how valuable is a tagging model really? The issue with most generic AI engines (like Google Vision) is that they’re generic. These tools can tell you what is in an image, and they do a fantastic job of enriching content, but how much of that information is really valuable to the business?
We also recognized that tags or labels are great for helping to search for and retrieve content or assets, but they’re not really good for much else. In most content management systems (including Nuxeo), tags are typically captured as a string of text objects in a single field. As a result, you can’t really use tags to launch content management workflows or kick off specific business activities. It’s also hard for users to interact with tags in any meaningful way, for example, to confirm the accuracy of data being applied to an image.
So, where are we now? First, in LTS 2019, we’re pleased to announce that Nuxeo customers can now train and deploy their own custom AI models. We call this business-specific AI, and the key difference is that our customers can now use their own data to train AI models that are tailored to the unique needs of their business. Second, we’re also pleased to announce that Nuxeo now supports full entity extraction from AI, allowing us to map data that is generated by an AI engine to specific metadata fields.
As a result, not only can we train the AI engine to provide more accurate data about the document or asset, we can also extract this information and apply it as metadata.
Why is this important? Well, while tags enable us to help find and retrieve images, metadata extraction allows us to do much more, like truly automate image capture, launch workflows and dependent business processes, and even associate new content or assets with pending tasks or work assignments.
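To make the idea of mapping AI output to metadata fields a little more concrete, here is a minimal sketch in Python. It is not the Nuxeo API: the `Prediction` class, the `FIELD_MAP` table, the `vehicle:*` field names and the 0.8 confidence threshold are all assumptions made up for illustration. It only shows the general pattern of taking entity/value pairs from a model, writing the confident ones to named fields, and holding the rest for a human to confirm.

```python
from dataclasses import dataclass

# Hypothetical mapping from the entities a custom model predicts
# to the metadata fields they should populate (names are illustrative).
FIELD_MAP = {
    "brand": "vehicle:brand",
    "model": "vehicle:model",
    "trim": "vehicle:trim",
    "color": "vehicle:color",
}


@dataclass
class Prediction:
    entity: str        # which property the model predicted, e.g. "model"
    value: str         # the predicted value, e.g. "F-150"
    confidence: float  # model score between 0 and 1


def extract_metadata(predictions, threshold=0.8):
    """Split predictions into auto-applied metadata and values needing review."""
    metadata, needs_review = {}, {}
    for p in predictions:
        target = FIELD_MAP.get(p.entity)
        if target is None:
            continue  # no metadata field is mapped for this entity
        bucket = metadata if p.confidence >= threshold else needs_review
        bucket[target] = p.value
    return metadata, needs_review


# Made-up sample predictions for a single image.
predictions = [
    Prediction("brand", "Ford", 0.99),
    Prediction("model", "F-150", 0.93),
    Prediction("trim", "Limited", 0.71),
    Prediction("scene", "sunset", 0.88),
]
metadata, needs_review = extract_metadata(predictions)
print(metadata)
print(needs_review)
```

In this sketch, the brand and model land directly in their fields, the lower-confidence trim value is set aside for validation, and the unmapped "scene" prediction is ignored.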
What Does This Really Mean?
Here is a simple scenario to better explain how all of this comes together. Following is an image of a Ford F-150 pickup truck, which has been the best-selling vehicle in the United States since 1986.
After a quick “drag and drop” test that you can do yourself, Google Vision applies a number of fairly generic tags to the image, correctly identifying it as a Motor Vehicle and even a Pickup Truck. It also applies some more brand-specific tags, identifying the image as a Ford and even a Ford F-Series.
Now, don’t get me wrong, there’s some specific business value here. Google Vision has correctly identified the brand and even the model series. It has also attempted to identify the specific model, applying “Ford F-350” and “Ford Super Duty” labels to the image. However, this is a Ford F-150 truck. So, in principle, Google Vision does a good job of enriching the data associated with this image, but the data itself is still fairly generic and, in some cases, it’s even inaccurate.
Also, notice that Google Vision has an inherent bias toward what is at the center and forefront of the image, hence the Wheel, Tire, Fender and Bumper tags, but no tags for items in the background of the image.
Now, let’s look at the same image from a more business-specific perspective:
If I worked for Ford’s marketing team, what I would really want to know is that this is an image of a Ford F-150. That it’s a Limited edition. It’s a four-door, SuperCrew model in Agate Black. And it features 22” chrome rims. I might also want to know that the image contains boats in the background and a sunset view. This is an example of business-specific data. Simply put, for data to have real value for the business, it needs to be specific to the needs of the business. This is why fully trained, custom AI models are critical to a Content Services Platform (CSP), like Nuxeo.
Notice also that I have represented the data in this picture differently. Instead of a jumbled-together string of text values, or tags, here we are depicting true metadata extraction. Values for the brand, model, trim, color, etc. have been appropriately extracted and applied to specific data fields, which not only enables a more accurate, parameterized search, but also allows us to map new assets or content to specific workflows and pending work activities. And, if we want a human to validate these metadata fields, it is much easier to present this information in an intuitive and easy-to-work-with fashion.
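The difference between a tag string and fielded metadata can be sketched in a few lines of Python. This is an illustration only, not how Nuxeo stores or queries documents: the asset records, the `search` and `route` helpers, and the workflow names are hypothetical, and the sample values other than the F-150 Limited in Agate Black are invented for the example.

```python
# Tag-style record: everything lives in one free-text field.
tagged_asset = {
    "id": "img-001",
    "tags": "Motor Vehicle, Pickup Truck, Ford, Ford F-Series, Wheel, Tire",
}

# Metadata-style records: each value sits in its own named field.
assets = [
    {"id": "img-001", "brand": "Ford", "model": "F-150",
     "trim": "Limited", "color": "Agate Black"},
    {"id": "img-002", "brand": "Ford", "model": "F-150",
     "trim": "XLT", "color": "White"},
    {"id": "img-003", "brand": "Ford", "model": "Mustang",
     "trim": "GT", "color": "Red"},
]


def search(records, **criteria):
    """Parameterized search: keep records matching every field/value pair."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical routing rules: field conditions -> workflow name.
WORKFLOW_RULES = [
    ({"model": "F-150", "trim": "Limited"}, "f150-limited-campaign-review"),
    ({"model": "Mustang"}, "mustang-brand-approval"),
]


def route(asset):
    """Return the first workflow whose conditions the asset satisfies."""
    for conditions, workflow in WORKFLOW_RULES:
        if all(asset.get(k) == v for k, v in conditions.items()):
            return workflow
    return None  # no rule matched; leave the asset unrouted


matches = search(assets, model="F-150", trim="Limited")
print([a["id"] for a in matches])
print(route(assets[2]))
print("F-150" in tagged_asset["tags"])
```

With fielded values, both the search and the routing rule are simple equality checks on named fields. The tag string, by contrast, never mentions the F-150 at all, and even if it did, there would be no reliable way to tell a model name from a trim level or a color.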
It’s Smart, but is it Savvy?
There’s no doubt about it. Artificial intelligence is changing the world around us and it’s changing how we work with content and digital assets. But, there’s a difference between smart and savvy, and to add real value, your AI engine has to be business savvy.
This is why LTS 2019 is so critical to the Nuxeo Platform. With our 2019 release, we’ve given our customers the ability to employ customized machine-learning models and to train them using their own specific data sets. We also support true entity extraction to map data values to specific metadata fields, moving beyond simple tagging structures.
In my next blog post, I’ll talk about our 2019 roadmap for artificial intelligence and how we’re taking these capabilities even further. I’ll also share our four tenets for artificial intelligence and our long-term vision for how AI/ML can add even greater value for our customers.