Artificial Intelligence (AI) is one of those buzzwords that seems to have been around forever and a day. Just recently, though, there has been a real transition from AI being a “technology to watch” to one that is actually starting to deliver on its promise.

This increase in “real-world” AI use cases could be a tipping point for the adoption of the technology across all types of industries. However, many information management-related use cases still focus on either simple classification of content as part of the capture or ingestion process, or a more advanced, learning-based version of optical character recognition (OCR).

These provide fantastic value - but in my opinion they miss a huge opportunity, which is to use AI on the mass of content and data that already exists within the organization (in the “digital landfill”).

Most of us have heard about Big Data and Big Content, and we all understand the challenges of locating important information residing within various systems and repositories (the equivalent of searching for needles in a haystack). But what if we could use AI to explore our own digital landfills - to search for nuggets of useful information, recycle that information, and even get rid of content that really doesn’t need to be kept?

Well, I have some good news - we can do all of these things - and they are set to be three of the hottest trends around AI in Information Management in 2019.

Enriching Metadata

There is a saying that “Information is everything, and everything is information.” Well, from an information management viewpoint, arguably the most important type of information is metadata - or information about information.

Think back to the days of document management, or even old-school enterprise content management (ECM), when the content, or more specifically the document, was king. Each document that was stored became the focal point for processes such as invoice processing, claims management, and so on - and each of those documents had a set of metadata attributes (or tags, to use the modern name) associated with it. Typically this was limited to things such as filename, date created, author, type of content, etc. - and for most systems, once the metadata schema (or the set of metadata stored) was defined, it was pretty much fixed. Changing a metadata schema required a lot of development work along with mass updates to all content related to that metadata. Not fun.

Well, things have changed. Metadata schemas in a modern content services platform are flexible and extensible - if you want to add a new metadata field, go ahead and do it. In addition, much more metadata is being stored and used than ever before - image resolutions, language of a document, geophysical data, and more.

This increased capability and the ability to utilize metadata much more effectively is a distinct benefit of a modern Content Services Platform (CSP) like Nuxeo over an old-school Document Management or ECM solution - but what about the content stored in those legacy solutions?

Another unique aspect of a CSP is that it can connect to content from legacy systems, leaving the content itself in place (in its legacy repository) while providing access to that content from the CSP. It also allows legacy content to make use of a modern metadata schema from the CSP - effectively letting you add metadata properties and data to the legacy content without making any changes to the legacy system at all. This is massively powerful - especially when combined with AI, so that the process is automated.
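To make the “leave it in place” idea a little more concrete, here is a minimal sketch of a metadata overlay. It is not how Nuxeo or any particular CSP implements this; the LegacyDocument and MetadataOverlay names are hypothetical, and the overlay is just an in-memory dictionary keyed by the legacy document ID. The point is simply that the extra properties live outside the legacy record, which is never modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyDocument:
    """Hypothetical read-only record as exposed by a legacy repository."""
    doc_id: str
    customer_ref: str


class MetadataOverlay:
    """Hypothetical store for extra metadata, held outside the legacy system."""

    def __init__(self):
        self._extra = {}  # doc_id -> dict of added properties

    def tag(self, doc, **properties):
        # Properties are recorded against the document ID only;
        # the legacy record itself is not touched.
        self._extra.setdefault(doc.doc_id, {}).update(properties)

    def view(self, doc):
        # Combine the legacy fields with whatever has been added on top.
        merged = {"doc_id": doc.doc_id, "customer_ref": doc.customer_ref}
        merged.update(self._extra.get(doc.doc_id, {}))
        return merged
```

Calling overlay.tag(doc, doc_type="contract") and then overlay.view(doc) would return the legacy fields plus the new doc_type property, while the frozen LegacyDocument stays exactly as it was.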

Imagine this scenario: you have a legacy ECM repository containing customer contracts. Despite the best intentions of your staff, those contracts are not as well managed as they could be, and the only relevant metadata attributes associated with these documents are customer reference numbers. By using a CSP to pass that content through an AI enrichment engine, you can potentially append additional metadata attributes to each and every one of the files currently stored, which immediately injects more context, intelligence, and insight into your information management ecosystem.

The AI engine could identify:

  • The type of each document - contract, correspondence, invoice, etc.
  • Documents containing personal information, which may then automatically initiate additional security controls and provisions per privacy policies or regulations.
  • Documents that should be deleted per retention policies.
  • And much more!
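As a rough illustration of the first two items in the list above, the sketch below returns enrichment metadata for a piece of text. It is deliberately simplistic and makes several assumptions: keyword matching stands in for a trained classifier, an email address is used as the only signal of personal information, and the TYPE_KEYWORDS table and enrich function are hypothetical names, not part of any product.

```python
import re

# Hypothetical keyword rules standing in for a trained AI classifier.
TYPE_KEYWORDS = {
    "contract": ("agreement", "hereby", "party"),
    "invoice": ("invoice", "amount due", "payment terms"),
    "correspondence": ("dear", "regards", "sincerely"),
}

# Very rough stand-in for personal-information detection: an email address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def enrich(text):
    """Return metadata attributes that could be appended to a document."""
    lowered = text.lower()
    # Count how many keywords of each type appear in the text.
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return {
        "doc_type": best if scores[best] > 0 else "unknown",
        "contains_personal_info": bool(EMAIL_PATTERN.search(text)),
    }
```

A real enrichment engine would of course use far richer models, but the shape of the result is the same: a small set of new attributes that can be attached to each stored file.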

Identifying Important Content

A key part of enriching metadata is the ability to ascertain “what is what.”

There are many uses for this, starting with simply being able to identify a document as a presentation, brochure, contract, invoice, etc. This capability is a core facet of knowledge management, namely the ability to surface and share information and content that is relevant to other situations. That can range from providing existing solutions to technical support questions on a helpdesk to providing all contracts that relate to a particular customer, and anything in between. If you don’t have good metadata on the content, then this is simply not possible.

But beyond that, every industry has compliance regulations that require each type of document and record to be kept for a specific period of time - these are known as retention policies or rules. If you can’t determine the type of the content, how on earth can you apply retention policies to it? There were typically two ways to do this in the past - manually, or not at all. The manual approach was incredibly tedious, error-prone, and very time-consuming - which led a lot of organizations to take the “keep everything just in case” approach.

But by using an AI-driven engine to classify content stored within legacy systems, this becomes much easier to do.
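Once a document’s type is known, applying a retention rule is largely a lookup. The sketch below shows the idea under stated assumptions: the retention periods in RETENTION_YEARS are invented for illustration (real values come from your own regulations and policies), and retention_action is a hypothetical helper, not an API from any records management product.

```python
from datetime import date

# Hypothetical retention periods in years, for illustration only.
RETENTION_YEARS = {"invoice": 7, "contract": 10, "correspondence": 2}


def retention_action(doc_type, created, today):
    """Return 'retain', 'dispose', or 'review' for a classified document."""
    years = RETENTION_YEARS.get(doc_type)
    if years is None:
        # No rule matches an unknown type, so a person has to look at it.
        return "review"
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        expires = created.replace(year=created.year + years, day=28)
    return "dispose" if today >= expires else "retain"
```

Note that the function only returns a suggested action. Without a document type there is nothing to look up, which is exactly why unclassified content ends up in the “keep everything” pile.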

Even simple AI tools can identify the difference between a contract and a resume, but advanced engines expand this principle to build AI models based on content specific to an organization. So, for example, if your business needs to know the difference between a personal life insurance document and a life annuity document, then this can be incorporated into a specifically trained AI model, which in turn will deliver a much more detailed classification than would be possible with a generic classifier.
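To give a feel for what “trained on your own content” means, here is a toy sketch of a classifier that learns word counts from a handful of labeled examples. It is a simple word-frequency scorer chosen for brevity, not the kind of model a production engine would use, and the train and predict functions and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build per-label word counts from (text, label) pairs."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[label].update(text.lower().split())
    return model


def predict(model, text):
    """Pick the label whose training vocabulary best matches the text."""
    words = text.lower().split()

    def score(label):
        counts = model[label]
        # Share of the label's training words that match the input words.
        return sum(counts[word] for word in words) / sum(counts.values())

    return max(model, key=score)


# Organization-specific training examples (invented for illustration).
EXAMPLES = [
    ("death benefit paid to beneficiary on death of the insured",
     "personal_life_insurance"),
    ("annuity pays guaranteed income for life after retirement",
     "life_annuity"),
]
```

With a model built from EXAMPLES, a text such as “guaranteed annuity income statement” is labeled life_annuity, a distinction that a generic contract-versus-resume classifier has no way to make.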

And using a CSP to apply this to the mass of content stored in those legacy systems can add significant benefit to your business, and increase the visibility you have into both your key information assets and liabilities.

Ditch the Trash

The “keep it all just in case” approach described above not only exacerbated the digital landfill effect but also meant that a lot of information that could (and often should) have been destroyed was not. Aside from the cost of having to store this content ad infinitum, there are significant legal issues that arise from keeping information longer than you need to.

There is a whole industry dedicated to managing records, and we’re not going to get into the technicalities of that here. But AI can be used to help mitigate this problem significantly.

Part of the challenge of managing records, or even simply applying retention policies, is the sheer volume of content that needs to be managed. And the only way to go through this in the past was document by document.

A key point here is that, due to the legal ramifications of incorrectly declaring (or not) a record, there is a desire to still include a human interaction (or checkpoint) as part of this process in most organizations.

AI can help with this. By using AI classification of content with a CSP, it is possible, at massive scale, to quickly and easily determine what is NOT a record. According to numerous research studies, the significant majority of stored content is ROT (redundant, obsolete, or trivial) - so by clearing out huge chunks of that ROT, the task of identifying relevant content to apply retention policies to becomes much, much easier. And yes, AI can then be used on the remaining content to identify the type of the content in more detail, match that to the retention rules, and then make recommendations to the relevant staff members. This makes the whole process of identifying, declaring, and managing records (by which I really mean anything that needs to be retained against a retention rule) incredibly straightforward, much more scalable than before, and much more cost-effective, given that the storage requirements for old content just got slashed.
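The two-stage process described above, with its human checkpoint, could be sketched roughly as follows. This is an illustration under assumptions, not a description of any product: the rot_score and suggested_rule fields are hypothetical outputs of a classifier, and the 0.9 threshold is an arbitrary example value.

```python
def triage(documents, rot_threshold=0.9):
    """Split documents into likely ROT and a queue for human review.

    Each document is a dict with a hypothetical 'rot_score' between 0 and 1
    (the classifier's confidence that the item is ROT) and, optionally,
    a 'suggested_rule' naming the retention rule the classifier proposes.
    """
    rot, review_queue = [], []
    for doc in documents:
        if doc["rot_score"] >= rot_threshold:
            # High-confidence ROT: a candidate for clearing out in bulk.
            rot.append(doc["id"])
        else:
            # Everything else goes to staff as a recommendation only.
            review_queue.append(
                {"id": doc["id"], "suggested_rule": doc.get("suggested_rule")}
            )
    return rot, review_queue
```

The design choice worth noting is that the code never declares a record itself. It only shrinks the pile and attaches a suggestion, leaving the final decision with the relevant staff members.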

Who would have thought that sorting out the trash could be such a rewarding exercise?