About Standards: Standards matter, but don’t be blind

Last but not least in this series of posts: standards. Every software platform is built on standards in one way or another, and most vendors showcase standards support as a strong selling point. In this post we explore why standards really matter and offer some hints on how to assess standards support in a content management platform in a pragmatic and meaningful way.

Why Standards

Non-technical users really don’t care about which standards exist and which are emerging. They care that platforms play well together; they want interoperability. They want solutions that can communicate with each other without excessive effort and cost. They want to be able to move content between platforms if a new tool is selected. It could be said that standards in the ECM space are more about avoiding content lock-in than vendor lock-in.

Standards provide a set of guidelines and mechanisms for interacting with a technology. Adoption of standards has a number of benefits, with the most frequently cited being interoperability. No organization wants to be tied to a single vendor or product option for implementing a technology solution – no matter how well the vendor’s solution functions or the vendor provides service. Standards adoption has a number of additional benefits such as:

  • Lower technology adoption costs
  • Increased development consistency, simplicity and predictability
  • Improved code reuse
  • Reduced cost, time and effort to transition between vendors and solutions
  • Reduced focus on commodity and infrastructure
  • Ability to create composite interfaces that are tailored to the needs of specific job roles – mashability
  • Improved application portability
  • Faster time to market, because it is easier to purchase off-the-shelf components and applications that integrate with the solution and add features to it

Organizations should understand which standards provide the benefits that are most important for their needs when adopting an Enterprise Content Management system.

Existing and Emerging Standards

They say the good thing about standards is that there are so many to choose from. This may be humorous, but seasoned technologists know that, unfortunately, the quip has some truth, and the world of enterprise content management is no different. There is no single standard that is more important than all others. There is no universal definition of what is most valuable; it always varies by the unique technical and business needs of the organization.

Not every ECM vendor and product will support every standard. However, it is important to determine the standards that are most important for future business and technical strategy and ensure they are supported by the potential ECM platform. For example, an organization concerned with the publishing industry might have a strong interest in adopting the NewsML standard, whereas an organization with more generic and horizontal coverage might have more interest in supporting the Content Management Interoperability Services (CMIS) standard.

Standards impact a number of areas in the ECM market and it is important that these be understood.

Interoperability

As noted above, interoperability is one of the primary drivers for standards adoption. Interoperability takes many forms. In ECM, interoperability is primarily targeted at providing a standardized way for content-based applications to share their content assets.

The main standards related to interoperability for ECM solutions include Content Management Interoperability Services (CMIS) and Java Content Repository (JCR). CMIS has grown slightly more popular than JCR due to the technology agnostic approach taken by the standard.

CMIS is one of the most recent standards in the content management space; it was specifically designed to support interoperability among ECM solutions. Officially adopted in May 2010, managed by OASIS and supported by a large number of vendors, the standard defines a vendor-agnostic domain model, abstraction and set of bindings that allow content to be shared and accessed across multiple ECM tools. Key services provided by CMIS include:

  • Repository Services: Enable discovery of information about the content repository and the object types defined for it
  • Navigation Services: Support navigating the folder hierarchy in a CMIS repository
  • Object Services: Enable management of repository objects (Create, Retrieve, Update, Delete)
  • Discovery Services: Search for objects in a repository
  • Versioning Services: Manage the lifecycle of repository items
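To make the division of responsibilities concrete, here is a minimal in-memory sketch in Python that groups operations along the five service families listed above. It is not a CMIS client and does not follow the CMIS bindings; the class InMemoryRepository, the type names "folder" and "document", and every method name are hypothetical, chosen only to show which kind of call belongs to which service.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RepoObject:
    """A stored item; `versions` holds every saved content state, oldest first."""
    object_id: str
    name: str
    type_id: str                 # illustrative type names: "folder" or "document"
    parent_id: Optional[str]
    versions: List[str] = field(default_factory=list)


class InMemoryRepository:
    """Hypothetical toy repository; NOT a real CMIS binding."""

    def __init__(self, repo_id: str) -> None:
        self.repo_id = repo_id
        self._ids = itertools.count(1)
        self._objects: Dict[str, RepoObject] = {}
        self.root_id = self.create_object("root", "folder", None)

    # Repository services: describe the repository and its object types.
    def get_repository_info(self) -> Dict[str, object]:
        return {"id": self.repo_id, "types": ["folder", "document"]}

    # Object services: create, retrieve, update, delete.
    def create_object(self, name: str, type_id: str,
                      parent_id: Optional[str], content: str = "") -> str:
        object_id = f"obj-{next(self._ids)}"
        self._objects[object_id] = RepoObject(
            object_id, name, type_id, parent_id, [content])
        return object_id

    def get_object(self, object_id: str) -> RepoObject:
        return self._objects[object_id]

    def rename_object(self, object_id: str, new_name: str) -> None:
        self._objects[object_id].name = new_name

    def delete_object(self, object_id: str) -> None:
        del self._objects[object_id]

    # Navigation services: walk the folder hierarchy.
    def get_children(self, folder_id: str) -> List[RepoObject]:
        return [o for o in self._objects.values() if o.parent_id == folder_id]

    # Discovery services: search (here, a naive substring match on the name).
    def query_by_name(self, text: str) -> List[RepoObject]:
        return [o for o in self._objects.values()
                if text.lower() in o.name.lower()]

    # Versioning services: save a new version and read the history.
    def check_in(self, object_id: str, content: str) -> int:
        versions = self._objects[object_id].versions
        versions.append(content)
        return len(versions)

    def get_all_versions(self, object_id: str) -> List[str]:
        return list(self._objects[object_id].versions)


repo = InMemoryRepository("demo")
doc_id = repo.create_object("Contract.txt", "document", repo.root_id, "draft")
repo.check_in(doc_id, "final")
```

A client written against such a service-oriented surface does not need to know how the repository stores its objects, which is the property that makes moving content between tools tractable.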

A number of vendors have dedicated themselves to moving the CMIS standard forward. One example is the Apache Software Foundation’s reference CMIS implementation, developed within the Chemistry project with the support of developers from various content management vendors, such as Day Software and Nuxeo. Another notable effort is Nuxeo’s contribution of its content repository to Eclipse, demonstrating its willingness to work on a reference implementation of CMIS in a content repository. The project, originally called “Eclipse Enterprise Content Repository”, has been officially approved by Eclipse and was rebranded “Apricot.” It relies on the Chemistry project.

Other official or de facto interoperability standards that architects may want to explore because they could impact the overall ECM solution interoperability include:

  • Windows SharePoint Services (WSS): Not actually a standard, but a set of services for accessing content in Microsoft’s SharePoint products
  • Web-based Distributed Authoring and Versioning (WebDAV): an HTTP-based standard that facilitates collaboration between users in editing and managing documents and files stored on web servers
  • Java Content Repository (JCR): a low-level Java specification, although adapters have been created for other languages, defined under the Java Community Process
  • Common Internet File System (CIFS): a protocol that allows applications to make requests for files and services on remote computers via the Internet.

Metadata

Metadata augments content stored by Enterprise Content Management software with additional details such as taxonomy, relationships, security attributes, usage characteristics, auditing information, and any number of additional attributes. How important is metadata to an ECM solution? It is critical. Without metadata, it becomes almost impossible to manage, control and find content in an ECM tool. A number of standards affect metadata creation and management within ECM solutions, such as XML, Dublin Core and standards related to semantic technology (e.g. RDF). Support for some of these standards, like Dublin Core, is important, but not sufficient for solving all ECM metadata needs. Keep in mind that many standards that address taxonomies and semantic technologies are still maturing, so adopting a platform with the flexibility to support them in the future will be key.

The most important standard, although it is a much lower level standard than many of the others discussed in this white paper, is without a doubt XML. The Extensible Markup Language (XML) is a standard managed by the World Wide Web Consortium (W3C). The human and machine-readable text-based markup language, similar to HTML, is now familiar to most technologists. Unlike HTML, XML does not have a single defined set of tags and attributes; it allows adopters to define their own elements or utilize a vocabulary defined by another party. XML is a core technology for defining structured content and data, and of course, metadata; it is the foundation for a number of other standards like Dublin Core and XMP.
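As a small illustration of XML carrying metadata in a shared vocabulary, the sketch below uses Python's standard xml.etree.ElementTree module to write and read back three Dublin Core elements. The namespace URI is the one used by the Dublin Core element set; the wrapper element record and the sample values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# <record> is a hypothetical wrapper element defined for this example only;
# the values are illustrative sample data.
record = ET.Element("record")
for tag, value in [
    ("title", "Supplier contract"),
    ("creator", "Legal department"),
    ("date", "2011-03-01"),
]:
    child = ET.SubElement(record, f"{{{DC_NS}}}{tag}")
    child.text = value

# Serialize to text, as a platform might when exporting metadata.
xml_text = ET.tostring(record, encoding="unicode")

# Parse it back and strip the namespace to get a plain dictionary.
parsed = ET.fromstring(xml_text)
metadata = {element.tag.split("}")[1]: element.text for element in parsed}
```

Because the element names come from a published vocabulary rather than a private schema, another tool that understands Dublin Core can interpret the title, creator and date fields without custom mapping.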

XML has been such a core technology that almost all vendors will promise support. However, as with any vendor claim, architects must examine what “support” means. Not all vendors support XML equally well for integration, content transformation, storage and publishing. Architects should explore in detail the XML capabilities of an ECM platform when it comes to managing, storing and processing XML-based data.

Another domain increasingly mentioned in connection with metadata is semantic technology. Semantic technology associates meaning or context with digital content, not just for people but for computers as well. If computers can learn the meaning behind content, they can learn what users are interested in and provide assistance with common tasks, such as search or augmenting data with existing details based on known relationships. Without semantic technology, content is typically just links between structured and unstructured resources. Semantic technology provides context to these resources and their relationships so that machines can recognize entities such as people, places, events and organizations within the content.
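The idea of machine-usable relationships can be sketched with subject-predicate-object triples, the shape of statement that RDF is built on. The example below is a deliberately simplified plain-Python model, not an RDF library: real RDF identifies things with IRIs, and the identifiers, predicates and helper functions here are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements an enrichment step might have extracted.
triples: Set[Triple] = {
    ("doc:42", "mentions", "person:ada"),
    ("doc:42", "mentions", "org:acme"),
    ("doc:57", "mentions", "person:ada"),
    ("person:ada", "worksFor", "org:acme"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the given pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def related_documents(doc: str) -> List[str]:
    """Find other documents that mention at least one of the same entities."""
    entities = {obj for _, _, obj in match(s=doc, p="mentions")}
    return sorted({
        subj
        for entity in entities
        for subj, _, _ in match(p="mentions", o=entity)
        if subj != doc
    })


suggestions = related_documents("doc:42")
```

Here the link between the two documents is never stated directly; it is derived from the entities they share, which is the kind of assistance with search and enrichment described above.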

Support for semantic technologies is limited in the majority of ECM platforms, although some forward-thinking vendors are beginning to incorporate the technology. If semantic technology lives up to its promises, the enhancements it provides for metadata, categorization and content enrichment will substantially improve ECM technology. This can be seen in research and open source projects like the Interactive Knowledge Stack (IKS) project. IKS is a European Union-funded research project involving vendors like Nuxeo and Day Software, focused on building an open and flexible technology platform for semantically enhanced content management. The IKS initiative has resulted in several open source projects, such as Apache Stanbol, which provides a connection between Semantic Web data sources and traditional content management solutions. The growth of Public Open Data (as illustrated by the W3C SWEO Linking Open Data community project in figure 13) clearly argues for this kind of initiative, bridging traditional ECM and Semantic Web technology. A modular ECM platform will no doubt make this easier.

Technology and development languages are evolving and there is a range of technical standards that are “must-have” and high value for a modern content development platform.

OpenSocial is one of them. OpenSocial was originally created as an open specification for accessing and sharing user profile, relationship and activity data across social networking sites, instead of working with the proprietary interfaces each site offered. However, its adoption has now grown beyond social networks into the enterprise, to provide a general-purpose web application integration technology. It is especially practical for creating dashboards where end users can find information from different applications in one place.

OpenSocial is comprised of two high-level concepts: gadgets and APIs. Gadgets are small, pluggable, HTML/JavaScript based components with a basic lifecycle that run in containers responsible for providing the gadget with the rendering environment and JavaScript APIs. The core OpenSocial APIs provide capabilities for managing people, activities and data and are exposed via JavaScript and REST. OpenSocial gadgets can also be used to provide a simple and light integration solution between applications, and they can access any piece of information in the enterprise that is exposed via REST.

In addition to the existing capabilities of OpenSocial, there are efforts underway to provide tighter integration between OpenSocial and CMIS; the changes are targeted for version 2.0 of the OpenSocial specification.

There are a number of additional standards not directly related to content that are important for ECM development, such as OAuth, REST and LDAP. Each of these technologies can play an important role in solution delivery.

OAuth is an open protocol standard for delegated authorization. It provides a standard way for developers to offer their services via an API without forcing their users to expose their credentials. From a user perspective, the standard allows a user (resource owner) to grant access to a protected resource from one application (service provider) to another application (service consumer). Because access is delegated, the user’s credentials are never handed to the service consumer. In addition to providing a standard way to grant access between applications, OAuth also provides a mechanism to restrict the scope and lifetime of a service consumer’s access. This is a much more secure strategy than sharing credentials and granting unlimited access to a third party. It is also convenient for users, who are freed from creating more login credentials. Prior to OAuth, there were a number of other proprietary internet authentication protocols. Unlike many of these earlier protocols, OAuth supports use by non-web-based applications.
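The value of restricting scope and lifetime can be shown with a small sketch. The code below is not an implementation of the OAuth protocol (it has no redirects, signatures or client registration); it only models the outcome, a token that stands in for the user's credentials and is limited in what it allows and for how long. TokenStore, its methods and the scope names are hypothetical.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Grant:
    consumer: str              # application acting on the user's behalf
    scopes: FrozenSet[str]     # operations the user agreed to
    expires_at: float          # absolute expiry time, seconds since the epoch


class TokenStore:
    """Hypothetical service-provider-side record of issued tokens."""

    def __init__(self) -> None:
        self._grants: Dict[str, Grant] = {}

    def issue(self, consumer: str, scopes: Iterable[str],
              lifetime_seconds: float, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)  # random and unrelated to any password
        self._grants[token] = Grant(
            consumer, frozenset(scopes), now + lifetime_seconds)
        return token

    def is_allowed(self, token: str, scope: str,
                   now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        grant = self._grants.get(token)
        return (grant is not None
                and scope in grant.scopes
                and now < grant.expires_at)

    def revoke(self, token: str) -> None:
        self._grants.pop(token, None)


store = TokenStore()
token = store.issue("dashboard", {"documents:read"},
                    lifetime_seconds=60, now=1000.0)
can_read = store.is_allowed(token, "documents:read", now=1030.0)
can_write = store.is_allowed(token, "documents:write", now=1030.0)
still_valid_later = store.is_allowed(token, "documents:read", now=1060.0)
```

The user can revoke the token at any time without changing a password, and a leaked token exposes only the granted scope for the remaining lifetime.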

Given that enterprise content is core to many business processes, it is important that a well-designed platform provide a standard way to control access to its services. Instead of reinventing the wheel, vendors like Nuxeo are integrating OAuth in their platforms to control which services and data are shared between applications.

Lightweight Directory Access Protocol (LDAP) is another protocol standard architects should consider. LDAP allows applications to access information stored in a directory server. LDAP servers can store any type of information, but they are most frequently used to store contact information, security credentials and group information. The majority of organizations that support secured access to resources or email store user information in an LDAP directory. LDAP servers are so common that ECM platforms should support integration with them, at least at a read level, so that user information does not have to be replicated in multiple locations.

Representational State Transfer (REST) is an architectural style based on how the web works, not a standard for application integration. RESTful interactions involve two components: clients and servers. Clients make stateless requests to servers; servers receive requests, process them and return a response. Requests and responses transfer representations of resources. A resource is any object at an address (URI) that can provide information or have an operation executed against it.
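The request and response cycle can be sketched without any network code. In the hypothetical ResourceServer below, each call to handle carries everything needed to process it (method, URI and an optional representation), and the server keeps resource state but no per-client session. The status codes follow common HTTP usage; the class, its method and the URIs are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Optional[dict]]  # (status code, representation)


class ResourceServer:
    """Hypothetical in-process stand-in for a RESTful server."""

    def __init__(self) -> None:
        # Resource state, keyed by URI. No client or session data is stored.
        self._resources: Dict[str, dict] = {}

    def handle(self, method: str, uri: str,
               body: Optional[dict] = None) -> Response:
        if method == "PUT":
            if body is None:
                return 400, None              # a representation is required
            created = uri not in self._resources
            self._resources[uri] = dict(body)
            return (201 if created else 200), dict(body)
        if method == "GET":
            if uri in self._resources:
                return 200, dict(self._resources[uri])
            return 404, None
        if method == "DELETE":
            if self._resources.pop(uri, None) is not None:
                return 204, None
            return 404, None
        return 405, None                      # method not supported


server = ResourceServer()
created = server.handle("PUT", "/documents/42", {"title": "Contract"})
fetched = server.handle("GET", "/documents/42")
```

Because no request depends on an earlier one having set up a session, any client that knows the URI can retrieve the document's representation.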

Given the growing popularity of RESTful services, architects who have embraced this style of integration should carefully examine which services a platform exposes via REST. Some vendors indicate they support REST, but expose only a very limited set of features.

And finally, at a lower level, standards like OSGi are concretely delivering software modularity and extensibility. Technology architects who still associate Java with the heavy, hard-to-extend early versions of J2EE should definitely consider exploring OSGi. It provides the Java stack with a new approach to modularity and extensibility. Software like the Eclipse Equinox project, the Spring Java development framework and Nuxeo’s platform extension point system is leading the way, demonstrating the value of modularity, extensibility and component-driven software architecture.
