Today I’m digging out an old question about XMP support in Nuxeo. I’m actually going to widen this a bit and talk about metadata in Nuxeo in general. One limitation we have these days is that you can only read metadata from a file, not write it back. So let’s think about what this means.
A few questions first: do you write back metadata to files when you edit them in Nuxeo? Do you sync everything you can extract from the file? How do you map this to existing document properties? When should you do all of this?
What we need to know is which metadata we should extract from, and/or write back to, which type of file.
So we can define a specific mapping between document properties and file metadata. It will be differentiated by the file’s mime type, the type of the document and the file’s xPath. If we don’t specify any xPath, we can still use the BlobHolder to retrieve the main file of a document.
Another requirement we can add is the ability to specify a different mapper class. Maybe you want to extract data using Tika, maybe you want to use ExifTool, or maybe you want to use proprietary software to extract some exotic metadata.
We also need to know where to store metadata. The use case for this is quite simple. Let’s take the File document type. We know it will have different kinds of files attached to it. Some will have XMP metadata, some will have EXIF, some IPTC, etc. We have to store that metadata in schemas, but we don’t want to add all of them to a specific document type. Facets are very useful for this, as we can associate a schema with a facet. So we can specify in the mapping contribution which facets are required. This way we’ll be able to add them before doing the metadata mapping.
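To make the facet idea concrete, here is a minimal sketch in plain Java. FacetedDoc and FacetRequirements are hypothetical stand-ins made up for illustration, not Nuxeo classes. On a real DocumentModel I would expect to go through its hasFacet and addFacet methods instead, so check that against your platform version.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a document: a type name and a set of facets. */
class FacetedDoc {
    final String type;
    final Set<String> facets = new LinkedHashSet<>();

    FacetedDoc(String type) {
        this.type = type;
    }
}

class FacetRequirements {

    /**
     * Adds every required facet the document does not have yet and
     * returns the facets that were actually added, in declaration order.
     */
    static List<String> ensureFacets(FacetedDoc doc, String[] requiredFacets) {
        List<String> added = new ArrayList<>();
        for (String facet : requiredFacets) {
            // Set.add returns false when the facet was already present
            if (doc.facets.add(facet)) {
                added.add(facet);
            }
        }
        return added;
    }
}
```

The point of returning the added facets is that the caller can tell whether the document changed and needs saving before the mapper runs.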
A sample contribution would look like this:
<extension point="mapper" target="org.nuxeo.metadata.FileMetadataService">
  <mapper id="defaultTikaMapper" class="org.nuxeo.metadata.TikaDefaultMapper" />
</extension>
<extension point="mapping" target="org.nuxeo.metadata.FileMetadataService">
  <mapping id="tikaXMP" mapper="defaultTikaMapper" nxpath="files:files/item[0]/file">
    <mimeTypes>
      <mimeType>image/vnd.adobe.photoshop</mimeType>
      <mimeType>application/vnd.adobe.photoshop</mimeType>
      <mimeType>application/pdf</mimeType>
    </mimeTypes>
    <requirements>
      <facet>XMP</facet>
    </requirements>
    <properties>
      <propertyItem xpath="xmp:BitsPerSample" metadataName="tiff:BitsPerSample" policy="readonly" />
      <propertyItem xpath="xmp:ImageWidth" metadataName="tiff:ImageWidth" policy="readonly" />
      <propertyItem xpath="xmp:ImageLength" metadataName="tiff:ImageLength" policy="readonly" />
      <propertyItem xpath="xmp:CreatorTool" metadataName="xmp:CreatorTool" policy="readonly" />
      <propertyItem xpath="xmp:NPages" metadataName="xmpTPg:NPages" policy="readonly" />
    </properties>
  </mapping>
</extension>
<extension point="docMapping" target="org.nuxeo.metadata.FileMetadataService">
  <doc docType="File">
    <mappingId>tikaXMP</mappingId>
  </doc>
</extension>
Now each time you want to create a new extension point, you need two things: the XML mapping file and the service managing the XP registration and unregistration.
If you have Nuxeo IDE, there’s a wizard that generates parts of the code. It’s called Nuxeo Component. You should have an XML file and a Java class.
The Java class comes with some useful comments about service implementation:
public class FileMetadataServiceImpl extends DefaultComponent {

    protected Bundle bundle;

    public Bundle getBundle() {
        return bundle;
    }

    /**
     * Component activated notification.
     * Called when the component is activated. All component dependencies are resolved at that moment.
     * Use this method to initialize the component.
     * <p>
     * The default implementation of this method is storing the Bundle owning that component in a class field.
     * You can use the bundle object to lookup for bundle resources:
     * <code>URL url = bundle.getEntry("META-INF/some.resource");</code>, load classes or to interact with OSGi framework.
     * <p>
     * Note that you must always use the Bundle to lookup for resources in the bundle. Do not use the classloader for this.
     * @param context the component context. Use it to get the current bundle context
     */
    @Override
    public void activate(ComponentContext context) {
        this.bundle = context.getRuntimeContext().getBundle();
    }

    /**
     * Component deactivated notification.
     * Called before a component is unregistered.
     * Use this method to do cleanup if any and free any resources held by the component.
     *
     * @param context the component context. Use it to get the current bundle context
     */
    @Override
    public void deactivate(ComponentContext context) {
        this.bundle = null;
    }

    /**
     * Application started notification.
     * Called after the application started.
     * You can do here any initialization that requires a working application
     * (all resolved bundles and components are active at that moment)
     *
     * @param context the component context. Use it to get the current bundle context
     * @throws Exception
     */
    @Override
    public void applicationStarted(ComponentContext context) throws Exception {
        // do nothing by default. You can remove this method if not used.
    }

}
<component name="org.nuxeo.metadata.FileMetadataService" version="1.0">
  <implementation class="org.nuxeo.metadata.FileMetadataServiceImpl" />
</component>
The implementation tag points to the implementation of our service. We’re going to add the provide tag, which points to the interface of our service managing the XP. Let’s call this interface FileMetadataService and leave it empty for the moment.
<component name="org.nuxeo.metadata.FileMetadataService" version="1.0">
  <service>
    <provide interface="org.nuxeo.metadata.FileMetadataService" />
  </service>
  <implementation class="org.nuxeo.metadata.FileMetadataServiceImpl" />
</component>
Now let’s write a test for that. Our goal is to make sure our service is working correctly.
package org.nuxeo.metadata.test;

import static org.junit.Assert.assertNotNull;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.nuxeo.ecm.core.test.CoreFeature;
import org.nuxeo.metadata.FileMetadataService;
import org.nuxeo.runtime.api.Framework;
import org.nuxeo.runtime.test.runner.Deploy;
import org.nuxeo.runtime.test.runner.Features;
import org.nuxeo.runtime.test.runner.FeaturesRunner;

@RunWith(FeaturesRunner.class)
@Features(CoreFeature.class)
@Deploy({ "nuxeo-platform-filemanager-metadata" })
public class FileMetadataServiceTest {

    @Test
    public void testService() throws Exception {
        FileMetadataService serviceInterface = Framework.getService(FileMetadataService.class);
        assertNotNull(serviceInterface);
    }

}
Our test retrieves the service through its interface, so the service is registered correctly. We can now move on to the extension point implementation. Following the first example, we’re going to have three different extension points: one for the mapper class, one for the actual mappings and one to associate a document type with a mapping. As they are all different, we’re going to need at least three different XMap descriptors.
Here’s the XML declaring the service and the three new extension points:
<?xml version="1.0"?>
<component name="org.nuxeo.metadata.FileMetadataService" version="1.0">

  <service>
    <provide interface="org.nuxeo.metadata.FileMetadataService" />
  </service>

  <implementation class="org.nuxeo.metadata.FileMetadataServiceImpl" />

  <extension-point name="mapper">
    <object class="org.nuxeo.metadata.MetadataMapperDescriptor" />
  </extension-point>

  <extension-point name="mapping">
    <object class="org.nuxeo.metadata.MetadataMappingDescriptor" />
  </extension-point>

  <extension-point name="docMapping">
    <object class="org.nuxeo.metadata.DocMetadataMappingDescriptor" />
  </extension-point>

</component>
Here’s the mapper descriptor. It’s really simple, as we just need a class and an id in case someone wants to override it.
package org.nuxeo.metadata;

import org.nuxeo.common.xmap.annotation.XNode;
import org.nuxeo.common.xmap.annotation.XObject;

@XObject("mapper")
public class MetadataMapperDescriptor {

    @XNode("@id")
    protected String id;

    @XNode("@class")
    private Class<MetadataMapper> adapterClass;

    public String getId() {
        return id;
    }

    public MetadataMapper getMapper() throws InstantiationException, IllegalAccessException {
        return adapterClass.newInstance();
    }

}
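A side note on getMapper: Class.newInstance() has been deprecated since Java 9 in favour of going through the constructor explicitly. Here is a minimal sketch of that alternative in plain Java. SketchMapper, EchoMapper and MapperFactory are hypothetical names used only for this example, since the real MetadataMapper interface is not shown in this post.

```java
/** Hypothetical mapper contract standing in for MetadataMapper. */
interface SketchMapper {
    String name();
}

/** Example mapper with the no-argument constructor that reflection needs. */
class EchoMapper implements SketchMapper {
    public EchoMapper() {
    }

    @Override
    public String name() {
        return "echo";
    }
}

class MapperFactory {

    /**
     * Instantiates a mapper through its no-argument constructor.
     * Any reflection failure is wrapped in an unchecked exception so that
     * a broken contribution is reported with the offending class name.
     */
    static <T> T instantiate(Class<? extends T> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate mapper " + type.getName(), e);
        }
    }
}
```

Calling MapperFactory.instantiate(EchoMapper.class) returns a fresh EchoMapper. Whether you wrap the exception or let it propagate, as the descriptor above does, is a design choice.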
Here’s the mapping descriptor. That’s the complicated one :)
package org.nuxeo.metadata;

import org.nuxeo.common.xmap.annotation.XNode;
import org.nuxeo.common.xmap.annotation.XNodeList;
import org.nuxeo.common.xmap.annotation.XObject;

@XObject("mapping")
public class MetadataMappingDescriptor {

    @XNode("@id")
    protected String id;

    @XNode("@nxpath")
    protected String nxpath;

    @XNode("@mapper")
    protected String mapperId;

    @XNodeList(value = "mimeTypes/mimeType", componentType = String.class, type = String[].class)
    protected String[] mimeTypes;

    @XNodeList(value = "requirements/schema", componentType = String.class, type = String[].class)
    protected String[] requiredSchema;

    @XNodeList(value = "requirements/facet", componentType = String.class, type = String[].class)
    protected String[] requiredFacets;

    @XNodeList(value = "properties/propertyItem", componentType = PropertyItemDescriptor.class, type = PropertyItemDescriptor[].class)
    protected PropertyItemDescriptor[] properties;

    public String getId() {
        return id;
    }

    public String getNxpath() {
        return nxpath;
    }

    public String getMapperId() {
        return mapperId;
    }

    public String[] getMimeTypes() {
        return mimeTypes;
    }

    public String[] getRequiredSchema() {
        return requiredSchema;
    }

    public String[] getRequiredFacets() {
        return requiredFacets;
    }

    public PropertyItemDescriptor[] getProperties() {
        return properties;
    }

}
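The mapping descriptor relies on PropertyItemDescriptor, which is not listed in this post. As a rough idea of its shape, here is a plain Java sketch built only from the three attributes used in the XML (xpath, metadataName, policy). The class name PropertyItemSketch, its constructor and the isReadOnly helper are assumptions for illustration. A real XMap descriptor would be annotated as hinted in the comments and would be populated by XMap rather than through a constructor.

```java
/**
 * Hypothetical sketch of a property item: one document property tied to
 * one file metadata entry. As an XMap descriptor the class would carry
 * {@code @XObject("propertyItem")}.
 */
class PropertyItemSketch {

    // would be mapped with @XNode("@xpath")
    private final String xpath;

    // would be mapped with @XNode("@metadataName")
    private final String metadataName;

    // would be mapped with @XNode("@policy")
    private final String policy;

    // Constructor for this sketch only; XMap fills the fields itself.
    PropertyItemSketch(String xpath, String metadataName, String policy) {
        this.xpath = xpath;
        this.metadataName = metadataName;
        this.policy = policy;
    }

    String getXpath() {
        return xpath;
    }

    String getMetadataName() {
        return metadataName;
    }

    String getPolicy() {
        return policy;
    }

    /** True when the policy attribute is exactly "readonly". */
    boolean isReadOnly() {
        return "readonly".equals(policy);
    }
}
```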
And finally the document type to mapping descriptor:
package org.nuxeo.metadata;

import org.nuxeo.common.xmap.annotation.XNode;
import org.nuxeo.common.xmap.annotation.XNodeList;
import org.nuxeo.common.xmap.annotation.XObject;

@XObject("doc")
public class DocMetadataMappingDescriptor {

    @XNode("@docType")
    protected String docType;

    @XNodeList(value = "mappingId", type = String[].class, componentType = String.class)
    protected String[] mappingId;

    @XNodeList(value = "mapping", type = MetadataMappingDescriptor[].class, componentType = MetadataMappingDescriptor.class)
    protected MetadataMappingDescriptor[] innerMapping;

    public String getDocType() {
        return docType;
    }

    public String[] getMappingId() {
        return mappingId;
    }

    public MetadataMappingDescriptor[] getInnerMapping() {
        return innerMapping;
    }

}
Now that we have all of this, we need to add some code to the service implementation to register the XML contributions associated with those descriptors. Everything starts from the registerContribution method. Depending on the extension point name, we cast the contribution to the matching descriptor and hand it to the appropriate registerSomething method. The most interesting one here is registerDocMapping. That’s where we start to store the mappings in a structured way, so that they’ll be easy to retrieve for a particular DocumentModel.
package org.nuxeo.metadata;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.nuxeo.ecm.core.api.Blob;
import org.nuxeo.ecm.core.api.ClientException;
import org.nuxeo.ecm.core.api.CoreSession;
import org.nuxeo.ecm.core.api.DocumentModel;
import org.nuxeo.ecm.core.api.blobholder.BlobHolder;
import org.nuxeo.runtime.model.ComponentContext;
import org.nuxeo.runtime.model.ComponentInstance;
import org.nuxeo.runtime.model.DefaultComponent;

/**
 * @author ldoguin
 * @since 5.7
 */
public class FileMetadataServiceImpl extends DefaultComponent implements FileMetadataService {

    /**
     * List of Mappers
     */
    protected Map<String, MetadataMapper> mappers;

    /**
     * Every mapping
     */
    protected Map<String, MetadataMappingDescriptor> mappings;

    /**
     * Every mappings by mimeType
     */
    protected Map<String, MetadataMappingDescriptor> mimeTypeMappings;

    /**
     * Every mappings without specific nxpath, use blob holder instead
     */
    protected Map<String, MetadataMappingDescriptor> bhMapping;

    /**
     * Three level mapping registry.
     * First key is document type
     * Second key is nxPath to the blob
     * Third key is mime type
     */
    protected Map<String, Map<String, Map<String, MetadataMappingDescriptor>>> nxPathMapping;

    @Override
    public void activate(ComponentContext context) throws Exception {
        mappings = new HashMap<String, MetadataMappingDescriptor>();
        mimeTypeMappings = new HashMap<String, MetadataMappingDescriptor>();
        mappers = new HashMap<String, MetadataMapper>();
        bhMapping = new HashMap<String, MetadataMappingDescriptor>();
        nxPathMapping = new HashMap<String, Map<String, Map<String, MetadataMappingDescriptor>>>();
    }

    @Override
    public void registerContribution(Object contribution, String extensionPoint, ComponentInstance contributor) throws Exception {
        if (extensionPoint.equals("mapper")) {
            if (contribution instanceof MetadataMapperDescriptor) {
                registerMapper((MetadataMapperDescriptor) contribution);
            }
        } else if (extensionPoint.equals("mapping")) {
            if (contribution instanceof MetadataMappingDescriptor) {
                registerMapping((MetadataMappingDescriptor) contribution);
            }
        } else if (extensionPoint.equals("docMapping")) {
            if (contribution instanceof DocMetadataMappingDescriptor) {
                registerDocMapping((DocMetadataMappingDescriptor) contribution);
            }
        }
    }

    private void registerMapping(MetadataMappingDescriptor contribution) {
        mappings.put(contribution.getId(), contribution);
        String[] mimetypes = contribution.getMimeTypes();
        for (String mimeType : mimetypes) {
            mimeTypeMappings.put(mimeType, contribution);
        }
    }

    private void registerDocMapping(DocMetadataMappingDescriptor contribution) {
        String docType = contribution.getDocType();
        MetadataMappingDescriptor[] innerMappings = contribution.getInnerMapping();
        for (MetadataMappingDescriptor metadataMappingDescriptor : innerMappings) {
            addMappingToRegistries(docType, metadataMappingDescriptor);
        }
        String[] mappingIds = contribution.getMappingId();
        for (String mappingId : mappingIds) {
            MetadataMappingDescriptor mapping = mappings.get(mappingId);
            if (mapping == null) {
                // Unknown id (or a mapping that has not been registered yet):
                // skip it instead of failing with a NullPointerException.
                continue;
            }
            addMappingToRegistries(docType, mapping);
        }
    }

    private void addMappingToRegistries(String docType, MetadataMappingDescriptor metadataMappingDescriptor) {
        String nxPath = metadataMappingDescriptor.getNxpath();
        if (nxPath != null && !"".equals(nxPath)) {
            Map<String, Map<String, MetadataMappingDescriptor>> docNxPathMapping = nxPathMapping.get(docType);
            if (docNxPathMapping == null) {
                docNxPathMapping = new HashMap<String, Map<String, MetadataMappingDescriptor>>();
            }
            Map<String, MetadataMappingDescriptor> mimeTypeToMapper = docNxPathMapping.get(nxPath);
            if (mimeTypeToMapper == null) {
                mimeTypeToMapper = new HashMap<String, MetadataMappingDescriptor>();
                docNxPathMapping.put(nxPath, mimeTypeToMapper);
            }
            String[] mimeTypes = metadataMappingDescriptor.getMimeTypes();
            for (String mimeType : mimeTypes) {
                mimeTypeToMapper.put(mimeType, metadataMappingDescriptor);
            }
            nxPathMapping.put(docType, docNxPathMapping);
        } else {
            String[] mimeTypes = metadataMappingDescriptor.getMimeTypes();
            for (String mimeType : mimeTypes) {
                String id = docType + mimeType;
                bhMapping.put(id, metadataMappingDescriptor);
            }
        }
    }

    private void registerMapper(MetadataMapperDescriptor contribution) throws InstantiationException, IllegalAccessException {
        mappers.put(contribution.id, contribution.getMapper());
    }

    @Override
    public List<MetadataMappingDescriptor> getMappings(DocumentModel doc) throws ClientException {
        String docType = doc.getType();
        List<MetadataMappingDescriptor> mappings = new ArrayList<MetadataMappingDescriptor>();
        BlobHolder bh = doc.getAdapter(BlobHolder.class);
        if (bh != null) {
            Blob blob = bh.getBlob();
            if (blob != null) {
                String blobMimeType = blob.getMimeType();
                String bhId = docType + blobMimeType;
                MetadataMappingDescriptor bhMapper = bhMapping.get(bhId);
                if (bhMapper != null) {
                    mappings.add(bhMapper);
                }
            }
        }
        Map<String, Map<String, MetadataMappingDescriptor>> nxPathDocMapping = nxPathMapping.get(docType);
        if (nxPathDocMapping != null) {
            for (String nxPath : nxPathDocMapping.keySet()) {
                Blob blob = (Blob) doc.getPropertyValue(nxPath);
                if (blob != null) {
                    String blobMimeType = blob.getMimeType();
                    MetadataMappingDescriptor bhMapper = nxPathDocMapping.get(nxPath).get(blobMimeType);
                    if (bhMapper != null) {
                        mappings.add(bhMapper);
                    }
                }
            }
        }
        return mappings;
    }

}
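The three-level registry is the part that is easiest to get lost in, so here is a stripped-down sketch of the same idea in plain Java. MappingRegistry is a hypothetical class for illustration only. It stores plain mapping ids instead of MetadataMappingDescriptor objects, uses Map.computeIfAbsent to replace the manual null checks, and adds an unregister method, which the listing above does not have.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a three-level registry:
 * document type -> blob xpath -> mime type -> mapping id.
 */
class MappingRegistry {

    private final Map<String, Map<String, Map<String, String>>> byDocType = new HashMap<>();

    /** Registers a mapping id, creating the intermediate maps on demand. */
    void register(String docType, String nxPath, String mimeType, String mappingId) {
        byDocType.computeIfAbsent(docType, k -> new HashMap<>())
                 .computeIfAbsent(nxPath, k -> new HashMap<>())
                 .put(mimeType, mappingId);
    }

    /**
     * Removes one entry and returns the removed id, or null if there was
     * none. Empty intermediate maps are left in place for simplicity.
     */
    String unregister(String docType, String nxPath, String mimeType) {
        Map<String, Map<String, String>> byPath = byDocType.get(docType);
        if (byPath == null) {
            return null;
        }
        Map<String, String> byMime = byPath.get(nxPath);
        return byMime == null ? null : byMime.remove(mimeType);
    }

    /** Returns the mapping id for the given keys, or null if none matches. */
    String lookup(String docType, String nxPath, String mimeType) {
        return byDocType
                .getOrDefault(docType, Collections.emptyMap())
                .getOrDefault(nxPath, Collections.emptyMap())
                .get(mimeType);
    }
}
```

For example, after register("File2", "files:files/item[0]/file", "image/png", "pngMapping"), a lookup with the same three keys returns "pngMapping", and a lookup with any other document type returns null.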
Let’s also add a method called getMappings to our interface and implement it. It retrieves the list of mapping descriptors for a specific document. That’s what we’re going to use in our previous unit test to make sure our extension points work as expected. Let’s add some mapping examples for our test:
<component name="org.nuxeo.metadata.test.contrib">

  <extension target="org.nuxeo.ecm.core.schema.TypeService" point="doctype">
    <doctype name="File2" extends="File">
    </doctype>
  </extension>

  <extension point="mapper" target="org.nuxeo.metadata.FileMetadataService">
    <mapper id="defaultTikaMapper" class="org.nuxeo.metadata.TikaDefaultMapper" />
    <mapper id="testTikaMapper" class="org.nuxeo.metadata.test.TestTikaDefaultMapper" />
  </extension>

  <extension point="mapping" target="org.nuxeo.metadata.FileMetadataService">
    <mapping id="tikaPDF" mapper="defaultTikaMapper">
      <mimeTypes>
        <mimeType>application/pdf</mimeType>
        <mimeType>application/x-pdf</mimeType>
      </mimeTypes>
      <requirements>
        <schema>dublincore</schema>
        <facet>xmp</facet>
      </requirements>
      <properties>
        <propertyItem xpath="xmp:pagecount" metadataName="pagecount" policy="readonly" />
        <propertyItem xpath="dc:title" metadataName="title" policy="sync" />
      </properties>
    </mapping>
    <mapping id="tikaVideo" mimeType="video/mpeg" mapper="defaultTikaMapper">
      <mimeTypes>
        <mimeType>video/quicktime</mimeType>
        <mimeType>video/mp4</mimeType>
        <mimeType>video/mpeg</mimeType>
      </mimeTypes>
      <requirements>
        <schema>dublincore</schema>
        <facet>xmp</facet>
      </requirements>
      <properties>
        <propertyItem xpath="dc:title" metadataName="title" policy="sync" />
      </properties>
    </mapping>
  </extension>

  <extension point="docMapping" target="org.nuxeo.metadata.FileMetadataService">

    <doc docType="File2">
      <mapping nxpath="files:files/item[0]/file" mapper="defaultTikaMapper">
        <mimeTypes>
          <mimeType>image/png</mimeType>
        </mimeTypes>
        <requirements>
          <schema>dublincore</schema>
          <facet>xmp</facet>
        </requirements>
        <properties>
          <propertyItem xpath="dc:title" metadataName="title" policy="sync" />
        </properties>
      </mapping>
    </doc>

    <doc docType="File">
      <mappingId>tikaPDF</mappingId>
      <mappingId>tikaVideo</mappingId>
    </doc>
  </extension>

</component>
Now we can add some code to test those mappings:
package org.nuxeo.metadata.test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;

import java.io.File;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.nuxeo.common.utils.FileUtils;
import org.nuxeo.ecm.core.api.Blob;
import org.nuxeo.ecm.core.api.CoreSession;
import org.nuxeo.ecm.core.api.DocumentModel;
import org.nuxeo.ecm.core.api.impl.blob.FileBlob;
import org.nuxeo.ecm.core.test.CoreFeature;
import org.nuxeo.metadata.FileMetadataService;
import org.nuxeo.metadata.FileMetadataServiceImpl;
import org.nuxeo.metadata.MetadataMapper;
import org.nuxeo.metadata.MetadataMappingDescriptor;
import org.nuxeo.metadata.PropertyItemDescriptor;
import org.nuxeo.runtime.api.Framework;
import org.nuxeo.runtime.test.runner.Deploy;
import org.nuxeo.runtime.test.runner.Features;
import org.nuxeo.runtime.test.runner.FeaturesRunner;
import org.nuxeo.runtime.test.runner.LocalDeploy;

import com.google.inject.Inject;

@RunWith(FeaturesRunner.class)
@Features(CoreFeature.class)
@Deploy({ "nuxeo-platform-filemanager-metadata", "nuxeo-platform-filemanager-metadata-test" })
@LocalDeploy({ "nuxeo-platform-filemanager-metadata-test:OSGI-INF/metadata-core-contrib.xml",
        "nuxeo-platform-filemanager-metadata-test:OSGI-INF/test-tika-contrib.xml" })
public class FileMetadataServiceTest {

    @Inject
    CoreSession session;

    @Test
    public void testService() throws Exception {
        FileMetadataService serviceInterface = Framework.getService(FileMetadataService.class);
        assertNotNull(serviceInterface);
        DocumentModel file = createFileDocumentModelWithPdf();
        List<MetadataMappingDescriptor> fileMappings = serviceInterface.getMappings(file);
        assertNotNull(fileMappings);
        assertFalse(fileMappings.isEmpty());
        assertEquals(1, fileMappings.size());

        DocumentModel file2 = createFileDocumentModelWithPng();
        fileMappings = serviceInterface.getMappings(file2);
        assertNotNull(fileMappings);
        assertFalse(fileMappings.isEmpty());
        assertEquals(1, fileMappings.size());

        MetadataMappingDescriptor mapping = fileMappings.get(0);
        String[] facets = mapping.getRequiredFacets();
        assertNotNull(facets);
        assertEquals(1, facets.length);
        assertEquals("xmp", facets[0]);
        String[] schemas = mapping.getRequiredSchema();
        assertNotNull(schemas);
        assertEquals(1, schemas.length);
        assertEquals("dublincore", schemas[0]);

        assertEquals("image/png", mapping.getMimeTypes()[0]);
        assertEquals("files:files/item[0]/file", mapping.getNxpath());
        assertEquals("defaultTikaMapper", mapping.getMapperId());
        PropertyItemDescriptor[] properties = mapping.getProperties();
        assertNotNull(properties);
        assertEquals(1, properties.length);

        PropertyItemDescriptor item = properties[0];
        assertNotNull(item);
        assertEquals("title", item.getMetadataName());
        assertEquals("dc:title", item.getXpath());
        assertEquals("sync", item.getPolicy());
    }

    private DocumentModel createFileDocumentModelWithPdf() throws Exception {
        DocumentModel doc = session.createDocumentModel("/", "file", "File");
        File f = FileUtils.getResourceFileFromContext("data/hello.pdf");
        Blob blob = new FileBlob(f);
        blob.setMimeType("application/pdf");
        doc.setPropertyValue("file:content", (Serializable) blob);
        return doc;
    }

    private DocumentModel createFileDocumentModelWithPng() throws Exception {
        DocumentModel doc = session.createDocumentModel("/", "file2", "File2");
        File f = FileUtils.getResourceFileFromContext("data/training.png");
        Blob blob = new FileBlob(f);
        blob.setMimeType("image/png");
        Map<String, Serializable> blobMap = new HashMap<String, Serializable>();
        blobMap.put("file", (Serializable) blob);
        blobMap.put("filename", "training.png");
        List<Map<String, Serializable>> blobs = new ArrayList<Map<String, Serializable>>();
        blobs.add(blobMap);
        doc.setPropertyValue("files:files", (Serializable) blobs);
        return doc;
    }

}
That’s it for today. Next time I’ll show you how to actually do the mapping :)