Last December I was lucky enough to go back to re:Invent, and I had an amazing time: five days of interesting talks, sessions, games, and of course partying (after all, we were in Vegas!). It's very difficult to choose the hottest topic of the event. There were plenty of new things to discover and learn, including machine learning, IoT, big data, containers, security, databases, and serverless.
Ever since I discovered my love for AWS Lambda last year, I have made it a point to attend some of the hands-on labs on the subject. The range of applications powered by Lambda is endless, but one session I attended covered one of the classics: automatically creating thumbnails when an image is added to an Amazon S3 bucket.
I say "a classic" because it is one of the examples in the official documentation, and the sources are on GitHub under Amazon Web Services - Labs. (If you didn't know about these labs, you should definitely take a look!)
This subject gave me a cool idea: Since Lambda already knows how to use ImageMagick to perform simple image processing operations, I can integrate this with the Nuxeo Content Services Platform and have all the thumbnails generated using Lambda.
The Use Case
Let's say you are running a Nuxeo Digital Asset Management application with all the binaries stored in S3, and you need to do some image processing every time you upload a new picture. Since the blobs are already in S3, you want to delegate this processing to a Lambda function and store the result back in the same bucket.
Learn more about AWS for Digital Asset Management
The Design
I will take the example of the thumbnails generation, but it can be easily modified to fit your needs. Basically, what we need is the following: every time the Nuxeo Platform stores an image in the bucket, a Lambda function is invoked to generate the thumbnail, put it back in the same bucket, and then update the existing document in the Nuxeo Platform with the info. Of course, since the thumbnail is already in the bucket we won’t need to re-upload it but just point to it.
There is an obvious problem with the above. If the Lambda function is invoked by an s3:ObjectCreated:Put event and the thumbnails are then added back to the same bucket, how do we avoid triggering thumbnail generation in an infinite loop?
Here’s the solution. Amazon S3 supports having folders within a bucket to group objects and Lambda functions can be configured to be triggered by events on specific folders. So, in order to avoid the infinite loop we can have two folders in the bucket: one to store the original images and the other to put the generated thumbnails in. But, of course, we have to make sure that the Nuxeo Platform is accessing the generated thumbnails.
The Nuxeo Platform stores the binaries in S3 using their digest as the key (name) and can be configured to use a given folder of the bucket (using the property nuxeo.s3storage.bucket_prefix). With these premises, we need to move the generated thumbnails into the folder used by the Nuxeo Platform. Let's call it /nuxeo.
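As a sketch, the corresponding nuxeo.conf properties could look like this (the bucket name is the one used later in this example; check the S3 Binary Manager documentation for the full set of required properties such as credentials and region):

```
nuxeo.s3storage.bucket=test-picture-views-with-lambda
nuxeo.s3storage.bucket_prefix=nuxeo/
```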
So we need to create:
1. A Lambda function that generates the thumbnails into a new folder (/thumbnails), invoked every time a new object is added to the /nuxeo folder. This function will be invoked on the s3:ObjectCreated:Put event.
2. A Lambda function that moves the new files from the /thumbnails folder to /nuxeo and, once the move is successful, calls an operation in the Nuxeo Platform to update the original document with the newly generated thumbnail. Since the creation of the object by copy triggers an s3:ObjectCreated:Copy event, it won't notify the first function to regenerate the thumbnails from the thumbnails.
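The trigger wiring described above can be sketched as an S3 bucket notification configuration. This is only a sketch: the function ARNs are placeholders, and both functions listen to s3:ObjectCreated:Put but on different prefixes, which is what breaks the loop (the copy into /nuxeo emits a Copy event, not a Put):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "generate-thumbnails",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:generateThumbnails",
      "Events": ["s3:ObjectCreated:Put"],
      "Filter": {
        "Key": { "FilterRules": [{ "Name": "prefix", "Value": "nuxeo/" }] }
      }
    },
    {
      "Id": "move-thumbnails",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:moveThumbnails",
      "Events": ["s3:ObjectCreated:Put"],
      "Filter": {
        "Key": { "FilterRules": [{ "Name": "prefix", "Value": "thumbnails/" }] }
      }
    }
  ]
}
```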
The Implementation
You can find the source code of this example (containing both Lambda functions and the Nuxeo operation) on GitHub. I won't focus on the configuration part, but keep in mind that you need to grant the necessary permissions to read and write from the bucket to the role executing the Lambda functions. In order to do that, you can attach the same security policy required by the Nuxeo Platform to access the bucket (as explained in the AWS configuration doc).
Let's break down the implementation now (assuming that my Nuxeo server is already configured to store the binaries in the bucket test-picture-views-with-lambda, folder /nuxeo):
1. First, we need to disable the original thumbnail generation in the Nuxeo Platform
This can be done by a simple contribution to disable the listeners that originally generate the thumbnails:
<listener name="updateThumbListener" enabled="false"/>
<listener name="checkBlobUpdate" enabled="false" />
2. Create the generateThumbnails Lambda function
You can start from scratch and create the Lambda function by selecting the blueprint "image-processing-service", or you can just upload the code from generateThumbnails (don't forget to zip the folder and upload all the existing node_modules, as it uses ImageMagick-related libraries). Configure it to be invoked when a new object is created by the Nuxeo Platform:
This is the interesting part in the function:
var digest = crypto.createHash('md5').update(data).digest('hex');
var dstKey = 'thumbnails/' + digest;
s3.putObject({
    Bucket: dstBucket,
    Key: dstKey,
    Body: data,
    ContentType: contentType,
    Metadata: {
        originalFileDigest: originalFileDigest
    }
}, next);
This stores the new generated thumbnail using its digest as a key and saves the digest of the original image so that later we can find the document in the Nuxeo Platform to update its thumbnails info.
3. Create the moveThumbnails Lambda function
Upload the code from moveThumbnails.js. This function moves (by copying) the thumbnail back to the /nuxeo folder and then invokes an operation in the Nuxeo Platform to update the info on the document.
var thumbnailDigest = srcKey.substring('thumbnails'.length + 1, srcKey.length);
var newKey = 'nuxeo/' + thumbnailDigest;
// Copying the object won't trigger a 'putObject' event
s3.copyObject({
    Bucket: dstBucket,
    Key: newKey,
    CopySource: srcBucket + '/' + srcKey
}, (err, data) => {
    if (err) {
        console.log('Error copying file: ' + err);
    } else {
        // Get the object to read the original file digest stored in its metadata
        s3.getObject({
            Bucket: dstBucket,
            Key: newKey
        }, function(err, data) {
            if (err) console.log(err, err.stack);
            else updateThumbnails(data.Metadata['originalfiledigest'], thumbnailDigest);
        });
    }
});
4. Create the SetThumbnail.java operation in the Nuxeo Platform
You can build the nuxeo-thumbnails-with-lambda jar and deploy it on the server. It contains the operation and the contribution to disable the original thumbnail generation in the Nuxeo Platform. Since the S3 Binary Manager expects the digest of the blob to be used as the key of the object in the bucket, we need to pass both the digest of the original image (to find the document in the Nuxeo Platform) and the digest of the thumbnail. Of course, we won't re-upload the blob since the file is already in the bucket; we will just tell the Nuxeo Platform to create a new blob using this digest and set it as the thumbnail.
As you can see from the code above, we already have both since we set the digest of the original image as an S3 custom metadata on the thumbnail generated.
DocumentModelList docs = session.query(
        String.format("Select * from Document where content/data = '%s'", originalFileDigest));
for (DocumentModel doc : docs) {
    doc.addFacet("Thumbnail");
    BinaryBlob sb = new BinaryBlob(
            new LazyBinary(thumbnailDigest, "default", getCachingBinaryManager()),
            thumbnailDigest, (String) doc.getPropertyValue("file:content/name"), "image/png", null,
            thumbnailDigest, -1);
    doc.setPropertyValue("thumb:thumbnail", sb);
    session.saveDocument(doc);
}
That’s it. Now let’s see it in action.
See it in Action!
Just upload an image in the Nuxeo Platform:
You will get the thumbnails automatically!
Here is the proof that my Lambda functions did all the work (of course, you should try them for yourself even if you believe me!):