Nuxeo & AWS Lambda: Create a Document from an Uploaded Blob in AWS S3


Tue 20 October 2015 By Mariana Cedica

It hasn’t even been two weeks since the AWS re:Invent conference was held and I am already looking forward to going back next year! It was quite an interesting week with many exciting new announcements from Amazon. I not only learned a lot of new things on the technical side but also had a lot of fun (read blackjack here) :)

Now, let’s focus on one of my favorite subjects from re:Invent this year, the AWS Lambda Service and the new trend, the Serverless Architecture. By “serverless” I mean no specific infrastructure, deployment, or server is required to run your code (that’s wrapped as a Lambda function) to interact with existing AWS services in response to native AWS events. It’s like magic! Not to mention the fact that AWS Lambda supports code written in JavaScript, Java and Python and that these functions are stateless. This means they can scale rapidly and many copies of the same function can be run concurrently.

Let me give you an example that shows the power of this service and the cool things you can do when you combine it with the Nuxeo Platform. A common deployment is a Nuxeo server running on an EC2 instance configured with Amazon S3 as the binary manager.

In this configuration, a document is created in the Nuxeo Platform and the associated blob (if any) is stored in S3. By default, if you upload a file, it’s first uploaded to the Nuxeo Platform and then the platform uploads it to S3. But by using the AWS Lambda service, you can upload your file straight to S3 and the document referring to this blob will be created automatically in the Nuxeo Platform.

The Use Case


Consider the case where you just started using the Nuxeo Platform and want to perform a mass import to upload the documents you had in your old system.

Let’s take it step by step. As this is only a proof of concept and I am focusing on the functionality, I am going to use the default configuration: no encryption in S3, no multipart upload, and basic authentication with a default user to create the document in the Nuxeo Platform.

This is my current deployment (the S3BinaryManager Marketplace package is installed and configured):

Nuxeo and S3

Configure the bucket

S3 can emit bucket event notifications when objects are created, modified, or removed from a bucket. One of the coolest things about Lambda functions is that they are natively integrated with these notifications, so we can have a Lambda function configured to execute on any of them. The only thing this function has to do is create the document in the Nuxeo Platform.

Basically, this is what we want: a Lambda function that gets notified when an object is uploaded to our bucket and then invokes an operation to create a document in the Nuxeo Platform pointing to this existing blob.
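
To make the shape of that notification concrete, here is a minimal sketch of pulling the bucket name and object key out of an S3 event. The helper name extractS3Object and the sample event are illustrative assumptions, not part of the original code; the Records[0].s3 layout is the one the Lambda function later in this post reads. Keys arrive URL-encoded in the event, which makes no difference for hex digests but would matter for keys containing spaces.

```javascript
// Hypothetical helper (not from the original post): reads the bucket name
// and object key from the first record of an S3 event notification.
function extractS3Object(event) {
  var record = event.Records[0];
  return {
    bucket: record.s3.bucket.name,
    // Keys are URL-encoded in S3 events, with spaces sent as '+'.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
  };
}

// Trimmed-down sample event, for illustration only.
var sampleEvent = {
  Records: [{
    eventName: 'ObjectCreated:Put',
    s3: {
      bucket: { name: 'mariana' },
      object: { key: '1a29c592b09ee7725415efa354907426' }
    }
  }]
};

console.log(extractS3Object(sampleEvent));
```

Only the first record is read here, which mirrors what the function in this post does.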

Lambda Function

Creating the Lambda Function


In the AWS Console, we create a new Lambda function from an existing Node.js blueprint (a blueprint is a function template provided by Amazon, and the one we use already provides a sample Amazon S3 object-created function). The only important thing to consider is that the function has to be in the same AWS region as the S3 bucket and the EC2 instance.

Here are the steps. Go to AWS Lambda and create a new function.


  1. In select blueprint, choose the s3-get-object template.

  2. In Configure event sources, you already have S3 preselected. Choose your bucket (in my case it’s called ‘mariana’) and select Object Created (All) as the event type.

  3. In Configure function, the runtime is already Node.js. You can leave the Memory set to 128 MB and increase the Timeout to 5 seconds. For the IAM role, choose the existing lambda_s3_exec_role (as this role has the necessary permissions for the AWS actions performed by the function).


That’s about it! The only thing left is to add our custom code to create the document in the Nuxeo Platform. As you can see in the existing code, some logs are already enabled. The output of these logs can be seen in CloudWatch. Just go to the Monitoring sub-tab (where you can find useful statistics, such as the invocation count, duration, etc.) and click on the View logs in CloudWatch link. This is what you should have in the end:

Create a Lambda function

Save and test Lambda function

Adding Custom Code to Create the Document in the Nuxeo Platform

Nuxeo Custom Operation:


This is the tricky part. We can’t just invoke the Create.Document or FileManager.Import operations because they both expect the file as a parameter. So we need to write a custom operation that creates the document pointing to the existing blob. The S3BinaryManager needs the digest of the file (default algorithm is MD5) and this must be one of the parameters expected by the operation, along with the title, the content type, and the length of the blob.

That means we need to pay attention and use the digest as the key of the object when uploading it to S3 and also remember to pass its filename.
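
As a minimal sketch of that step, the digest can be computed with Node's built-in crypto module before uploading. The helper name md5Hex is an assumption made for illustration, and MD5 is used only because it is the default algorithm mentioned above.

```javascript
var crypto = require('crypto');

// Hypothetical helper: returns the hex MD5 digest of a Buffer, which is
// the value to use as the S3 object key in this setup.
function md5Hex(buffer) {
  return crypto.createHash('md5').update(buffer).digest('hex');
}

// Usage sketch: for a real file, read it first, e.g.
//   var key = md5Hex(require('fs').readFileSync('/path/to/file.pdf'));
console.log(md5Hex(Buffer.from('hello')));
```

Reading the whole file into memory is fine for a proof of concept; for very large files a streaming hash would be a better fit.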

Here is the code:

@Operation(id = CreateDocumentFromS3Blob.ID, category = Constants.CAT_DOCUMENT, label = "Create", description = "")
public class CreateDocumentFromS3Blob {

    public static final String ID = "CreateDocumentFromS3Blob";

    @Context
    protected CoreSession session;

    @Param(name = "filename")
    protected String filename;

    @Param(name = "mimeType")
    protected String mimeType;

    @Param(name = "digest")
    protected String digest;

    @Param(name = "length")
    protected Long length;

    @OperationMethod(collector = DocumentModelCollector.class)
    public DocumentModel run(DocumentModel doc) throws Exception {
        if (filename == null) {
            filename = "Untitled";
        }
        // Create an empty File document under the input (parent) document.
        DocumentModel newDoc = session.createDocumentModel(doc.getPathAsString(), filename, "File");
        newDoc = session.createDocument(newDoc);

        // Build a blob that points at the binary already stored in S3 under the given digest.
        String repositoryName = Framework.getLocalService(RepositoryManager.class).getDefaultRepositoryName();
        CachingBinaryManager binaryManager = (CachingBinaryManager) Framework.getLocalService(
                BinaryManagerService.class).getBinaryManager(repositoryName);
        StorageBlob sb = new StorageBlob(new LazyBinary(digest, repositoryName, binaryManager), filename, mimeType,
                null, digest, length);
        newDoc.setPropertyValue("file:content", sb);
        newDoc.setPropertyValue("dc:title", filename);
        return session.saveDocument(newDoc);
    }
}


The interesting part in this code is the fact that we create a LazyBinary with the given digest and we set it as the file:content property.

Adding Custom Code in the Lambda Function:


Now let’s assume we have deployed this custom operation on our Nuxeo Platform running in the EC2 instance. We need to add the custom code to invoke it from our Lambda function.

For demo purposes, we are just going to use Administrator/Administrator as the user invoking this operation and create the document in this user’s personal workspace (the input of the operation is hardcoded to the ID of that workspace document).

As mentioned above, the S3BinaryManager expects the digest of the blob to be used as the key of the object in the bucket, so we have to upload the object using this key. As we also need the title of the document, we can pass it as custom S3 metadata.
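
For uploads done from code rather than the command line, the same convention can be sketched as a small function that builds the upload parameters. The helper name buildPutObjectParams and the sample values are assumptions for illustration; the parameter names (Bucket, Key, Body, ContentType, Metadata) are those of the putObject call in the AWS SDK for JavaScript.

```javascript
// Hypothetical helper: builds the parameters for an S3 putObject call where
// the key is the MD5 digest and the title travels as user-defined metadata.
function buildPutObjectParams(bucket, digest, body, contentType, title) {
  return {
    Bucket: bucket,
    Key: digest,
    Body: body,
    ContentType: contentType,
    Metadata: { title: title } // stored by S3 as x-amz-meta-title
  };
}

var params = buildPutObjectParams(
  'mariana',
  '1a29c592b09ee7725415efa354907426',
  Buffer.from('placeholder content'), // stand-in for the real file bytes
  'application/pdf',
  'FooFighters.pdf'
);

// With the AWS SDK for JavaScript installed, this could then be sent with:
//   new (require('aws-sdk')).S3().putObject(params, callback);
console.log(params.Key, params.Metadata);
```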

The Lambda function:

var aws = require('aws-sdk');
var s3 = new aws.S3({
    apiVersion : '2006-03-01'
});
var http = require('http');
var crypto = require('crypto');

var options = {
    host : '52.26.252.66',
    port : '8080',
    method : 'POST',
    path : '/nuxeo/site/automation/CreateDocumentFromS3Blob',
    headers : {
        'Accept' : 'application/json',
        'Content-Type' : 'application/json+nxrequest'
    },
    auth : 'Administrator:Administrator'
};

exports.handler = function(event, context) {
console.log('Received event:', JSON.stringify(event, null, 2));

var bucket = event.Records[0].s3.bucket.name;
var key = event.Records[0].s3.object.key;

var params = {
    Bucket : bucket,
    Key : key
};

s3.getObject(params, function(err, data) {
    if (err) {
        console.log(err);
        var message = "Error getting object " + key + " from bucket " + bucket + ". Make sure they exist and your bucket is in the same region as this function.";
        console.log(message);
        context.fail(message);
    } else {

        // Nuxeo expects the key to be the digest of the file, so the key is used as is.
        // To double-check it, the digest could be recomputed from the body:
        // var digest = crypto.createHash('md5').update(data.Body).digest("hex");
        var title = data.Metadata.title !== undefined ? data.Metadata.title : key;

        // The input is the ID of the parent document.
        var postData = JSON.stringify({
            "input" : "f04453f9-de1c-4a8d-9956-add074069813",
            "params" : {
                "filename" : title,
                "mimeType" : data.ContentType,
                "digest" : key,
                "length" : data.ContentLength
            }
        });

        var req = http.request(options, function(res) {
            var body = '';
            res.on('data', function(chunk) {
                body += chunk;
            });

            // Signal completion once, after the whole response has been read.
            res.on('end', function() {
                console.log('Nuxeo response:' + body);
                context.succeed('end');
            });
        });
        // Fail the invocation if the request to Nuxeo cannot be made.
        req.on('error', function(e) {
            console.log(e);
            context.fail('Error calling Nuxeo: ' + e.message);
        });
        req.write(postData);
        req.end();
    }
});

};
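
If you want to check the request body without deploying anything, one option is to factor it into a pure function, as in the sketch below. The helper name buildOperationRequest and the sample response object are assumptions for illustration; the fields mirror the parameters of the CreateDocumentFromS3Blob operation and the getObject response used above.

```javascript
// Hypothetical helper: builds the JSON body posted to the custom
// CreateDocumentFromS3Blob operation from an S3 getObject response.
function buildOperationRequest(parentId, key, s3Data) {
  var metadata = s3Data.Metadata || {};
  return JSON.stringify({
    input: parentId,
    params: {
      // Fall back to the key when no title metadata was provided.
      filename: metadata.title !== undefined ? metadata.title : key,
      mimeType: s3Data.ContentType,
      digest: key,
      length: s3Data.ContentLength
    }
  });
}

// Sample getObject response, reduced to the fields that are used.
var sampleData = {
  ContentType: 'application/pdf',
  ContentLength: 12345,
  Metadata: { title: 'FooFighters.pdf' }
};

console.log(buildOperationRequest(
  'f04453f9-de1c-4a8d-9956-add074069813',
  '1a29c592b09ee7725415efa354907426',
  sampleData
));
```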


That’s it!

Watch it in Action:

  1. From the command line, I upload my Foo Fighters ticket (amazing concert, by the way! :)) to my S3 bucket, passing its title as custom metadata:
    Marianas-MacBook-Pro:opt mariana$ md5 /Users/mariana/Downloads/FooFigthers.pdf
    MD5 (/Users/mariana/Downloads/FooFigthers.pdf) = 1a29c592b09ee7725415efa354907426
    Marianas-MacBook-Pro:opt mariana$ aws s3api put-object --bucket mariana --key 1a29c592b09ee7725415efa354907426 --body /Users/mariana/Downloads/FooFigthers.pdf --content-type application/pdf --metadata title=FooFighters.pdf
    
    The response is:
    {
     "ETag": "\"1a29c592b09ee7725415efa354907426\""
    }
    
  2. My createDocInNuxeo Lambda function was automatically invoked:

createDocInNuxeo Lambda function was automatically invoked

  3. And we can see the document in the Nuxeo Platform:

Document seen in the Nuxeo Platform

The main file is FooFighters.pdf and I can download it to see that the file is indeed the one I uploaded.
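
Notice that the ETag returned by the upload in step 1 is the same value as the MD5 digest used as the key. With the setup used here (no encryption, no multipart upload), that gives a cheap sanity check after each upload. The sketch below shows one way to do the comparison; the helper name etagMatchesDigest is an assumption, and the check should not be relied on for multipart uploads, where the ETag is not a plain MD5.

```javascript
// Hypothetical helper: compares the ETag returned by S3 with a locally
// computed MD5 digest. S3 wraps the ETag value in double quotes.
function etagMatchesDigest(etag, digest) {
  return etag.replace(/"/g, '').toLowerCase() === digest.toLowerCase();
}

// Values taken from the upload shown in step 1.
console.log(etagMatchesDigest(
  '"1a29c592b09ee7725415efa354907426"',
  '1a29c592b09ee7725415efa354907426'
));
```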

That’s about it! You can find the source code here (a plugin contains the code of the operation and also the Lambda function code).


Category: Product & Development
Tagged: AWS, How to, Nuxeo Platform 7.x