Hierarchical Storage Management (HSM) with an Amazon S3 Bucket
We are often asked how the Nuxeo Platform handles hierarchical storage management (HSM). This matters to our customers because it’s a practical way to reduce enterprise storage costs. In this post, I will show how to set up a specific Nuxeo workspace so that all binaries uploaded to it are saved to an Amazon S3 bucket, while all other workspaces keep saving their content to the local disk (the default setup). Let’s take a look at the steps.
Download Amazon S3 Storage Package
First, visit the Nuxeo Marketplace to download and install the Amazon S3 Storage (SNAPSHOT) package.
Specify Your Amazon S3 Parameters
Add a few AWS-related lines to the nuxeo.conf file, replacing the AWS bucket name, AWS access key, and AWS secret with your own values. More information about the Nuxeo S3 configuration is available in the Nuxeo documentation. If you need help setting things up on AWS, please take a look at the Amazon documentation.
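As a rough illustration, the added lines might look like the ones embedded in the sketch below. The key names (nuxeo.s3storage.bucket, nuxeo.s3storage.awsid, nuxeo.s3storage.awssecret) are my assumption about what the package expects, so please confirm them in the Nuxeo S3 documentation for your version. The values are placeholders, and the small Python helper is hypothetical: it only checks that none of the three settings is missing or empty.

```python
# Hypothetical helper: checks that nuxeo.conf text defines the three S3 settings.
# The key names are assumptions; confirm them in the Nuxeo S3 documentation.
REQUIRED_KEYS = (
    "nuxeo.s3storage.bucket",
    "nuxeo.s3storage.awsid",
    "nuxeo.s3storage.awssecret",
)

# Placeholder values only; replace them with your own bucket and credentials.
EXAMPLE_CONF = """\
# Amazon S3 settings
nuxeo.s3storage.bucket=your-bucket-name
nuxeo.s3storage.awsid=YOUR_AWS_ACCESS_KEY
nuxeo.s3storage.awssecret=YOUR_AWS_SECRET
"""


def parse_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_s3_keys(text):
    """Return the required keys that are absent or have an empty value."""
    settings = parse_conf(text)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    # An empty list means all three settings are present.
    print(missing_s3_keys(EXAMPLE_CONF))
```

Running a check like this before restarting the server is optional; it simply catches a forgotten or blank setting early.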
Required Studio Configurations
We need a new schema to store a flag to indicate whether the binary should be stored in S3. For this example, I created an "s3" schema with a prefix "amazon". The schema contains one boolean field named "syncWithS3" with a default value of FALSE.
Next, we need an automation script that sets the "amazon:syncWithS3" field to TRUE. This field will be read by the S3 binary manager, which I will discuss shortly.
This script does the following (in pseudocode):
- Accepts a document as input.
- Gets the parent document of the document we are uploading.
- Gets the workspace title of the parent. In this case we are only interested in the workspace titled "Amazon s3 Archive". Any file uploaded in this workspace will send the binary to S3.
- If the workspace title is "Amazon s3 Archive", sets amazon:syncWithS3 = TRUE.
- Returns the document so its binary can be saved to either S3 or the local disk, depending on the flag set in the previous step.
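To make the logic concrete, here is a minimal sketch of those steps in Python. It is only a model, not the Studio automation script itself: plain dictionaries stand in for Nuxeo documents, the function name flag_for_s3 is hypothetical, and I am assuming the workspace title is exposed under a "dc:title" key.

```python
# Illustrative model only: plain dicts stand in for Nuxeo documents.
ARCHIVE_WORKSPACE_TITLE = "Amazon s3 Archive"
SYNC_FIELD = "amazon:syncWithS3"
TITLE_FIELD = "dc:title"  # assumed key for the workspace title


def flag_for_s3(document, parent):
    """Set the sync flag when the parent is the archive workspace.

    `document` is the document being uploaded and `parent` is its parent
    workspace; both are hypothetical dict stand-ins. The document is
    returned so the caller can go on to save it.
    """
    if parent.get(TITLE_FIELD) == ARCHIVE_WORKSPACE_TITLE:
        document[SYNC_FIELD] = True
    return document


if __name__ == "__main__":
    archive = {TITLE_FIELD: "Amazon s3 Archive"}
    other = {TITLE_FIELD: "Marketing"}

    # The schema default (FALSE) is modeled by starting with the flag off.
    in_archive = flag_for_s3({SYNC_FIELD: False}, archive)
    elsewhere = flag_for_s3({SYNC_FIELD: False}, other)

    print(in_archive[SYNC_FIELD])
    print(elsewhere[SYNC_FIELD])
```

Note that the script never resets the flag to FALSE. It relies on the schema default, so documents outside the archive workspace are left untouched.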
Now we need a new event handler. It listens for "about to create" events on all documents of type "File" and executes the automation script created in the previous step.
Lastly, we need to add a new XML extension. The XML extension is where the magic happens. We set up two separate blob providers: one for AWS and another for local disk (localEncrypted). We pass a property to the AWS blob provider that is evaluated to determine whether the document’s "amazon:syncWithS3" field = TRUE. If so, the binary is stored on AWS; otherwise, it is saved on the local disk.
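The decision the extension encodes can be sketched in a few lines of Python. This is not the XML contribution itself, just a model of the rule it expresses: the provider names "aws" and "local" and the function choose_blob_provider are hypothetical, and a plain dictionary again stands in for the document.

```python
# Illustrative model of the dispatch rule; provider names are hypothetical.
AWS_PROVIDER = "aws"
LOCAL_PROVIDER = "local"

# Each rule maps a (field, expected value) pair to a blob provider.
# The first matching rule wins; anything else falls back to local disk.
DISPATCH_RULES = [
    ("amazon:syncWithS3", True, AWS_PROVIDER),
]


def choose_blob_provider(document):
    """Return the name of the provider that should store the binary."""
    for field, expected, provider in DISPATCH_RULES:
        if document.get(field) == expected:
            return provider
    return LOCAL_PROVIDER


if __name__ == "__main__":
    print(choose_blob_provider({"amazon:syncWithS3": True}))
    print(choose_blob_provider({"amazon:syncWithS3": False}))
    print(choose_blob_provider({}))
```

Keeping local disk as the fallback means a document without the flag, or with the flag left at its default, behaves exactly as it did before the S3 package was installed.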
This post demonstrates how easily we can direct binary storage for a specific workspace to Amazon S3. It's just one of the many ways that the Nuxeo Platform can show you the art of the possible! Please check out our documentation center for more information.