How to Decide Between Using S3 Directly Vs Using CloudFront on Top of S3
When you store your files in the Nuxeo Platform and want to get all the benefits of AWS cloud solutions, sooner or later you have to decide how to access the blobs stored in the Nuxeo Platform. It's always a challenge, especially when everyone wants the best performance while staying efficient. Choosing between Amazon S3 and Azure, and then deciding whether or not to use a CDN, is not easy. So in this post, we'll look at the AWS-based solutions to help you decide between using S3 directly and using CloudFront forwarding to an S3 bucket.
A while ago, I talked about how to scale your blob download rate, and we saw that we can easily configure the Nuxeo Platform to generate either a direct S3 signed URL or a signed URL for a CloudFront distribution placed in front of S3.
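To give an idea of what a direct S3 signed URL involves, here is a minimal, stdlib-only Python sketch of AWS Signature Version 4 query-string presigning for a GET request. Once configured, the Nuxeo Platform generates these URLs for you. This sketch is not its implementation, only an illustration of the mechanism. The helper name `presign_s3_get` and its parameters are hypothetical. It signs only the `host` header and ignores temporary credentials (session tokens).

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_get(bucket: str, key: str, region: str,
                   access_key: str, secret_key: str,
                   expires: int = 3600,
                   now: Optional[datetime.datetime] = None) -> str:
    """Hypothetical helper: SigV4 presigned GET URL for a single S3 object."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        # Lifetime of the link in seconds (SigV4 allows at most 7 days).
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted by name, names and values percent-encoded.
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    canonical_uri = "/" + quote(key, safe="/")
    canonical_request = "\n".join([
        "GET",
        canonical_uri,
        query,
        f"host:{host}\n",    # canonical headers, each terminated by a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # presigned URLs do not sign the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Signing key derived step by step from the secret key, date, region, service.
    signing_key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{query}&X-Amz-Signature={signature}"
```

The only secret involved in this sketch is the regular access key pair used to talk to S3. No extra key material and no root account are needed, which matters for the comparison that follows.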
Let's imagine the following scenario:
I know that some of my customers are close to edge locations, I know that CloudFront is the AWS CDN, and I know (for sure) that I need speed! But we also have to consider that S3 data can be replicated across regions, even though S3 definitely does not offer as many endpoints as CloudFront has edge locations.
So I'm not comparing the gain you'd get from fetching many typical web assets through a CDN, such as stylesheets, GIFs, JPEGs or .js files. That is not what the Nuxeo Platform is designed for (it certainly could do it, but as you know, doing more sometimes means doing less!). We store big digital assets that are fetched one at a time.
Instead, let's talk about the gain for a single request, where one request means one download. Keep in mind that an average asset download is usually around 5 MB, and rarely smaller.
Let's compare! Imagine the day when you, one of the best product owners in your company, have to decide between these two configurations in the Nuxeo Platform:
- Storing blobs on S3 and configuring direct S3 access to download assets.
- Redirecting downloads to CloudFront instead of S3, with a CloudFront distribution that uses the S3 bucket as its origin and serves secured (signed) URLs.
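To see where the CloudFront key pair comes into play, here is a minimal, stdlib-only Python sketch of how a CloudFront signed URL with a canned policy is assembled. A compact JSON policy (one resource, one expiry time) is signed with the private key. The signature is encoded with CloudFront's URL-safe base64 variant (`+` becomes `-`, `=` becomes `_`, `/` becomes `~`) and then appended as the `Signature` parameter, along with `Expires` and `Key-Pair-Id`. The RSA/SHA-1 signing itself needs a crypto library, so it is passed in as a hypothetical `rsa_sha1_sign` callable. The function names are illustrative, and this is not the Nuxeo Platform's implementation.

```python
import base64
import json
from typing import Callable
from urllib.parse import urlencode


def cloudfront_safe_b64(data: bytes) -> str:
    """Base64 with the character substitutions CloudFront expects in URLs."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(resource_url: str, expires_epoch: int) -> str:
    """Canned policy: one resource, one expiry, serialized without whitespace."""
    policy = {"Statement": [{
        "Resource": resource_url,
        "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
    }]}
    return json.dumps(policy, separators=(",", ":"))


def sign_cloudfront_url(resource_url: str, expires_epoch: int, key_pair_id: str,
                        rsa_sha1_sign: Callable[[bytes], bytes]) -> str:
    """Append Expires, Signature and Key-Pair-Id to a CloudFront URL.

    rsa_sha1_sign is a hypothetical callable that signs the policy bytes with
    the CloudFront private key; it is injected because it needs a crypto library.
    """
    policy = canned_policy(resource_url, expires_epoch).encode("utf-8")
    signature = cloudfront_safe_b64(rsa_sha1_sign(policy))
    query = urlencode({"Expires": expires_epoch,
                       "Signature": signature,
                       "Key-Pair-Id": key_pair_id})
    separator = "&" if "?" in resource_url else "?"
    return f"{resource_url}{separator}{query}"
```

For our comparison, the point is that the `Signature` parameter cannot be computed without the private key and its key pair ID. Producing those is exactly what the AWS administrator has to do in the timeline that follows.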
Your day could look like this:
- You think you want to use CloudFront in front of S3. Nice choice: from our last blog post you know how to speed things up, but you need to justify this to your scrum team. As you think about it, your thoughts wander:
- You (the PO) hesitate between S3 and CloudFront+S3 during your prioritization session, jumping from one blog post about performance to the next (1 hour if you are the Usain Bolt of blog reading, 3 hours if you are a mere mortal like the rest of us!)
- You explain the idea to the team, and, of course, the team wants to discuss it. They repeatedly ask "But why? Why CloudFront?" (The discussion goes on for at least 10 minutes.)
- Eventually the team agrees, and your senior developer moves the task into the "In progress" column on your kanban board. Hoorah! (5 seconds)
- Then the development team takes a break before starting the task (10 minutes if it's just for coffee, longer if the team requires food.)
- The developer knows exactly what he or she has to do. Yeah, no need to read the documentation. (0 seconds)
- The developer asks the AWS administrator for the CloudFront keys because, if you want secured URLs, only your AWS root account can create the key pairs needed to generate signed URLs from the Nuxeo Platform. (2 minutes)
- Then the AWS administrator, who hasn't logged in for quite some time, tries to recover the root account's MFA keys in order to log in, having set them up to keep the account secure. (20 minutes)
- Finally, the AWS administrator generates the private keys and shares them with the developer (asking him or her to take good care of them). (45 minutes in theory, but it will probably take longer!)
- At last, our developer can create the S3 bucket and configure it as the default blob storage in the Nuxeo Platform. (30 minutes or less, it's pretty easy.)
- Now the developer tries to figure out how to store the CloudFront key pairs and how to use them in the production environment. To do so, he or she has to change the deployment scripts, because the keys must be kept secure and definitely not stored in the main source repository. (2 hours, for sure!)
- And then, changing the deployment stuff breaks the CI chain! Our developer has to fix it (1 hour, perhaps more if Jenkins is being moody!)
- At last, the task is done.
Time taken: 357 minutes and 5 seconds, or 21,425 seconds
On the other hand, let's say you want to keep things simple and try plain S3, with download requests redirected directly to it:
- You (the PO) prioritize the task. (30 seconds)
- You explain the task to the team, saying: "I think it will be enough, especially since S3 can now replicate your blobs to data centers worldwide. That's not as many locations as CloudFront, but it's good for now." (2 minutes)
- Our developer moves the task into the "In progress" column. Hoorah! (5 seconds)
- The developer takes a break before starting the task. (10 minutes)
- At last, our developer can create the S3 bucket and configure it as the default blob storage in the Nuxeo Platform. (30 minutes or less, it's pretty easy!)
- Finally, the task is done.
Time taken: 42 minutes and 35 seconds, or 2,555 seconds
So there is a difference of 18,870 seconds (roughly 314 minutes, or more than 5 hours) between the two setups.
S3 takes less time to configure, but CloudFront delivers faster downloads.
I am not running a real benchmark here, as the results would be tied to my location and to other obscure network issues that I cannot control. But imagine that CloudFront+S3 gives an effective gain of 100 ms per request compared to S3 alone (that is already a pretty huge gain).
That means CloudFront needs to serve 188,700 downloads (18,870 seconds divided by 0.1 seconds per download) before the time saved on downloads makes up for the extra time spent configuring this specific part, which includes:
- Dealing with root access of your AWS Account.
- Generating CloudFront key pairs, in order to get signed URLs from the Nuxeo Platform.
- Handling sensitive files (key pairs) from the development environment to production.
...instead of using S3 directly. That number of downloads can be a lot (or not), depending on your target use and how many active users you expect.
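To rerun this trade-off with your own estimates, here is a small Python sketch. It adds up the step durations from the two timelines (using the 1-hour estimate for the blog reading) and takes the difference. It then divides that difference by an assumed 100 ms gain per download to get the break-even number of downloads. The function names and the daily download rates are hypothetical. They only show how the break-even point turns into calendar time for a given level of activity.

```python
# Step durations (in seconds) from the two timelines in this post.
MIN, HOUR = 60, 3600

cloudfront_steps = {
    "PO reads blog posts (Usain Bolt pace)": 1 * HOUR,
    "team discussion": 10 * MIN,
    "task moved to In progress": 5,
    "break before starting": 10 * MIN,
    "reading the documentation": 0,
    "asking for the CloudFront keys": 2 * MIN,
    "recovering the root account MFA": 20 * MIN,
    "generating and sharing the private keys": 45 * MIN,
    "creating and configuring the S3 bucket": 30 * MIN,
    "securing the keys in deployment scripts": 2 * HOUR,
    "fixing the CI chain": 1 * HOUR,
}

s3_steps = {
    "prioritizing the task": 30,
    "explaining the task": 2 * MIN,
    "task moved to In progress": 5,
    "break before starting": 10 * MIN,
    "creating and configuring the S3 bucket": 30 * MIN,
}


def break_even_downloads(extra_setup_s: int, gain_per_download_ms: int) -> int:
    """Downloads needed before per-download savings repay the extra setup time."""
    extra_ms = extra_setup_s * 1000               # integer ms avoids float rounding
    return -(-extra_ms // gain_per_download_ms)   # ceiling division


def days_to_break_even(downloads: int, downloads_per_day: int) -> int:
    """Days needed to reach a download count at a constant daily rate."""
    return -(-downloads // downloads_per_day)     # ceiling division


if __name__ == "__main__":
    cf_total = sum(cloudfront_steps.values())
    s3_total = sum(s3_steps.values())
    extra = cf_total - s3_total
    needed = break_even_downloads(extra, gain_per_download_ms=100)
    print(f"CloudFront+S3: {cf_total} s, S3: {s3_total} s, difference: {extra} s")
    print(f"Break-even at 100 ms saved per download: {needed} downloads")
    for per_day in (1_000, 10_000, 100_000):      # hypothetical download rates
        print(f"{per_day} downloads/day: {days_to_break_even(needed, per_day)} days")
```

Run as a script, it reports 21,425 s for CloudFront+S3, 2,555 s for S3, a difference of 18,870 s and a break-even point of 188,700 downloads. At a hypothetical 10,000 downloads a day, that is 19 days. At 1,000 a day, it is 189 days.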
Just keep these numbers in mind: when the moment comes to make a choice, they will save you at least an hour, especially if you go with direct S3 access, the option this comparison highlights.
Have fun with your AWS configuration!