Optimizing image conversion is a common challenge for global media and news publishing organizations. Over the past few weeks, we have been working on a use case from one such organization. A key capability expected from the Nuxeo Platform was the ability to import up to 15,000 high-resolution JPEG images per hour, with near-real-time indexing and conversion, so that assets are searchable and previewable almost instantly after ingestion. There is no doubt that the Nuxeo Platform can handle this, thanks to its hyper-scalable architecture. The real question was what minimum infrastructure it would take.
The Nuxeo Platform uses ImageMagick, an incredibly versatile image transformation tool, to handle image conversion. Here we will focus on its resizing features. When an image is uploaded to the platform, several conversions are performed to produce different resolutions of the original image: thumbnail, preview, etc. The default ImageMagick command used by the platform is very generic, so it works in virtually any situation. That is a good starting point, but once you are familiar with the platform you can optimize the command for your specific use case.
Overriding the default converter is as easy as redefining it in Nuxeo Studio. Below is an example of a less CPU-intensive command than the default one:
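As a rough illustration of what a less CPU-intensive command can look like, here is a minimal Python sketch that assembles an ImageMagick command line. It is not the exact command used in these tests. The function name `build_jpeg_resize_args` and the choice of a hint twice the target size are assumptions made for this example; `-define jpeg:size=` is ImageMagick's documented size hint for the JPEG decoder, which lets it downscale while decoding instead of loading every pixel.

```python
from typing import List


def build_jpeg_resize_args(src: str, dst: str, width: int, height: int) -> List[str]:
    """Build an ImageMagick argument list that fits a JPEG into width x height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [
        "convert",
        # Size hint for the JPEG decoder. It must come before the input file
        # so that it applies while the image is being read. Twice the target
        # size is an assumption of this sketch, not a value from the article.
        "-define", f"jpeg:size={2 * width}x{2 * height}",
        src,
        # Fit the image within the target box, preserving the aspect ratio.
        "-resize", f"{width}x{height}",
        dst,
    ]
```

Calling `build_jpeg_resize_args("in.jpg", "preview.jpg", 1000, 1000)` yields an argument list that can be passed to `subprocess.run`, or joined with spaces to get the string you would paste into a converter definition.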
We ran tests with both the default converter and the optimized one on a c3.xlarge instance (4 CPUs, 8 GB RAM) on AWS, using a set of 150 images with an average size of 5 MB and a resolution of 18 MP. Here is the result:
[Figure: Default vs. Optimized JPEG resize converter]
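To relate a benchmark run like this one to the 15,000 images per hour target, the arithmetic can be sketched as follows. Both helper names are hypothetical, and the example numbers only restate the target and the size of the test set; they are not measured results.

```python
def images_per_hour(image_count: int, elapsed_seconds: float) -> float:
    """Convert a benchmark run (images processed, wall-clock time) to images/hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return image_count * 3600 / elapsed_seconds


def max_seconds_for_target(image_count: int, target_per_hour: float) -> float:
    """Longest run time for image_count images that still meets the hourly target."""
    if target_per_hour <= 0:
        raise ValueError("target_per_hour must be positive")
    return image_count * 3600 / target_per_hour
```

For example, `max_seconds_for_target(150, 15000)` returns `36.0`: a single instance would have to convert the whole 150-image set in 36 seconds or less to sustain the target on its own.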
With just a few lines of configuration, we already got a 100% increase in throughput, which is pretty awesome. But there is more to come. ImageMagick offers many different algorithms to resize images. Let’s consider three of these, from the fastest to the slowest: sample, scale, and resize.
The above figure shows that the throughput doubles when switching from resize to scale, and increases even more with sample. However, all this makes sense only if the output images look good enough to be usable.
Clearly, sample does not give very good results. On the other hand, it’s hard to tell the difference between scale and resize, which is great: if we can get 4x the default throughput for free, it’s a pretty good deal!
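On the command line, the three algorithms compared here differ only in the operator passed to ImageMagick: `-sample`, `-scale`, or `-resize`. The sketch below makes that explicit. The function name `build_args` and the file names in the usage note are hypothetical, and the command is deliberately minimal rather than the one used in the tests.

```python
from typing import List

# ImageMagick resizing operators discussed above, from fastest to slowest.
OPERATORS = ("sample", "scale", "resize")


def build_args(operator: str, src: str, dst: str, width: int, height: int) -> List[str]:
    """Build a minimal ImageMagick command using the given resizing operator."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    # Only the operator flag changes; the geometry argument is the same for all three.
    return ["convert", src, f"-{operator}", f"{width}x{height}", dst]
```

Running the same input through `build_args(op, "in.jpg", f"out-{op}.jpg", 1000, 1000)` for each `op` in `OPERATORS` gives three outputs you can compare side by side for both speed and visual quality.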
Finally, you can get another small increase in throughput by using GraphicsMagick instead of ImageMagick. It’s not as versatile (no Photoshop file support), but it’s slightly faster with JPEG images.
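For a simple scale operation, switching tools mostly means switching the executable: GraphicsMagick exposes its commands through a single `gm` binary, so `convert` becomes `gm convert`. Here is a minimal sketch, assuming both tools are installed and on the `PATH`; the function name and the engine labels are hypothetical.

```python
from typing import List


def build_scale_args(engine: str, src: str, dst: str, width: int, height: int) -> List[str]:
    """Build a scale command for either ImageMagick or GraphicsMagick."""
    if engine == "imagemagick":
        prefix = ["convert"]
    elif engine == "graphicsmagick":
        # GraphicsMagick subcommands are invoked through the `gm` executable.
        prefix = ["gm", "convert"]
    else:
        raise ValueError(f"unknown engine: {engine!r}")
    return prefix + [src, "-scale", f"{width}x{height}", dst]
```

More elaborate commands may need further adjustment, since the two tools do not support exactly the same set of options.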
In the end, it really depends on the use case and on which conversions are required. Keep in mind that you can use a different command for each conversion, which leaves a lot of room for further optimization.
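One way to picture per-conversion tuning is a small table that maps each conversion to its own operator and target size. The sketch below is only an illustration: the operator choices and pixel sizes are hypothetical values, not platform defaults or recommendations from our tests.

```python
from typing import Dict, List, Tuple

# Hypothetical settings per conversion: (operator, width, height).
CONVERSIONS: Dict[str, Tuple[str, int, int]] = {
    "thumbnail": ("scale", 100, 100),     # small output, speed matters most
    "preview": ("resize", 1000, 1000),    # larger output, quality matters more
}


def build_conversion_args(name: str, src: str, dst: str) -> List[str]:
    """Build the ImageMagick command configured for the named conversion."""
    operator, width, height = CONVERSIONS[name]  # raises KeyError if unknown
    return ["convert", src, f"-{operator}", f"{width}x{height}", dst]
```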