The Google Cloud Vision API was released in beta last February 18, and Nuxeo released Nuxeo Vision a month later(1), a plugin created by Michaël. Once installed, the plugin adds tags to any Picture or Video document.
Here is an example of auto-tagging a photo of a very cute little dog(2):
[Image caption: This photo is safe for searching]
The plugin’s documentation page already explains quite a lot, from nuxeo.conf setup to overriding the default chains, as well as the known limitations of the Google Vision API (such as the maximum image size, which is 4MB as of today).
Here’s a quick reminder about two points:
- Tagging is done asynchronously, so once a picture/video is uploaded, there may be a little delay before you can see the tags.
- Tagging videos: the plugin sends the images of the storyboard for tagging (because the Vision API handles only pictures).
That being said, in this article I’ll focus on the core operation provided by the plugin: VisionOp. This is the one that gives you full control over the Google Vision API: it wraps your call and returns the result. You can do quite a lot with it, and go way beyond the default tagging behavior.
There are a couple of things to understand about this operation:
- It accepts a single Blob or a list of Blobs as input, and returns them unchanged; they are just sent to Google Vision for processing (beware of the maximum size that can be sent; see the documentation).
- It also accepts parameters. The first one (features) is a list of features telling Google what kind of information you want. In this article, we will use the SafeSearch feature.
- It outputs the result of the processing in a Context Variable (whose name is passed as a parameter), which is an array of VisionResponse, described here. Because of the structure of the response, using JavaScript automation looks like a very good idea for handling the result: it makes loops, conditions, etc. easier (see the minimal sketch below).
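To make this contract concrete, here is a minimal sketch of a call in JavaScript automation. The parameter names (features, maxResults, outputVariable) are the ones VisionOp expects, as used in the full script later in this article; LABEL_DETECTION is just one of the available features, picked purely as an illustration:

// Minimal sketch: send one blob to Google Vision and read the responses back
var blob = input['file:content'];
blob = VisionOp(blob, {
  features: ['LABEL_DETECTION'], // illustration only; any supported feature works
  maxResults: 5,
  outputVariable: 'results'
});
// The Context Variable holds a list of VisionResponse, one per input blob
var firstResponse = ctx.results.get(0);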
So, what kind of features can you use? The list is available here, and you can see why I said you can go far beyond tagging a picture. Let’s start with safe searching today, and we’ll write other blog posts about the other features. Hopefully.
Safe Searching
Let’s say you want to make sure an image is safe for searching and does not contain inappropriate content. “Inappropriate content” mainly means the likelihood that an image contains violence or nudity, but the API can also tell you about spoof content (“obvious modification was made to the image’s canonical version to make it appear funny or offensive”) or medical content.
So, you call VisionOp with the ["SAFE_SEARCH_DETECTION"] feature. And this example is even more interesting(3): the result returned by the API is not (yet?) handled by the Nuxeo Vision plugin. There is nothing in the operation that gives you direct access to the result, like we have for the tags for example. But the VisionResponse object has a getNativeObject() accessor, which is perfect for us: we are going to use it right away.
In the case of a safe search, the returned object is a SafeSearchAnnotation, and the documentation tells us that its structure will be put in the safeSearchAnnotation field of the result. It has four fields: adult, spoof, medical, and violence. For each field, the value will be “VERY_UNLIKELY”, “UNLIKELY”, “POSSIBLE”, “LIKELY”, “VERY_LIKELY” or “UNKNOWN” (yes: sometimes, Google doesn’t know. So disappointing).
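To make that structure concrete, here is what the native object might look like for a given image (an illustrative sketch; the likelihood values below are made up):

// Illustrative shape of the native response for a SAFE_SEARCH_DETECTION request.
// The likelihood values are made up for the example.
var exampleNativeResult = {
  safeSearchAnnotation: {
    adult: 'VERY_UNLIKELY',
    spoof: 'UNLIKELY',
    medical: 'VERY_UNLIKELY',
    violence: 'POSSIBLE'
  }
};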
So, say we have a SafeSearch schema, with sc:adult_value and sc:violent_value fields (strings, where we will store the results) and a calculated sc:for_adult boolean field. In our example, we are very strict: any value not equal to “VERY_UNLIKELY” is considered for adult. A real-life application would be smarter, and maybe start an approval workflow when the value is “LIKELY” or “UNKNOWN”, for example (see the sketch after the script below).
Here is the result…
…and our script:
// No error handling (checking the blob is not null, the result is not null, ...) to keep the example simple
function run(input, params) {
  var blob, nativeResult, safeSearchResult;

  // In this example, let's get the Medium view if the original size is > 4MB
  blob = input['file:content'];
  if (blob.getLength() > 4194304) {
    blob = Picture.GetView(input, {
      viewName: 'Medium'
    });
  }

  // Call the main operation
  blob = VisionOp(blob, {
    features: ['SAFE_SEARCH_DETECTION'],
    maxResults: 5,
    outputVariable: 'results'
  });

  // We always get a _list_ of VisionResponse. For one single blob and feature,
  // we will have one response. Let's get the nativeObject, which already is JSON
  nativeResult = ctx.results.get(0).getNativeObject();

  // Now we have the Google response. Get the value in
  // the correct field, as documented
  safeSearchResult = nativeResult.safeSearchAnnotation;

  // Store the values
  input['sc:adult_value'] = safeSearchResult.adult;
  input['sc:violent_value'] = safeSearchResult.violence;

  // Calculate the boolean.
  // In this example we are very strict: we allow only "VERY_UNLIKELY".
  // Any other value is considered "for adult".
  input['sc:for_adult'] =
    safeSearchResult.adult !== 'VERY_UNLIKELY' ||
    safeSearchResult.violence !== 'VERY_UNLIKELY';

  input = Document.Save(input, {});

  return input;
}
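And here is the smarter handling mentioned earlier, as a minimal sketch: a hypothetical needsReview() helper (not part of the plugin) that maps Google’s likelihood strings to a numeric scale, so a real application can tune a threshold, and for example start an approval workflow, instead of hard-coding string comparisons:

// Hypothetical helper, not part of the plugin: rank Google's likelihood
// strings so thresholds can be tuned instead of comparing strings.
var LIKELIHOOD_RANK = {
  VERY_UNLIKELY: 0,
  UNLIKELY: 1,
  POSSIBLE: 2,
  LIKELY: 3,
  VERY_LIKELY: 4,
  UNKNOWN: 2 // treat "don't know" as "possible", to stay cautious
};

function needsReview(safeSearchResult) {
  var worst = Math.max(
    LIKELIHOOD_RANK[safeSearchResult.adult],
    LIKELIHOOD_RANK[safeSearchResult.violence]
  );
  // Flag anything at "POSSIBLE" or above for human review
  return worst >= LIKELIHOOD_RANK.POSSIBLE;
}

Treating UNKNOWN like POSSIBLE keeps the cautious spirit of the strict version above, while a tunable threshold avoids flagging every picture that is merely “UNLIKELY”.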
(1) Yes: One little month. Which is quite fast, but that’s just how we are.
(2) Isn’t my dog just cuuuuute? And he is in a stroller. Which is a kind of vehicle, so Google Vision API is right.
(3) It was already very, very interesting. How can it be even more interesting?