The Use Case
Some companies protect their internal network with a “DMZ”. A DMZ is a network area between two firewalls, with rules that control all requests sent to servers listening in the innermost area. This protects the internal network from external attacks while still allowing some external users or systems to send requests to internal applications. In our specific case, we want users from both the internal network and the Internet to access and share the same documents stored in Nuxeo repositories. Several architectures address this case. Let’s take a look at them one by one.
The Reverse Proxy
The reverse proxy server forwards requests that come from Internet users to servers in the internal zone and returns the response to the user. It is based on whitelist logic: everything is blocked except some specific request patterns. You just have to set up a module that defines which back-office URLs can be accessed, and people can then use the web app or send requests to the platform using the REST API. An easy solution is to rewrite all URLs that follow the pattern “hostname/nuxeo/*”, but it is possible to implement much finer filtering by allowing only some of the following:
- The Nuxeo REST API URLs: /api/v1/*
- The login URLs
- The binary download URLs
- A few other URLs, such as renditions
- Sometimes, the CMIS URLs
Note: The Nuxeo Technical Support team can provide you with an exact list depending on your use case.
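The whitelist logic described above can be sketched in a few lines. This is only an illustration of the idea, not a replacement for the proxy module's own configuration: the /api/v1 prefix comes from the list above, while the login and download patterns are assumptions used as placeholders until you have the exact list for your use case. The function name is_allowed is hypothetical.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Whitelist of path patterns. Only the /api/v1 family is taken from the
# article; the two other patterns are assumed placeholders.
ALLOWED_PATTERNS = [
    re.compile(r"^/nuxeo/api/v1(/.*)?$"),  # Nuxeo REST API
    re.compile(r"^/nuxeo/login\.jsp$"),    # assumption: login URL
    re.compile(r"^/nuxeo/nxfile/.+$"),     # assumption: binary download URL
]


def is_allowed(request_target: str) -> bool:
    """Return True only if the request path matches a whitelisted pattern."""
    # Drop the query string and decode percent-escapes before matching.
    path = unquote(urlsplit(request_target).path)
    # Collapse "." and ".." segments so that a whitelisted prefix cannot be
    # used to climb to a path outside the whitelist.
    normalized = posixpath.normpath(path)
    return any(pattern.match(normalized) for pattern in ALLOWED_PATTERNS)


if __name__ == "__main__":
    for target in ("/nuxeo/api/v1/path/default-domain",
                   "/nuxeo/site/admin",
                   "/nuxeo/api/v1/../../site/admin"):
        print(target, "->", "forward" if is_allowed(target) else "block")
```

The path is normalized before matching so that a request cannot start with an allowed prefix and then escape it with ".." segments. Anything that does not match is blocked by default.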
This solution is the simplest: it requires very few setup and configuration steps. You can use the default Nuxeo UIs on the Internet, everybody shares the same data repository in real time, and there are no additional servers (the reverse proxy is just a module on the web server).
Instead of a simple web reverse proxy (an Apache module, for instance), it is possible to set up an API gateway. API gateways are dedicated software products for proxying API calls, with many services around them: additional security, dynamic API mapping, traceability, routing rules, etc. This is a more advanced solution that is mostly worth considering as part of a global approach covering several applications, as it adds complexity and cost to the overall setup.
Application in the Middle and API Breakage
Some customers prefer to add an application server in the DMZ, typically a Node.js or Java one, with custom URLs that expose a minimal set of features remotely (for instance, basic CRUD and file upload).
That way, the firewall only needs to allow requests between that application server (Node.js in this example) and the internal Nuxeo nodes, and there is an API break between the two: external clients never call the Nuxeo API directly.
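To make the API break concrete, here is a minimal sketch of the translation step such an intermediate application could perform. Everything in it is an assumption made for illustration: the public /documents/ routes, the internal host name, the translate function, and the choice to address documents by id under /api/v1. A real application would also need authentication and the actual HTTP calls.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical internal address of the Nuxeo REST API.
NUXEO_BASE = "http://nuxeo.internal:8080/nuxeo/api/v1"

# Document ids are assumed to be UUID-shaped; anything else is rejected.
DOC_ID = re.compile(r"[0-9a-fA-F-]{36}")


@dataclass(frozen=True)
class InternalRequest:
    method: str
    url: str


# Public (method, path prefix) -> internal (method, path template).
# Only the routes listed here exist from the outside.
ROUTES = {
    ("GET", "/documents/"): ("GET", "/id/{doc_id}"),
    ("DELETE", "/documents/"): ("DELETE", "/id/{doc_id}"),
}


def translate(method: str, path: str) -> Optional[InternalRequest]:
    """Map a public request to an internal one, or None if not exposed."""
    for (pub_method, prefix), (int_method, template) in ROUTES.items():
        if method == pub_method and path.startswith(prefix):
            doc_id = path[len(prefix):]
            if not DOC_ID.fullmatch(doc_id):
                return None  # refuse anything that is not a plain id
            return InternalRequest(int_method,
                                   NUXEO_BASE + template.format(doc_id=doc_id))
    return None


if __name__ == "__main__":
    print(translate("GET", "/documents/123e4567-e89b-12d3-a456-426614174000"))
    print(translate("GET", "/nuxeo/api/v1/user/Administrator"))  # None
```

The point of the design is that the public URL space and the internal one are unrelated: a request that is not explicitly mapped has no internal equivalent at all.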
The advantages of this architecture are:
- API protocol break.
- The customer is fully responsible for the APIs that are exposed.
There are, however, some drawbacks:
- More custom development (although with the Node, Java, or Python Nuxeo clients, the integration can be implemented quickly).
- The fully customized code is exposed and becomes the potential point of failure. Generic code, such as the Nuxeo code, is fully tested and has been through dozens of security audits with customers all over the world, while custom code usually barely goes through one security review.
The Hybrid Model
It is possible to have two Nuxeo repositories and to define content syncing rules between them, implemented using REST API calls.
The advantages are:
- Clear data separation
And the drawbacks are:
- Syncing happens in one direction only. That’s interesting for publishing use cases, but less so for collaborative ones. A two-way syncing logic could be implemented, but there will always be the limit of related resources that would not be synced (comments, linked workflows, etc.).
- There are two Nuxeo setups, which means more VMs, etc.
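As an illustration of a one-way syncing rule, the sketch below computes what has to be pushed from a source repository to a target one. It assumes that each repository can list its documents with an id and a last-modification date (for example through REST API queries); the DocState type and the plan_one_way_sync function are hypothetical names, and the actual REST calls are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DocState:
    uid: str            # document id shared by both repositories (assumption)
    modified: datetime  # last modification date


def plan_one_way_sync(source: Iterable[DocState],
                      target: Iterable[DocState]) -> Tuple[List[str], List[str]]:
    """Return (ids to push to the target, ids to delete from the target)."""
    source = list(source)
    target_by_uid = {doc.uid: doc for doc in target}
    source_uids = {doc.uid for doc in source}

    # Push documents that are missing on the target or older there.
    to_push = [doc.uid for doc in source
               if doc.uid not in target_by_uid
               or target_by_uid[doc.uid].modified < doc.modified]
    # Delete documents that no longer exist on the source.
    to_delete = [uid for uid in target_by_uid if uid not in source_uids]
    return to_push, to_delete


if __name__ == "__main__":
    src = [DocState("a", datetime(2024, 1, 2)), DocState("b", datetime(2024, 1, 1))]
    dst = [DocState("a", datetime(2024, 1, 1)), DocState("c", datetime(2024, 1, 1))]
    print(plan_one_way_sync(src, dst))
```

Changes made directly on the target are never read back, which is why this fits publishing better than collaboration. Related resources such as comments or workflows are not covered by this plan at all.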
With the first three solutions, internal requests and external ones don’t go through the same reverse proxies. We can leverage this to add a specific header to the requests, with the value “internal” or “external”. Then, at principal creation time in the Nuxeo Platform, we can automatically assign the user to one of two virtual groups: “internal” or “external”. Users with management permission can define Access Control Lists specifically for those groups on some specific folders, so that they are only accessible to internal users. It is also possible to control access at the document level, based on a “restricted” boolean metadata field and an additional security policy. That way, not only are you protecting your network from external attacks while continuing to collaborate, but you also make sure that critical documents will never be downloaded outside your corporate network!
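The decision logic behind this approach can be sketched as follows. This is not Nuxeo code: in the platform, the group assignment and the security policy would be implemented with its own extension mechanisms. The header name X-Network-Zone and both function names are assumptions made for the example.

```python
from typing import Mapping, Set

# Hypothetical header name, set by each reverse proxy.
ZONE_HEADER = "x-network-zone"


def virtual_groups(headers: Mapping[str, str]) -> Set[str]:
    """Derive the virtual group from the header added by the reverse proxy."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    zone = normalized.get(ZONE_HEADER, "").strip().lower()
    # Fail closed: a missing or unknown value is treated as external.
    return {"internal"} if zone == "internal" else {"external"}


def can_access(groups: Set[str], folder_acl: Set[str], restricted: bool) -> bool:
    """Combine the folder ACL with the document-level "restricted" flag."""
    if restricted and "internal" not in groups:
        return False  # restricted documents never leave the internal network
    return bool(groups & folder_acl)  # at least one group must be granted


if __name__ == "__main__":
    external = virtual_groups({"X-Network-Zone": "external"})
    internal = virtual_groups({"X-Network-Zone": "internal"})
    shared_folder = {"internal", "external"}
    print(can_access(external, shared_folder, restricted=False))  # True
    print(can_access(external, shared_folder, restricted=True))   # False
    print(can_access(internal, shared_folder, restricted=True))   # True
```

For this to be safe, each reverse proxy has to overwrite the header rather than pass through whatever value the client sent. Otherwise an external user could simply claim to be internal.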