You might have read about some of the recent data leaks that came from unsecured Elasticsearch servers directly exposed to the internet.
This is why security norms and regulations such as PCI DSS require a thorough firewall review. In a public cloud, it is easy to update security groups while testing and then forget about it. Separation of duties is one way to prevent this, but as you probably know, problems rarely have a single cause.
The Nuxeo security team motto is: “automation, automation, automation”. So it seemed obvious to us to tackle this security challenge in an automated way. In this blog post, I’m going to cover how we automated our firewall reviews and what we are thinking about doing next in terms of automated security.
We already process all CloudTrail events in real time, so we added a token at the end of every security group rule as proof of acceptance by the security team. In our production environment, we enforce these tokens: any rule that does not contain a token, or does not match the token it contains, is removed by our automation.
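To make the trigger concrete, here is a minimal sketch of how a CloudTrail record could be filtered before a review runs. The post does not say which events the automation listens to, so the list below is an assumption; the event names are the standard EC2 API actions that add or change rules, and `needs_review` is a hypothetical helper name.

```python
# EC2 API actions that add or change security group rules.
# Which of these the real automation watches is an assumption.
WATCHED_EVENTS = frozenset({
    "AuthorizeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "ModifySecurityGroupRules",
    "UpdateSecurityGroupRuleDescriptionsIngress",
    "UpdateSecurityGroupRuleDescriptionsEgress",
})


def needs_review(event: dict) -> bool:
    """Return True if a CloudTrail record should trigger a rule review."""
    return (
        event.get("eventSource") == "ec2.amazonaws.com"
        and event.get("eventName") in WATCHED_EVENTS
        # Failed API calls carry an errorCode and changed nothing.
        and "errorCode" not in event
    )
```

Events that only remove rules are left out on purpose, since a removal cannot introduce unapproved traffic.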
A security group rule has several attributes: Protocol, Type, Target, Port, Description.
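We decided to store the token in the Description: the token ID is appended to the description after a "$". A rule and the token that authorizes it look like this:
| |Type|Protocol|Target|Port|Description|
|---|---|---|---|---|---|
|Rule with Token|INGRESS|TCP|188.8.131.52|22|NYC Office$TK123|
|Token (TK123)|INGRESS|TCP|184.108.40.206|22|NYC Office|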
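As an illustration of the check this implies, here is a minimal sketch that splits the token ID out of a description and compares the rest of the rule with the approved token. The `Rule` class, the helper names, and the `203.0.113.10/32` placeholder address are assumptions made for the example; only the "$" convention comes from the table above.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

SEPARATOR = "$"  # taken from the example description "NYC Office$TK123"


@dataclass(frozen=True)
class Rule:
    direction: str  # "INGRESS" or "EGRESS"
    protocol: str
    target: str
    port: int
    description: str


def split_description(description: str) -> Tuple[str, Optional[str]]:
    """Return (plain description, token id), or (description, None) if untagged."""
    text, sep, token_id = description.rpartition(SEPARATOR)
    if not sep or not token_id:
        return description, None
    return text, token_id


def is_authorized(rule: Rule, tokens: Dict[str, Rule]) -> bool:
    """A rule passes only if its token exists and every attribute is identical."""
    text, token_id = split_description(rule.description)
    approved = tokens.get(token_id)
    if approved is None:
        return False
    return approved == replace(rule, description=text)
```

With `tokens = {"TK123": Rule("INGRESS", "TCP", "203.0.113.10/32", 22, "NYC Office")}`, a rule with the same attributes and the description `NYC Office$TK123` is accepted, while the same rule on another port is rejected.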
Because we deploy each customer in its own separate VPC, tokens can also contain a regex, so that a single token validates the same rule across every deployment:
| |Type|Protocol|Target|Port|Description|
|---|---|---|---|---|---|
|Rule with Token|INGRESS|TCP|My-core-app-sg|22|NYC Office$TK124|
|Token (TK124)|INGRESS|TCP|.*-core-app-sg|22|NYC Office|
Here the regex lets the token accept any core-app-sg security group as the target.
Tokens also carry a regex that restricts which security groups they can be applied to.
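A minimal sketch of how both patterns could be evaluated is shown below. The `Token` fields, the `token_allows` helper, and the `.*-db-sg` group pattern are hypothetical; only the `.*-core-app-sg` target pattern comes from the example above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: str
    direction: str
    protocol: str
    target_pattern: str  # regex for the rule's target
    port: int
    group_pattern: str   # regex for the security group holding the rule


def token_allows(token: Token, group_name: str, direction: str,
                 protocol: str, target: str, port: int) -> bool:
    """Check a rule against a token, anchoring both patterns on the whole name."""
    return (
        token.direction == direction
        and token.protocol == protocol
        and token.port == port
        and re.fullmatch(token.target_pattern, target) is not None
        and re.fullmatch(token.group_pattern, group_name) is not None
    )
```

Using `re.fullmatch` rather than `re.search` is a deliberate choice in this sketch: a pattern such as `.*-core-app-sg` should not accept a group named `My-core-app-sg-test` just because part of the name matches.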
In our custom Security Operations Center (SOC), people can request new tokens; once a request is validated by a Security team member, the token is issued.
We can then see where a token is used within our AWS environments.
Those tokens are digitally signed so that they cannot be altered in the database (DynamoDB). Here is a scraped export of one:
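The export is not reproduced in this text, so what follows is not the Nuxeo token format. It is only a minimal sketch of how a stored record could be made tamper-evident. The post does not name the signature scheme; this example substitutes HMAC-SHA256 from the Python standard library, which is a keyed hash rather than an asymmetric signature, and the field names and helpers are hypothetical.

```python
import hashlib
import hmac
import json


def sign_token(token: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON form of the token."""
    payload = json.dumps(token, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_token(record: dict, key: bytes) -> bool:
    """Recompute the HMAC over every field except 'signature' and compare."""
    body = {k: v for k, v in record.items() if k != "signature"}
    expected = sign_token(body, key)
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected, record.get("signature", ""))
```

Serializing with `sort_keys=True` keeps the signed bytes stable regardless of the order in which the database returns the attributes. Any edit to a stored field, such as changing the port, makes `verify_token` return `False` as long as the key is kept outside the database.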
So whenever a security group is modified, we analyze each rule to verify that it contains a token. If it doesn’t, we delete the rule to block any unvalidated traffic. Also, as soon as we revoke an approval, the automation removes all matching rules across our environments.
One of our team’s goals is to simplify these processes as much as we can, so if we identify a rule that matches a token but doesn’t contain one, we add the token on behalf of the user. We can then let them know that the process wasn’t followed, while their work continues seamlessly.
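The enforcement step can be sketched as a pure function that returns the rules to delete, leaving the actual deletion to the EC2 API. Rules and tokens are plain dictionaries here, matching is a simple equality on four fields, and all names are assumptions made for the example rather than the production code.

```python
from typing import Dict, List, Optional

MATCH_FIELDS = ("direction", "protocol", "target", "port")


def token_id_of(description: str) -> Optional[str]:
    """Extract the token id that follows the last '$', if any."""
    _, sep, token_id = description.rpartition("$")
    return token_id if sep and token_id else None


def is_authorized(rule: dict, tokens: Dict[str, dict]) -> bool:
    """The token must still be approved and must describe this exact rule."""
    token = tokens.get(token_id_of(rule["description"]))
    return token is not None and all(rule[f] == token[f] for f in MATCH_FIELDS)


def rules_to_remove(rules: List[dict], tokens: Dict[str, dict]) -> List[dict]:
    """Rules of one security group that no approved token covers."""
    return [rule for rule in rules if not is_authorized(rule, tokens)]


def sweep(groups: Dict[str, List[dict]], tokens: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Re-check every group, for example right after a token is revoked."""
    doomed = {name: rules_to_remove(rules, tokens) for name, rules in groups.items()}
    return {name: rules for name, rules in doomed.items() if rules}
```

In this model, revoking an approval is just removing the token from `tokens` and calling `sweep` again: every rule that relied on it now shows up in the result.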
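A minimal sketch of this self-healing step follows. It assumes that "matches a token" means the direction, protocol, target, and port are identical, which the post does not spell out; the helper names and dictionary layout are hypothetical.

```python
from typing import Dict, Optional

MATCH_FIELDS = ("direction", "protocol", "target", "port")


def find_token(rule: dict, tokens: Dict[str, dict]) -> Optional[str]:
    """Return the id of the first approved token that matches the rule."""
    for token_id in sorted(tokens):  # sorted for a deterministic choice
        token = tokens[token_id]
        if all(rule[f] == token[f] for f in MATCH_FIELDS):
            return token_id
    return None


def tagged_description(rule: dict, tokens: Dict[str, dict]) -> Optional[str]:
    """Return the description with the token appended, or None if nothing applies."""
    if "$" in rule["description"]:
        return None  # already tagged; normal enforcement decides its fate
    token_id = find_token(rule, tokens)
    if token_id is None:
        return None
    return f"{rule['description']}${token_id}"
```

A caller would update the rule description when a string is returned and notify the author of the rule, and fall back to deleting the rule when the result is `None`.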
The process has been in place for almost two years now. We’re very happy with its implementation, and our teams understand the requirements better.
But I have to admit, I was also responsible for an hour-long outage the first time we deployed it. Let me tell you what happened: in one of the first deployments, a bug sneaked in that made the automation think the tokens were not legitimate, which led to the removal of all security group rules. We quickly identified the issue and our production team recovered within an hour. I could try to pretend it was some chaos engineering to ensure the disaster recovery process was working well, but I’m too honest for that. We learn from our mistakes.
What’s next? We will soon add DNS resolution to tokens so that rules can reference DNS names, which will help our teams allow our office provider into their systems. We will also backport it as an IvoryShield module; let us know if you want to know more about it.
I hope this helped you understand how we manage our firewall reviews. I hope to share more of our security processes with you in the future, to continue showing you that the Nuxeo Platform is best suited for organizations that operate in deep and complex regulatory environments.