Limiting the Software Supply Chain Attack Surface


By Trellix · September 21, 2022
This blog was written by Douglas McKee

We often discuss how the intent behind an action matters, and it’s easy to see why. If I am walking down the sidewalk, distracted by my phone (of course), and run into a streetlamp, I may get hurt, but no malicious intent or action was involved. In contrast, if I am doing the same thing, still distracted, but someone pushes me into the streetlamp and I get hurt, we are quick to condemn the assailant. But note that the result was the same: I was hurt and, for the sake of argument, to the same degree.

When a threat actor maliciously modifies source code to execute a software supply chain attack like the infamous SolarWinds or Codecov events, we immediately recognize the impact and malicious intent. But what about a vulnerability or flaw in a core framework or library that goes unpatched for 15 or more years? There is no malicious intent. It was not planted by a threat actor. Yet it ends up in software supply chains and can potentially affect even more users than a planted issue. Worse, in some cases attackers know about it while defenders are blind to it.

Today Trellix releases research, not into a new vulnerability, but into a very old one. CVE-2007-4559 was reported to the Python project back in 2007, indicating that the tarfile module was not properly checking for path traversal. Left unchecked, this vulnerability has since been unintentionally added to hundreds of thousands of open- and closed-source projects worldwide, creating a substantial software supply chain attack surface.
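To make the path traversal concrete, here is a minimal sketch of the underlying problem. It builds a tar archive in memory whose single member is named `../outside.txt`, then checks where that name would resolve relative to an extraction directory. The helper names `build_archive` and `escapes` are hypothetical and exist only for this illustration; nothing is actually extracted.

```python
import io
import os
import tarfile
import tempfile


def build_archive(member_name: str, payload: bytes) -> bytes:
    """Return an in-memory tar archive holding one file with the given name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def escapes(dest: str, member_name: str) -> bool:
    """True if joining member_name onto dest resolves outside dest."""
    dest_real = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(dest_real, member_name))
    return os.path.commonpath([dest_real, target]) != dest_real


# The archive format itself places no restriction on member names.
archive = build_archive("../outside.txt", b"attacker-controlled")

with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        # List the members whose names would land outside the destination.
        flagged = [name for name in tar.getnames() if escapes(dest, name)]

print(flagged)
```

Any code that extracts such an archive without a check of this kind is trusting the archive's author to choose where files get written.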

Our research

While investigating an unrelated vulnerability, our team stumbled across this issue in an enterprise device. Initially, we thought we had found a new zero-day vulnerability. As we dug into the issue, we realized it was actually CVE-2007-4559. While the vulnerability was originally rated only a 6.8, we were able to confirm that in most cases an attacker can gain code execution from the arbitrary file write. For a detailed technical understanding of the CVE and the technical consequences of an attack, please read our blog written by Kasimir Schulz.

Understanding past research in an area we are working on is an important part of our validation process. When we started pulling on the proverbial thread, we couldn’t believe what unraveled. With standard public access to GitHub, we were able to find over 300,000 files that contained Python’s tarfile module, and on average 61% of them were vulnerable to an attack as a result of CVE-2007-4559 in 2022. This led us to contact GitHub to see if we could obtain a more comprehensive understanding of the footprint of this 15-year-old vulnerability. With GitHub’s cooperation, we determined that around 2.87 million open-source files contain Python’s tarfile module, spread across about 588,000 unique repositories. Because the dataset is so large, we are still processing the results; however, the 61% vulnerable rate is holding so far, which lets us estimate that over 350,000 unique open-source repositories will be vulnerable to this attack. This open-source code base spans a vast number of industries. An overview of these industries can be seen in the chart below, and we expect the spread would be even wider if data were available for all software.
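The repository estimate follows directly from the two figures quoted above. As a quick check of the arithmetic, assuming the 61% rate applies uniformly across repositories (an assumption, since the full dataset is still being processed):

```python
# Figures quoted in the text above.
unique_repos = 588_000
vulnerable_percent = 61

# Integer arithmetic avoids any floating-point rounding in the estimate.
estimated_vulnerable_repos = unique_repos * vulnerable_percent // 100

print(f"{estimated_vulnerable_repos:,}")  # 358,680
```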

Figure 1: Affected Industries for open-source projects

For a more complete and in-depth understanding of our data collection and analysis of open-source projects please read our lead OSINT expert Charles McFarland’s blog.

How did we get here?

Let’s start by being explicitly clear: there is no one party, organization or person to blame for the current state of CVE-2007-4559, but here we are anyway. We need to start by considering that open-source projects like Python are often run and maintained by a group of volunteers. In this case, Python is run and owned by the Python Software Foundation (PSF), which is a non-profit organization. It is often harder for these types of groups to obtain resources, perform rigorous reviews, make unilateral decisions, and track, and therefore fix, these types of issues in a timely manner.

In cases like this, there is often also a debate over whether there are legitimate use cases for the behavior of a module. We have seen the argument, including in this case: just because an aspect of a function could be used for a malicious purpose, does that mean it ultimately needs to be removed? To reference my opening example, should we remove the streetlamp because someone could push you into it? In this instance, I believe the risk outweighs the reward of accommodating a few corner cases. We can see this in similar modules such as Python’s zipfile module, which protects against this type of vulnerability without concern for possible legitimate use.
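The zipfile behavior can be seen in a small experiment. The sketch below writes a zip archive containing a member named `../escaped.txt`, extracts it into a fresh directory, and reports where the file ended up. The function name `extract_zip_with_traversal_name` is hypothetical and used only for this illustration; the zipfile module drops `..` components from member names on extraction, so the file should land inside the destination.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_with_traversal_name() -> tuple:
    """Extract a zip member named '../escaped.txt' and report where it landed.

    Returns (found_inside_dest, found_in_parent_of_dest).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("../escaped.txt", "payload")

    with tempfile.TemporaryDirectory() as parent:
        dest = os.path.join(parent, "dest")
        os.mkdir(dest)
        with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
            zf.extractall(dest)
        inside = os.path.exists(os.path.join(dest, "escaped.txt"))
        outside = os.path.exists(os.path.join(parent, "escaped.txt"))
    return inside, outside


inside, outside = extract_zip_with_traversal_name()
print(f"inside destination: {inside}, escaped to parent: {outside}")
```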

Lastly, it is not uncommon for libraries or software development kits (SDKs) to treat the secure use of their APIs as the developer’s responsibility. In our case, Python put a clear warning in the documentation about the risks of using this function.

Figure 2: Warning to developers related to CVE-2007-4559

While the warning is a positive step towards awareness of the issue, it does not prevent the vulnerability from being perpetuated further. To add fuel to the fire, many developers depend on third-party tutorials to learn how to use modules. Most of the tutorials we reviewed in our research demonstrate insecure use of the tarfile module. This includes Python’s own documentation as well as popular sites such as tutorialspoint, geeksforgeeks, askpython.com and many more: all provide examples of how to use the module, but do not use it securely. This allows the vulnerability to continue to be programmed into the supply chain for years to come.
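For contrast with those tutorials, here is one possible shape of a more defensive extraction. This is a minimal sketch rather than a complete defense: `safe_extract` is a hypothetical helper, not part of the tarfile module, and it simply refuses link members and any member whose resolved path falls outside the destination before extracting.

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract `tar` into `dest`, refusing members that would land outside it.

    Hypothetical helper for illustration; not part of the tarfile module.
    """
    dest_real = os.path.realpath(dest)
    members = tar.getmembers()
    for member in members:
        # Links can point outside the destination, so this sketch rejects them.
        if member.issym() or member.islnk():
            raise ValueError(f"refusing link member: {member.name!r}")
        # Resolve the final path and make sure it stays under the destination.
        target = os.path.realpath(os.path.join(dest_real, member.name))
        if os.path.commonpath([dest_real, target]) != dest_real:
            raise ValueError(f"refusing path outside destination: {member.name!r}")
    # Only reached when every member passed the checks above.
    tar.extractall(dest_real, members=members)
```

A caller would open the archive with `tarfile.open(...)` and pass the resulting object to `safe_extract` instead of calling `extractall` directly. Recent Python releases also ship extraction filters (for example `tar.extractall(dest, filter="data")`), which are worth preferring where they are available.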

Call to action

While we often focus on supply chain attacks in which threat actors infect updates or code repositories following a breach, we don’t always emphasize that a vulnerability in an underlying library can have the same impact, if not worse. Our research is limited by the access we have to code, which is why we focus so heavily on GitHub and open-source projects. However, recall that this project started while we were examining an enterprise product for zero-day vulnerabilities, a product that is susceptible to attack because of the vulnerabilities it inherited from the software supply chain. While we can’t provide as detailed an analysis as we can with open-source projects, it is fair to expect the trend to be similar. What if 61% of all projects, open- and closed-source alike, could be exploited due to this vulnerability?

For your organization, the details of CVE-2007-4559 specifically may not be terrifying, but supply chain attacks cost organizations 4.46 million dollars on average and take over 235 days to identify. As an industry, we cannot afford to ignore the need to seek out and eradicate these types of foundational vulnerabilities. To do our part, Trellix is releasing a script that can be used to scan one or multiple code repositories for the presence and likelihood of exploitation of CVE-2007-4559. Additionally, we are working on automating the submission of pull requests to open-source projects that can be confirmed to be exploitable.
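To give a feel for what scanning a repository can involve, here is a rough sketch. It is not the Trellix script, and the names `find_extract_calls` and `scan_repository` are hypothetical. It only flags `.extract(...)` and `.extractall(...)` calls in Python files that import tarfile; it makes no attempt to judge whether a call is actually exploitable, so every hit still needs manual review.

```python
import ast
from pathlib import Path

EXTRACT_METHODS = {"extract", "extractall"}


def imports_tarfile(tree: ast.AST) -> bool:
    """True if the module imports tarfile in any form."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name == "tarfile" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom) and node.module == "tarfile":
            return True
    return False


def find_extract_calls(source: str) -> list:
    """Return line numbers of .extract()/.extractall() calls in tarfile-using code.

    Heuristic only: it does not verify that the receiver is a TarFile object,
    so an unrelated extractall() in a file that also imports tarfile is flagged.
    """
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return []  # Not parseable as Python for this interpreter; skip it.
    if not imports_tarfile(tree):
        return []
    lines = [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr in EXTRACT_METHODS
    ]
    return sorted(lines)


def scan_repository(root: str) -> dict:
    """Map each Python file under root to the line numbers flagged in it."""
    findings = {}
    for path in Path(root).rglob("*.py"):
        try:
            source = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # Unreadable file; skip rather than abort the scan.
        lines = find_extract_calls(source)
        if lines:
            findings[str(path)] = lines
    return findings
```

Calling `scan_repository(".")` from the root of a checkout returns a dictionary of file paths and flagged line numbers, which can serve as a starting list for review.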

This document and the information contained herein describe computer security research for educational purposes only and the convenience of Trellix customers. Trellix conducts research in accordance with its Vulnerability Reasonable Disclosure Policy. Any attempt to recreate part or all of the activities described is solely at the user’s risk, and neither Trellix nor its affiliates will bear any responsibility or liability.