The GitHub Security Lab’s journey to disclosing 500 CVEs in open source projects

2023-09-21 20:56:46

Xavier René-Corail

github.blog

When I stepped onto the scale this morning, I remembered that there are some numbers that feel awkward to celebrate, while perhaps some others are worth celebrating! Recently, the GitHub Security Lab passed the milestone of 500 CVEs disclosed to open source projects. What’s a CVE? In short, it’s the record of a security vulnerability, under the CVE program, intended to inform impacted users. So, finding more vulnerabilities in open source shouldn’t be good news, right? Even as developer communities are getting better at keeping themselves secure, security issues may still slip through their defenses. This means that there will always be a need for security researchers, like the Security Lab, to discover and help fix them.

If you’re not familiar with the Security Lab, we’re a team of security experts who work with the broader open source community to help fix security issues in their projects, with the goal of improving the overall security posture of open source. Our core activity is to audit open source projects (not only the ones hosted on GitHub) and help their maintainers fix the vulnerabilities we find, for free. This research is foundational for our other activities, such as education, improvement of our open source static analysis rules, and tooling. And now we are celebrating more than 500 CVEs disclosed.

How did we get here?

The history of the Security Lab dates back to Semmle, the company that created CodeQL and was later acquired by GitHub. 2017 was a pivotal year, as we realized how powerful our product could be for finding security vulnerabilities. Unlike many other static analysis tools, CodeQL can be used to efficiently codify insecure patterns and respond quickly to new security threats at scale. To showcase this capability, Semmle created a small security research team that used CodeQL to search for vulnerabilities in open source projects, and a web portal named LGTM.com where all open source projects could run CodeQL for free and be alerted to potential security flaws directly within their pull requests. This approach grew into an important company objective: find and fix vulnerabilities at scale in open source. This was a way of giving back to the open source community, as any software company should.

GitHub and Semmle

In September 2019, GitHub acquired Semmle, providing an ideal home for advancing the goal of improving open source security at scale. This led to the creation of the Security Lab, with a larger team and new initiatives, including curating the GitHub Advisory Database. The GitHub Advisory Database provides developers with the most accurate information about known security issues in their open source dependencies. GitHub also incorporated CodeQL as a foundation of code scanning and a core pillar of GitHub Advanced Security (GHAS), keeping it free for open source. Code scanning reached parity with LGTM.com in 2022.

We have also expanded beyond CodeQL and now use a variety of tools in our audit activities, such as fuzzing. But CodeQL remains one of the most effective tools in our toolbox, because it enables us to conduct variant analysis at scale, and allows us to share our knowledge of insecure patterns with the community, in the form of executable CodeQL queries.

The secret? Our maintainers-first approach

Not all reports get a CVE. CVE records are useful for informing downstream consumers, so when there is no downstream consumer, there is no need for a CVE. For example, a vulnerability in a CI workflow, or one discovered in a development branch and fixed before it reached any release, does not require a CVE. While we are credited for 500 CVEs, we have actually reported and helped fix over 1,000 vulnerabilities. But who's counting, right?

That said, what matters most to us is our fix rate. When looking at the tens of thousands of reports in the GitHub Advisory Database, on average, 80% are fixed by maintainers. However, the fix rate for vulnerabilities the Security Lab reported is much higher: 96% of our reports end up with a fix. This reflects the validity of our reports and our effective collaboration with maintainers. We want project maintainers to succeed, and because of that, we are flexible on the disclosure timeline (when it’s safe for the rest of the community), we provide fix suggestions, and we always help test the new release. Our report template is open source for all security researchers who would like to use it as an inspiration for their own reports.

Now, let’s take a look at some vulnerabilities that stand out!

Highlights from our first 500

CVE-2017-9805: Remote Code Execution vulnerability in Apache Struts

The bug that started it all. Man Yue Mo found an unsafe deserialization vulnerability in Apache Struts, which enabled an unauthenticated remote attacker to execute arbitrary code. Apache Struts was already in the news at the time, because an older vulnerability—CVE-2017-5638—had been leveraged in the Equifax breach. Mo, who at the time was still working on Semmle’s data science team, found the bug by tweaking the CodeQL query for unsafe deserialization.

> This is the starting point for me personally. I came across this without realizing its significance when looking at the unsafe deserialization sinks. This bug helped us realize the power of CodeQL and understand how it can be used to find serious vulnerabilities that are otherwise hard to find, by customizing its dataflow sources, sinks, and steps.

- Man Yue Mo, @m-y-mo

CVE-2018-4407: Kernel crash caused by out-of-bounds write in Apple's ICMP packet-handling code

By sending a malicious packet, an attacker could trigger an out-of-bounds write in the XNU kernel’s networking code, which would instantly crash the kernel (video) and reboot any Mac or iOS device on the same network as the attacker, without user interaction. It even had a tweetable PoC.

> We recorded the video of the poc in our Oxford office. I modified the poc so that it could crash multiple devices simultaneously, but I made a mistake and accidentally broadcast it to the whole office, crashing all the Macs and iPhones in the office that day! People on the other floors had no idea what had happened.

- Kevin Backhouse, @kevinbackhouse

GHSL-2020-204: Remote Code Execution in Corona Warn App Server

A Remote Code Execution (RCE) vulnerability was found in the German application used to track COVID contacts. An unauthenticated attacker would have been able to fully compromise the server where citizens were sending their anonymous infection information to facilitate the tracking of the exposure of other German citizens. This is a good example of a vulnerability that did not require a CVE since the CWA app was only used and deployed by the German and Belgian governments.

> This was a novel vulnerability category we found at the Security Lab. I was researching how certain data validators, in theory used to make sure that untrusted data conformed to safe patterns, could actually be abused for the opposite purpose, and actually make the application vulnerable to a different type of attack. This research led to the publication of the Bean Stalking: Growing Java beans into RCE article and soon after we found many applications vulnerable to this vulnerability, including the Corona Warn App which we promptly reported to the maintainers.

- Alvaro Muñoz, @pwntester

CVE-2021-3560: Privilege escalation with polkit

polkit is a system service installed by default on many Linux distributions, including popular distributions such as RHEL and Ubuntu. A race condition vulnerability in this service enabled an unprivileged local user to get a root shell on Linux systems. The bug was in error handling code, and could be triggered by disconnecting the client too early.

> Local privilege escalation vulnerabilities on Linux are often in the kernel and require some tricky code to exploit. This bug was different because it was very easy to exploit by running a few commands in the terminal.

- Kevin Backhouse, @kevinbackhouse

CVE-2021-45046: Bypass of initial mitigations for Log4Shell

December 2021 may be remembered by Java developers and security folks for an RCE vulnerability found in the popular Log4j logging library. The Java world was shaken by what was probably the worst vulnerability ever to affect the Java ecosystem. The Apache maintainers quickly published a patch for it; however, our researchers found that the fix was not sufficient and reported a bypass affecting certain OSes to the maintainers.

> Having researched and published how JNDI injections could lead to RCE back in 2016 at the BlackHat security conference, I was shocked that such a vulnerability was hidden in plain sight for so long affecting probably the most popular Java logging library. It made me realize how separated the developers and security researchers worlds actually are and how important it is to close this gap in order to build secure software.

- Alvaro Muñoz, @pwntester

Multiple script injections and “pwn request” vulnerabilities in implementations of GitHub Actions workflows

We noticed emerging insecure patterns in the implementation of GitHub Actions workflows and helped fix more than a hundred instances in open source projects. We also published guidelines and CodeQL queries to find these types of vulnerabilities, and an open source tool that helps users set the right permissions for the tokens used in these pipelines to limit the damage in case of an exploit. Since the vulnerabilities were in the implementation of CI/CD pipelines, the reports didn’t get CVEs assigned, as no immediate action was needed by the open source projects’ users once they were fixed.

> One pattern, which we coined as ‘pwn request’ was especially interesting because it was a combination of two unrelated features. When used together it led to a vulnerability.

- Jaroslav Lobačevski, @jarlob
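
A script injection of the kind mentioned above typically arises when a workflow interpolates attacker-controllable event data, such as a pull request title, directly into a shell `run` step. The following is a minimal Python sketch of a check for that pattern. It is not the Security Lab's CodeQL queries or tooling: the function name `find_script_injections`, the short list of untrusted fields, and the sample workflow are all assumptions made for illustration, and a line-based regex like this one ignores multi-line `run: |` blocks and can report false positives.

```python
import re

# Event fields an outside contributor can control (an illustrative,
# incomplete list chosen for this sketch).
UNTRUSTED = (
    r"github\.event\.issue\.title",
    r"github\.event\.issue\.body",
    r"github\.event\.pull_request\.title",
    r"github\.event\.pull_request\.body",
    r"github\.event\.comment\.body",
    r"github\.head_ref",
)

# Matches a ${{ ... }} expression that mentions one of the fields above.
EXPRESSION = re.compile(r"\$\{\{[^}]*(?:" + "|".join(UNTRUSTED) + r")[^}]*\}\}")

# Matches a single-line shell step such as "- run: echo hi" or "run: make".
RUN_STEP = re.compile(r"^\s*(?:-\s*)?run:\s*\S")


def find_script_injections(workflow_text):
    """Return (line_number, line) pairs for single-line run steps that
    interpolate untrusted event data directly into the shell command."""
    findings = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        if RUN_STEP.match(line) and EXPRESSION.search(line):
            findings.append((number, line.strip()))
    return findings


# Hypothetical workflow: the first step is injectable, the second is not.
SAMPLE = """\
on: pull_request_target
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Title: ${{ github.event.pull_request.title }}"
      - env:
          TITLE: ${{ github.event.pull_request.title }}
        run: echo "Title: $TITLE"
"""

for number, line in find_script_injections(SAMPLE):
    print(f"line {number}: {line}")
```

Only the first step of the sample is flagged. The second step passes the same value through an intermediate environment variable, which is the commonly recommended mitigation because the shell then treats the title as data rather than as part of the script.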

CVE-2022-20186: Privilege escalation in Arm Mali GPU

This one is a vulnerability in the Arm Mali GPU kernel driver that can be used to gain arbitrary kernel memory access from an untrusted app on a Pixel 6, to eventually gain root privileges and disable SELinux.

> This bug somewhat kicked off a series of powerful bugs that exploited the memory management code in the Arm Mali GPU, which provided a very reliable and simple way to exploit the kernel, despite all the mitigations that were introduced in recent years.

- Man Yue Mo, @m-y-mo

The road to the next 500 CVEs

With the continuous improvements of CodeQL, and the ongoing modeling of new frameworks, turbocharged by the use of Large Language Models (LLMs), we are disclosing vulnerabilities faster and at a larger scale than ever before. It won’t be long until we write again to celebrate the next 500 CVEs.

Our dream, however, is to reach a point where the impact of the education and protection efforts, from us and the community at large, will balance this audit and disclosure activity and result in fewer vulnerabilities being found in open source code. For example, because CodeQL is available for all projects via code scanning, any improvement will help us find more issues; on the other hand, increased use of code scanning will prevent these issues from happening in the first place.

But we cannot do that alone. We need all of you.

Assemble! Securing open source is a team effort

With CodeQL and multi-repository variant analysis, you can multiply your audit’s impact by coding an insecure pattern and finding all occurrences in your code portfolio, since bugs are often copied and pasted throughout projects. You can also multiply your impact by contributing your CodeQL queries back to the open source repository and sharing them with the community, to find and fix even more occurrences and protect many projects, as well as the open source software supply chain.
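
To make the idea of coding an insecure pattern once and running it across a portfolio more concrete, here is a small Python sketch that uses the standard `ast` module as a stand-in for a real query engine. It is not CodeQL and performs no dataflow analysis. The chosen pattern (calls to `pickle.loads`), the function names, and the in-memory `PORTFOLIO` of made-up projects are all assumptions for illustration.

```python
import ast


def find_pickle_loads(source):
    """Return the line numbers of every `pickle.loads(...)` call in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "loads"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "pickle"
        ):
            hits.append(node.lineno)
    return sorted(hits)


def scan_portfolio(portfolio):
    """Apply the same pattern to every project; keep only projects with hits."""
    results = {}
    for project, source in portfolio.items():
        hits = find_pickle_loads(source)
        if hits:
            results[project] = hits
    return results


# Hypothetical projects, inlined as strings so the sketch is self-contained.
PORTFOLIO = {
    "project-a": "import pickle\n\ndef load(blob):\n    return pickle.loads(blob)\n",
    "project-b": "import json\n\ndef load(text):\n    return json.loads(text)\n",
    "project-c": "import pickle\nstate = pickle.loads(open('s.bin', 'rb').read())\n",
}

print(scan_portfolio(PORTFOLIO))
```

The pattern is written once in `find_pickle_loads` and reused unchanged for every project, which is the property that lets a single finding turn into many. A real variant analysis would also track whether the deserialized data can actually come from an untrusted source.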

If you maintain an open source project, you can enable code scanning and Dependabot for free to immediately benefit from this security knowledge as a first line of defense. I encourage you to also enable private vulnerability reporting so that teams like the Security Lab, who audit open source projects, can report issues to you privately and collaborate on a fix.