Imagine a world where the software that powers your favorite apps, secures your online transactions, and keeps your digital life safe could be outsmarted and taken over by a cleverly disguised piece of code. This isn't a plot from the latest cyber-thriller; it has actually been a reality for years now. How this will change, for better or for worse, as artificial intelligence (AI) takes on a larger role in software development is one of the big uncertainties of this brave new world.
In an era where AI promises to revolutionize how we live and work, the conversation about its security implications cannot be sidelined. As we increasingly rely on AI for tasks ranging from mundane to mission-critical, the question is no longer just "Can AI boost cybersecurity?" (sure!), but also "Can AI be hacked?" (yes!), "Can one use AI to hack?" (of course!), and "Will AI produce secure software?" (well…). This thought leadership article is about the latter. Cydrill (a secure coding training company) delves into the complex landscape of AI-produced vulnerabilities, with a special focus on GitHub Copilot, to underscore the imperative of secure coding practices in safeguarding our digital future.
_You can test your secure coding skills with this short self-assessment._
AI's leap from academic curiosity to cornerstone of modern innovation happened rather suddenly. Its applications span a breathtaking array of fields, offering solutions that were once the stuff of science fiction. However, this rapid advancement and adoption has outpaced the development of corresponding security measures, leaving both AI systems and systems created by AI vulnerable to a variety of sophisticated attacks. Déjà vu? The same thing happened when software itself was taking over many areas of our lives…
At the heart of many AI systems is machine learning, a technology that relies on extensive datasets to "learn" and make decisions. Ironically, the strength of AI (its ability to process and generalize from vast amounts of data) is also its Achilles' heel. The starting point of "whatever we find on the Internet" may not be the perfect training data; unfortunately, the wisdom of the masses may not be sufficient in this case. Moreover, hackers, armed with the right tools and knowledge, can manipulate this data to trick AI into making erroneous decisions or taking malicious actions.
GitHub Copilot, powered by OpenAI's Codex, stands as a testament to the potential of AI in coding. It has been designed to improve productivity by suggesting code snippets and even whole blocks of code. However, multiple studies have highlighted the dangers of fully relying on this technology. It has been demonstrated that a significant portion of code generated by Copilot can contain security flaws, including vulnerabilities to common attacks like SQL injection and buffer overflows.
The "Garbage In, Garbage Out" (GIGO) principle is particularly relevant here. AI models, including Copilot, are trained on existing data, and, just like any other Large Language Model, the bulk of this training is unsupervised. If this training data is flawed (which is quite possible, given that it comes from open-source projects or large Q&A sites like Stack Overflow), the output, including code suggestions, may inherit and propagate these flaws. In the early days of Copilot, a study revealed that approximately 40% of the code samples Copilot produced when asked to complete code based on samples from the CWE Top 25 were vulnerable, underscoring the GIGO principle and the need for heightened security awareness. A larger-scale study in 2023 ("Is GitHub's Copilot as bad as humans at introducing vulnerabilities in code?") had somewhat better results, but they were still far from good: when the researchers removed the vulnerable line of code from real-world vulnerability examples and asked Copilot to complete it, Copilot recreated the vulnerability about 1/3 of the time and fixed it only about 1/4 of the time. In addition, it performed very poorly on vulnerabilities related to missing input validation, producing vulnerable code every time. This highlights that generative AI is poorly equipped to deal with malicious input when no "silver bullet"-like solution for a vulnerability (e.g., prepared statements) is available.
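To illustrate the "silver bullet" mentioned above, here is a minimal sketch using Python's standard-library `sqlite3` module. Python, the `users` table, and the function names are illustrative choices of ours, not taken from the studies. The first function builds the query by string concatenation, the pattern behind SQL injection (CWE-89); the second uses a parameterized (prepared) statement, so the input is always treated as data.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # VULNERABLE: user input is concatenated into the SQL text, so a value
    # such as "x' OR '1'='1" changes the structure of the query.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str):
    # SAFER: a parameterized query; the value is bound to the "?" placeholder
    # by the driver and is never interpreted as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # [(1, 'alice'), (2, 'bob')]
    print(find_user_safe(conn, payload))    # []
```

With the payload `x' OR '1'='1`, the concatenated query's WHERE clause becomes always true and returns every row, while the parameterized version simply looks for a user with that literal name. Input validation in general has no equally mechanical fix, which matches the category where the study found Copilot weakest.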
Addressing the security challenges posed by AI and tools like Copilot requires a multifaceted approach.
Navigating the integration of AI tools like GitHub Copilot into the software development process is risky and requires not only a shift in mindset but also the adoption of robust strategies and technical solutions to mitigate potential vulnerabilities. Here are some practical tips designed to help developers ensure that their use of Copilot and similar AI-driven tools enhances productivity without compromising security.
Practical Implementation: Defensive programming is always at the core of secure coding. When accepting code suggestions from Copilot, especially for functions handling user input, implement strict input validation measures. Define rules for user input, create an allowlist of permitted characters and data formats, and ensure that inputs are validated before processing. You can also ask Copilot to do this for you; sometimes it actually works well!
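A minimal allowlist validator in Python, as a sketch of the advice above. The username rule (3 to 32 ASCII letters, digits, underscores, or hyphens) and the names `validate_username` and `ValidationError` are hypothetical; real rules depend on what your application actually accepts.

```python
import re

# Hypothetical rule for a username field: 3-32 characters, drawn only from an
# explicit allowlist of ASCII letters, digits, underscore and hyphen.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,32}")


class ValidationError(ValueError):
    """Raised when input does not satisfy the allowlist rule."""


def validate_username(raw: object) -> str:
    # Reject non-strings outright instead of silently converting them.
    if not isinstance(raw, str):
        raise ValidationError("username must be a string")
    # fullmatch() requires the *entire* value to match the allowlist, so
    # appended payloads such as "bob'; DROP TABLE users;--" are rejected.
    if USERNAME_PATTERN.fullmatch(raw) is None:
        raise ValidationError("username has invalid characters or length")
    return raw


if __name__ == "__main__":
    for candidate in ["alice_01", "bob'; DROP TABLE users;--", "alice\n"]:
        try:
            print(repr(validate_username(candidate)), "accepted")
        except ValidationError as exc:
            print(repr(candidate), "rejected:", exc)
```

`re.fullmatch` is used instead of `re.match` with a trailing `$`, because `$` also matches just before a final newline, which would let `"alice\n"` through. Spelling out `A-Za-z0-9` rather than `\w` keeps Unicode letters out of the allowlist. Validation like this complements parameterized queries; it does not replace them.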
Practical Implementation: Copilot may suggest adding dependencies to your project, and attackers may exploit this for supply chain attacks via "package hallucination": the model suggests a package that does not exist, and an attacker registers a malicious package under that name. Before incorporating any suggested library, confirm that it exists and is the one you intended, then verify its security status by checking for known vulnerabilities in databases like the National Vulnerability Database (NVD), or perform a software composition analysis (SCA) with tools like OWASP Dependency-Check or npm audit for Node.js projects. These tools can automatically track and manage the security of your dependencies.
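A vulnerability database only knows about reported issues, and a freshly registered malicious package typically has no CVE yet. One complementary guard, sketched below under the assumption that your team keeps a list of vetted packages, is to flag every requirement that nobody has approved. The `APPROVED` set, the function names, and the package `copilot-suggested-helper` are hypothetical, and the parser is deliberately simplified: it handles `name==version`-style lines, extras, comments, and pip options, not every requirements-file feature.

```python
import re

# Hypothetical allowlist of packages your team has already vetted. In practice
# this might live in a reviewed file or an internal package index.
APPROVED = {"requests", "flask", "sqlalchemy"}

# A distribution name at the start of a requirement line.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    # PEP 503 normalization: case-insensitive, and runs of "-", "_" or "."
    # are treated as equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_requirements(lines: list[str], approved: set[str]) -> list[str]:
    """Return requirement names that are not on the approved list."""
    approved_norm = {normalize(a) for a in approved}
    flagged = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = _NAME.match(line)
        if match and normalize(match.group(1)) not in approved_norm:
            flagged.append(match.group(1))
    return flagged


if __name__ == "__main__":
    requirements = [
        "requests==2.31.0",
        "Flask[async]>=3.0  # web framework",
        "copilot-suggested-helper==0.1",  # hypothetical AI-suggested package
    ]
    print(unapproved_requirements(requirements, APPROVED))  # ['copilot-suggested-helper']
```

A check like this would typically run in CI next to the SCA tools named above, so that any new, unvetted name fails the build until a person has looked at it.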
Practical Implementation: Regardless of the source of the code, be it AI-generated or hand-crafted, conduct regular code reviews and tests with security in focus. Combine approaches: test statically (SAST) and dynamically (DAST), and perform software composition analysis (SCA). Do manual testing and supplement it with automation. But remember to put people over tools: no tool or artificial intelligence can replace natural (human) intelligence.
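To give a feel for what a SAST tool does under the hood, here is a toy check built on Python's standard `ast` module. It flags `.execute()` calls whose first argument is built with an f-string, `+` concatenation, or `%` formatting. It is an illustration only, not a replacement for a real analyzer: it matches any method named `execute`, and it misses queries assembled in a separate variable, which is what data-flow analysis in dedicated tools is designed to catch.

```python
import ast


def find_sql_string_building(source: str) -> list[int]:
    """Return the line numbers of .execute() calls whose first argument is
    built dynamically (f-string, "+" concatenation, or "%" formatting)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        ):
            query = node.args[0]
            is_fstring = isinstance(query, ast.JoinedStr)
            is_concat_or_format = isinstance(query, ast.BinOp) and isinstance(
                query.op, (ast.Add, ast.Mod)
            )
            if is_fstring or is_concat_or_format:
                findings.append(node.lineno)
    # ast.walk() does not guarantee source order, so sort the results.
    return sorted(findings)


# Code under analysis: it is only parsed, never executed.
SAMPLE = '''
cur.execute("SELECT * FROM users WHERE id = ?", (uid,))
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
cur.execute("SELECT * FROM users WHERE name = '" + name + "'")
'''

if __name__ == "__main__":
    print(find_sql_string_building(SAMPLE))  # [3, 4]
```

The parameterized call on line 2 is not reported, while the f-string and concatenation variants on lines 3 and 4 are. Real SAST tools apply many such rules at once and track how values flow between statements.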
Practical Implementation: First, let Copilot write your comments or debug logs; it's already quite good at these. Mistakes there are much less likely to affect the security of your code, although log statements should still be checked so that they don't leak sensitive data. Then, once you are familiar with how it works, you can gradually let it generate more and more code snippets for the actual functionality.
Practical Implementation: Never blindly accept what Copilot suggests. Remember, you are the pilot; it's "just" the Copilot! You and Copilot can be a very effective team, but you are still the one in charge, so you must know what the expected code is and what the outcome should look like.
Practical Implementation: Try out different prompts and approaches (in chat mode). Ask Copilot to refine the code if you are not happy with what you got. Try to understand how Copilot "thinks" in certain situations and learn its strengths and weaknesses. Moreover, Copilot keeps improving over time, so experiment continuously!
Practical Implementation: Continuously educate yourself and your team on the latest security threats and best practices. Follow security blogs, attend webinars and workshops, and participate in forums dedicated to secure coding. Knowledge is a powerful tool in identifying and mitigating potential vulnerabilities in code, AI-generated or not.
Secure coding practices have never been more important than now, as we navigate the uncharted waters of AI-generated code. Tools like GitHub Copilot present significant opportunities for growth and improvement, but also particular challenges when it comes to the security of your code. Only by understanding these risks can we successfully reconcile effectiveness with security and keep our infrastructure and data protected. In this journey, Cydrill remains committed to empowering developers with the knowledge and tools needed to build a more secure digital future.
Cydrill's blended learning journey provides training in proactive and effective secure coding for developers from Fortune 500 companies all over the world. By combining instructor-led training, e-learning, hands-on labs, and gamification, Cydrill provides a novel and effective approach to learning how to code securely.
Check out Cydrill's secure coding courses.
This article is a contributed piece from one of our valued partners.