As if the problem of using open source components with known vulnerabilities in production systems were not enough…
More recently, open source consumers have also been haunted by open-source software supply chain attacks, which inject malicious code into an open-source project so that it is downloaded and executed by downstream consumers.
Such attacks have become increasingly prevalent over the last few years, with new “typosquatting” and “dependency confusion” attacks reported almost daily. The presentation and discussion of those events, however, often use terminology specific to the affected ecosystem or programming language. And while such reports dig deep into the respective attack, e.g., the characteristics of the injected code or second-stage payload, they generally do not position the attack with regard to the overall attack surface of today’s software development practices.
A taxonomy to the rescue
In early 2021, a taxonomy of open-source supply chain attacks was developed by Piergiorgio Ladisa (PhD student at SAP Security Research) and his supervisors Henrik Plate, Matias Sebastian Martinez, and Olivier Barais. The taxonomy provides a technology-agnostic overview of the known attack vectors that attackers can use to inject malicious code into upstream open source projects.
To make the taxonomy and the rich data set behind this systematization of knowledge easier to consume, the same authors developed the Risk Explorer, a tool for interactively visualizing and exploring this wealth of information. The tool has been open source and hosted by SAP since the beginning of 2022.
Endor Labs’ mission is to secure the software supply chains of its customers. To this end, the taxonomy is an invaluable resource for understanding, investigating and communicating about the overall threat landscape, as well as for evaluating existing and future safeguards. For those reasons, Endor Labs decided to collaborate on the future development of this open-source project and to host a custom version of the Risk Explorer.
Taxonomy - Overview and Structure
Continuing the precursory work of Backstabber’s Knife Collection, the taxonomy takes the form of an attack tree, where the root node corresponds to the top-level goal of the attacker, and subnodes represent subgoals to reach the respective parent goal. In this context, the attacker's top-level goal is to “Conduct an Open Source Supply Chain Attack”, that is, to place malicious code in open-source artifacts such that it is executed by downstream projects, e.g., during development or at runtime.
Overall, the taxonomy has more than 100 different nodes, created on the basis of a thorough literature review and hundreds of real-world incidents. While the main structure is outlined below, please refer to SoK: Taxonomy of Attacks on Open-Source Software Supply Chains for details on the methodology, the surveys used to validate the taxonomy’s completeness and comprehensibility, and developer feedback on the cost/benefit ratio of different safeguards (which definitely deserves a dedicated post). As the first author of the paper, Piergiorgio will have the pleasure of presenting it in May 2023 at the 44th IEEE Symposium on Security and Privacy in San Francisco.
The criterion used for creating the tree’s first-level child nodes is the attacker’s level of interaction with existing, legitimate project resources:
- “Develop and Advertise Distinct Malicious Package from Scratch” does not interfere with existing projects at all. Instead, the attacker invests effort in creating an open source project with seemingly useful functionality from the ground up, with the goal of using it to spread malicious code.
- “Subvert Legitimate Package” subsumes different techniques to compromise the resources of an existing project, e.g., a project’s source code management system or build server.
- “Create Name Confusion with Legitimate Package” is positioned between those two. Even though name confusion techniques do not compromise any existing project resource, they exploit weaknesses of general consumer practices and expectations, sometimes resulting from past interaction with or knowledge about existing projects. Typosquatting, brandjacking and comparable techniques are part of this vector.
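To make the name confusion vector more tangible, the following minimal sketch flags a package name that closely resembles, but does not equal, a well-known one. The package list, the function name `similar_names` and the similarity threshold are assumptions chosen for illustration; they are not part of the taxonomy or the Risk Explorer, and real registries use more elaborate checks.

```python
from difflib import SequenceMatcher

# Hypothetical list of well-known package names; a real check would load
# such a list from a registry or an internal inventory.
POPULAR_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def similar_names(candidate: str, known: list[str], threshold: float = 0.8) -> list[str]:
    """Return known names that the candidate resembles without matching exactly."""
    if candidate in known:
        # An exact match is the legitimate package itself, not a lookalike.
        return []
    hits = []
    for name in known:
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        if SequenceMatcher(None, candidate, name).ratio() >= threshold:
            hits.append(name)
    return hits


if __name__ == "__main__":
    for name in ["reqeusts", "requests", "flask"]:
        print(name, "->", similar_names(name, POPULAR_PACKAGES))
```

A string-similarity check like this only covers typosquatting-style lookalikes; brandjacking or dependency confusion would need different signals, such as the publishing account or the registry a name is resolved from.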
The second level below “Subvert Legitimate Package” is structured according to the type of compromised resource:
- “Inject into Sources of Legitimate Package” summarizes techniques to tamper with the source code managed by the project’s version control system, such as Git.
- “Inject during the Build of Legitimate Package” relates to a project’s build environment used for creating ready-made project artifacts for easy consumption.
- “Distribute Malicious Version of Legitimate Package” subsumes attacks on the distribution infrastructure, e.g., binary repositories hosting such ready-made project artifacts.
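The two levels described above can be sketched as a small tree data structure. This is an illustrative representation only: the `Node` class and the `path_to` helper are assumptions made for this example and do not reflect the Risk Explorer's actual data format, and only the nodes named in this post are included.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One goal or subgoal in the attack tree."""
    name: str
    children: list["Node"] = field(default_factory=list)


# Root goal plus the first- and second-level nodes named in the text.
TREE = Node("Conduct an Open Source Supply Chain Attack", [
    Node("Develop and Advertise Distinct Malicious Package from Scratch"),
    Node("Create Name Confusion with Legitimate Package"),
    Node("Subvert Legitimate Package", [
        Node("Inject into Sources of Legitimate Package"),
        Node("Inject during the Build of Legitimate Package"),
        Node("Distribute Malicious Version of Legitimate Package"),
    ]),
])


def path_to(node: Node, target: str, prefix: tuple[str, ...] = ()) -> Optional[tuple[str, ...]]:
    """Depth-first search returning the goal chain from the root to the target node."""
    current = prefix + (node.name,)
    if node.name == target:
        return current
    for child in node.children:
        found = path_to(child, target, current)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    path = path_to(TREE, "Inject during the Build of Legitimate Package")
    print(" > ".join(path))
```

Reading a path from root to node gives the chain of goals an attacker pursues, which is how an attack tree is meant to be interpreted.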
Concrete, actionable techniques used by attackers are represented by the leaf nodes of the attack tree. For example, the leaf node “Exploit Unicode Bidirectional Algorithm” refers to the attack outlined in the Trojan Source paper; the node was added after that attack was disclosed in autumn 2021.
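As an illustration of what a safeguard against this particular leaf node might look like, the sketch below scans source text for Unicode bidirectional control characters, which the Trojan Source attack relies on to make code render differently from how it is compiled. The function name `find_bidi_controls` is hypothetical, and this is a simplified check rather than the detection logic of any specific tool.

```python
# Unicode bidirectional formatting characters: embeddings, overrides and isolates.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each bidi control character, 1-indexed."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


if __name__ == "__main__":
    sample = "a = 1\nb = '\u202e'"
    for line_no, col, code_point in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {code_point}")
```

Such characters have legitimate uses in right-to-left text, so a finding is a prompt for review rather than proof of an attack.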
In principle, provided sufficient details are known, it should be possible to assign any supply chain attack to exactly one attack vector in the taxonomy, according to the technique used by the attacker for getting the initial foothold.
Risk Explorer - An interactive taxonomy visualization
From a scientific point of view, the job could have been considered “done”. To make the content usable for a broader audience, however, the authors of the taxonomy also developed the Risk Explorer tool, which offers an interactive visualization of the attack tree described above.
Using this tool, users can expand attack vectors step-by-step to learn details about specific attack techniques, every one of which comes with a description, references as well as safeguards mitigating the respective attack vector. All of the 300+ references can also be consumed independently of the attack tree, and have been tagged with information such as the concerned ecosystem (if any), the year as well as names of compromised packages (in case of attacks).
For further details, refer to Risk Explorer for Software Supply Chains: Understanding the Attack Surface of Open-Source based Software Development, presented at the ACM Workshop SCORED ’22 in Los Angeles.
The taxonomy and risk explorer can be used for awareness campaigns and other educational purposes.
Beyond those obvious use cases, additional applications include:
- To systematically classify attacks in a comprehensive attack database (comparable to the use of the Common Weakness Enumeration (CWE) for classifying vulnerabilities maintained in the NVD).
- To scope penetration tests, i.e., to define the set of permitted and forbidden targets/techniques.
- To support threat modeling of given development environments, and the respective control selection/design.
- To support risk assessment in a given organization.
Given the increased number of attacks and the complexity of today’s software development processes, the taxonomy requires continuous maintenance to stay up-to-date and relevant, with regard to both incidents linked to existing attack vectors and entirely new, previously unknown attack vectors.
The authors of Risk Explorer are continuing to maintain the tool and the taxonomy, and welcome new contributors.
In the ideal case, this work becomes part of a greater industry effort such as the OpenSSF, e.g., to contextualize the many different OpenSSF initiatives started in the last couple of years.