Next-Gen SCA for C/C++: Closing the Detection Gap
A new method for identifying OSS dependencies and vulnerabilities in C/C++ with greater accuracy and precision than legacy tools.

Today we’re announcing support for C and C++ applications in the Endor Labs software composition analysis (SCA) product. This update builds on the Endor Labs application security platform, which combines AI, deep code analysis, and industry-leading security intelligence to deliver precise risk insights and remediation guidance to security and engineering teams.
C and C++ have historically been difficult for SCA tools to handle. To address this, we developed a new approach that improves visibility and accuracy for C and C++ codebases.
With this update, customers can now:
- Build a complete software bill of materials (SBOM) for C and C++ applications
- Detect vulnerabilities with significantly fewer false negatives
- Accurately track license compliance
In the rest of this post, we’ll explore why traditional tools fall short in C/C++ environments and how we solved the problem.
Why C and C++ break traditional SCA
The C/C++ ecosystem poses challenges for OSS vulnerability scanning that most other language ecosystems do not. Those ecosystems typically offer:
- Centralized package management: Each ecosystem has a central registry that lists the available packages and their versions. Python has PyPI, JavaScript has npm, and Java has Maven Central, to name a few.
- Standard import process: The programming languages provide a standard way to import libraries. Even without looking at the package manager manifest, extracting all the packages imported is relatively easy.
C and C++ are very different beasts. To start, there is no central repository for C/C++. Package managers such as vcpkg and Conan exist, but they are not as widely adopted as their counterparts in other ecosystems. Those ecosystems typically use a package manager to list and install every library that an application or service depends on. Even if developers pull packages from outside the central repository, such as a private artifact repository or a location like GitHub, the complete package address is listed in the package manager manifest.
This lack of a central repository, coupled with the fact that C and C++ predate modern source code management (SCM) platforms like GitHub, means that libraries are hosted in various places, such as SourceForge or individual websites. This decentralization makes it difficult to reliably determine package usage trends or maintain a canonical index of libraries.
To get around these limitations, C and C++ developers typically copy OSS libraries locally, often in their application repository. As a result, there is no way to differentiate between an open-source library downloaded from the internet and a private library developed internally. Developers can also make small or significant changes to the OSS libraries to fit their needs, such as adding license information to each file, adding comments to the code, or making minor code changes or fixes. This makes it even harder to detect the libraries, and to properly assess if they have vulnerabilities.
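To see why even a trivial edit matters, here is a minimal sketch that assumes a detector comparing whole-file checksums. That detector is an assumption made for illustration only, not a description of how any particular tool works, and the names `file_digest`, `upstream`, and `vendored` are hypothetical.

```python
import hashlib


def file_digest(text: str) -> str:
    """Return the SHA-256 hex digest of a file's full contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical upstream source file from an open source library.
upstream = "int add(int a, int b) {\n    return a + b;\n}\n"

# The same file after being copied into an application repository,
# with an internal license header prepended. The code itself is untouched.
vendored = "/* Copyright Example Corp. Internal use only. */\n" + upstream

# A whole-file comparison no longer recognizes the copy.
same_file = file_digest(upstream) == file_digest(vendored)
print(same_file)  # False
```

A one-line header is enough to make the copied file look unrelated to its upstream origin under this kind of comparison.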
How Endor Labs solves it
Endor Labs can precisely identify the origins of copied files and code in C and C++ projects by matching code “fingerprints” against an extensive index of open source libraries. This approach allows us to detect dependencies even when the code has been copied or modified.
Here’s how it works:
- We break each file into segments: These include functions, types, licenses, and everything else (e.g., #include directives).
- For each segment, we generate two types of identifiers:
  - A cryptographic hash: a short, fixed-length string of numbers and letters that, for practical purposes, uniquely represents the segment. Even a single-character change produces a completely different hash. Hashes are fast and efficient, and they do the bulk of the work: they allow us to match identical code segments with extremely high precision.
  - A code embedding: a numeric representation of a code segment generated by a machine learning model trained to understand code. Unlike a hash, an embedding captures the meaning of the code, so it can recognize when code has been slightly changed, reformatted, or customized but still performs the same function.
- We match these fingerprints against our curated index of C and C++ libraries. Hashes are used first to find exact matches quickly. For any unmatched segments, we fall back to embeddings to find near-matches, such as snippets that were copied and lightly modified.
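The segment-and-hash steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Endor Labs implementation: `split_segments` is a crude stand-in that treats blank-line-separated blocks as segments (a real system would parse the code), SHA-256 is an assumed choice of hash, and the library name and source snippets are hypothetical.

```python
import hashlib


def split_segments(source: str) -> list[str]:
    """Crude stand-in for a real parser: blank-line-separated blocks are segments."""
    blocks = [block.strip() for block in source.split("\n\n")]
    return [block for block in blocks if block]


def fingerprint(segment: str) -> str:
    """Hash one segment (SHA-256 is an assumed choice)."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


# Hypothetical upstream library file used to build a tiny index.
UPSTREAM = """int add(int a, int b) {
    return a + b;
}

int sub(int a, int b) {
    return a - b;
}
"""

# Index: segment hash -> library the segment came from.
index = {fingerprint(seg): "examplelib 1.0" for seg in split_segments(UPSTREAM)}

# A vendored copy with a new license header and one locally patched function.
VENDORED = """/* Copyright Example Corp. */

int add(int a, int b) {
    return a + b;
}

int sub(int a, int b) {
    return a - b;  /* patched locally */
}
"""

# Look up each segment of the vendored file in the index.
matches = [index.get(fingerprint(seg)) for seg in split_segments(VENDORED)]
print(matches)  # [None, 'examplelib 1.0', None]
```

The untouched `add` function still matches by hash even though the file as a whole has changed. The added header and the patched `sub` function do not match exactly, which is the gap a similarity-based fallback is meant to cover.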
This two-layered approach balances performance and accuracy: hashes give us speed and precision for the vast majority of matches, while embeddings help close the gap by uncovering reused or modified code that traditional tools miss. And both rely on the extensive index curated by our security research team.
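As a rough sketch of the exact-then-fuzzy lookup order, the example below tries a hash match first and falls back to a similarity search only when that fails. Everything here is an assumption for illustration: `toy_embedding` is a simple bag-of-tokens count standing in for a learned code embedding, cosine similarity and the 0.8 threshold are arbitrary choices, and `SegmentIndex` is a hypothetical class, not a real API.

```python
import hashlib
import math
import re
from collections import Counter


def exact_key(segment: str) -> str:
    """Hash used for the fast exact-match layer."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


def toy_embedding(segment: str) -> Counter:
    """Stand-in for a learned embedding: counts of identifier and number tokens."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+", segment))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SegmentIndex:
    """Hypothetical two-layer index: hashes first, similarity second."""

    def __init__(self):
        self.by_hash = {}
        self.vectors = []  # (embedding, library) pairs

    def add(self, segment: str, library: str) -> None:
        self.by_hash[exact_key(segment)] = library
        self.vectors.append((toy_embedding(segment), library))

    def lookup(self, segment: str, threshold: float = 0.8):
        # Layer 1: exact hash match.
        library = self.by_hash.get(exact_key(segment))
        if library is not None:
            return library, "exact"
        # Layer 2: nearest neighbor by similarity, accepted above a threshold.
        query = toy_embedding(segment)
        best_score, best_library = 0.0, None
        for vector, candidate in self.vectors:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_library = score, candidate
        if best_score >= threshold:
            return best_library, "near"
        return None, "none"


idx = SegmentIndex()
idx.add("int add(int a, int b) { return a + b; }", "examplelib")

print(idx.lookup("int add(int a, int b) { return a + b; }"))
print(idx.lookup("int add(int a, int b) { /* local fix */ return a + b; }"))
print(idx.lookup("void log_msg(const char *msg) { puts(msg); }"))
```

The first lookup resolves through the hash layer, the second only through the similarity fallback, and the third matches nothing. A linear scan is used here for brevity; an index of real size would need an approximate nearest-neighbor structure instead.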
This combination of modern AI/ML techniques, deep code analysis, and extensive OSS data improves dependency and vulnerability detection rates for C and C++ projects.
Results
Endor Labs customers have tested our solution against their existing tools. On average, they saw:
- 81% fewer false negatives, thanks to better matching and fewer missed files
- 143% more true positives, meaning you have better visibility into real risks
We’ll walk through our benchmarking process in a future post so you can run similar tests internally.
If you’d like to learn more, contact our team to see how we can help you build a complete and accurate SBOM for your C/C++ applications.