Just like other tools across your secure SDLC, different SCA (software composition analysis) tools can produce different results. Why does this happen for SCA specifically? Why do I get different results from tool A and tool B? Why did tool A find this vulnerability while tool B didn't? In this post, we’ll talk about why different tools can return different results when scanning the same project or application, and what we can do to make sure we get the best results and the most out of the tools we purchase.
So, why are the results different?
While there are a lot of reasons why the results could be different (language support, user environment, import methods, etc.), we're going to focus on 3 major ones here:
- Scanning techniques used in SCA
- Data and information sources used by different tools
- Building an in-house SCA scanner vs. using an open-source provider
SCA Scanning Techniques
The scanning technique is the “how” of SCA: how does the tool actually identify the dependencies you have brought into your application, both knowingly and unknowingly? One technique can yield different results than another, and some tools combine several of the techniques described below:
Package Manifest Scanning:
- This approach looks at the metadata from the package manager used in the application. For instance, if the project uses npm for package management, the package.json file provides a lot of information about the packages used. However, this method alone isn’t enough for complete coverage, since it only identifies packages that are installed via the package manager or listed in the manifest file. With this method, the mapped dependencies are inferred (or guessed) rather than observed.
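To make that limitation concrete, here is a minimal Python sketch of manifest-only detection. The manifest contents are made up for illustration. Note that package.json records version ranges (such as `^4.18.2`) rather than the exact installed versions, and it says nothing about the packages those dependencies pull in.

```python
import json

# A hypothetical package.json, inlined so the sketch is self-contained.
PACKAGE_JSON = """
{
  "name": "example-app",
  "dependencies": {"express": "^4.18.2", "lodash": "~4.17.21"},
  "devDependencies": {"jest": "^29.7.0"}
}
"""

def manifest_dependencies(manifest_text: str) -> dict[str, tuple[str, str]]:
    """Map each declared package to (scope, version range) exactly as written."""
    manifest = json.loads(manifest_text)
    declared = {}
    for scope in ("dependencies", "devDependencies"):
        for name, version_range in manifest.get(scope, {}).items():
            declared[name] = (scope, version_range)
    return declared

for name, (scope, version_range) in manifest_dependencies(PACKAGE_JSON).items():
    # Only a range is known here; the installed version and anything the
    # package pulls in transitively are not recorded in the manifest.
    print(f"{name:8} {scope:16} {version_range}")
```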
Semantic Analysis:
- This method involves understanding the context of the code, its structure, and its purpose to identify open-source components. Semantic analysis can be particularly effective at identifying cases where the code has been significantly modified; however, it tends to produce a high number of false positives.
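Real semantic analysis engines are proprietary and far more sophisticated. As a very simplified stand-in, the Python sketch below compares code by the shape of its syntax tree with every identifier erased (all names here are illustrative). It shows both sides of the trade-off: a renamed copy still matches, but so does an unrelated function that happens to have the same shape.

```python
import ast

class _Anonymize(ast.NodeTransformer):
    """Erase identifiers so that only the structure of the code remains."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = "_"
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = "_"
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = "_"
        self.generic_visit(node)
        return node

def structural_fingerprint(source: str) -> str:
    tree = _Anonymize().visit(ast.parse(source))
    return ast.dump(tree, annotate_fields=False)

KNOWN = "def clamp(value, low, high):\n    return max(low, min(value, high))\n"
COPIED = "def bound(x, lo, hi):\n    return max(lo, min(x, hi))\n"
UNRELATED = "def round_to(total, digits, limit):\n    return round(total, min(digits, limit))\n"

print(structural_fingerprint(KNOWN) == structural_fingerprint(COPIED))     # True: renamed copy found
print(structural_fingerprint(KNOWN) == structural_fingerprint(UNRELATED))  # True: a false positive
```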
Hash Matching:
- This technique involves calculating the checksum (or hash value) of code components and comparing it against a database of known open-source components and their respective hashes. If the hashes match, the software component is identified. However, this method often fails when the code has been modified in any way, since even a one-character change produces a completely different hash.
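A minimal Python sketch of hash matching, assuming a tiny in-memory “database” of known file hashes (the component name and file contents are hypothetical). It shows why an exact-hash lookup misses code after even a trivial edit.

```python
import hashlib
from typing import Optional

def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# Hypothetical database of known open-source files: hash -> component.
ORIGINAL = b"def clamp(value, low, high):\n    return max(low, min(value, high))\n"
KNOWN_HASHES = {sha256_hex(ORIGINAL): "example-utils 1.2.0 (utils.py)"}

def identify(content: bytes) -> Optional[str]:
    """Return the matching component, or None if these exact bytes are unknown."""
    return KNOWN_HASHES.get(sha256_hex(content))

unmodified = ORIGINAL
modified = ORIGINAL.replace(b"value", b"x")  # a trivial rename

print(identify(unmodified))  # -> example-utils 1.2.0 (utils.py)
print(identify(modified))    # -> None
```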
Source Code Scanning (aka Snippet Scanning):
- This technique analyzes the code in an application to identify fragments that match known open-source projects. It is thorough: it can catch code that has been copied and pasted or slightly modified, and it can identify indirect dependencies as well. However, it is also more computationally intensive (long scan times) and prone to both false positives and false negatives.
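Commercial snippet scanners use their own fingerprinting schemes. The Python sketch below uses a generic one, hashing every run of k consecutive tokens, to show why a lightly edited copy can still be flagged. The tokenizer, `k=5`, and the sample code are arbitrary choices for illustration. Doing this for every file against a corpus of millions of projects is also where the long scan times come from.

```python
import hashlib
import re

def tokens(source: str) -> list[str]:
    # Crude tokenizer: words/numbers, or any single punctuation character.
    return re.findall(r"\w+|[^\w\s]", source)

def fingerprints(source: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive tokens; whitespace and indentation are ignored."""
    toks = tokens(source)
    return {
        hashlib.sha1(" ".join(toks[i:i + k]).encode()).hexdigest()
        for i in range(len(toks) - k + 1)
    }

def containment(snippet: str, scanned: str, k: int = 5) -> float:
    """Fraction of the known snippet's fingerprints that also appear in the scanned file."""
    known = fingerprints(snippet, k)
    if not known:
        return 0.0
    return len(known & fingerprints(scanned, k)) / len(known)

KNOWN_SNIPPET = """
def chunked(items, size):
    for start in range(0, len(items), size):
        yield items[start:start + size]
"""

SCANNED_FILE = """
import sys

def chunked(items, size):
    # copied from a blog post, lightly edited
    for start in range(0, len(items), size):
        yield items[start:start + size]

def main():
    print(list(chunked(sys.argv, 2)))
"""

print(f"{containment(KNOWN_SNIPPET, SCANNED_FILE):.2f}")
```

A tool would flag the file when containment crosses some threshold; where that threshold sits is exactly the false-positive versus false-negative trade-off.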
Dependency Graph Scanning:
- Some SCA tools build a dependency graph of all components in a project and their relationships, and they use this graph to trace the propagation of vulnerabilities and license issues. This method can be particularly effective at identifying transitive dependencies, which are dependencies of your direct dependencies.
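A minimal sketch of the graph approach, assuming the dependency graph has already been resolved (the package names and the vulnerable set are hypothetical). A breadth-first search from the application finds the shortest path to each vulnerable package, which also tells you whether it is a direct or transitive dependency.

```python
from collections import deque

# Hypothetical resolved dependency graph: package -> its direct dependencies.
GRAPH = {
    "my-app": ["web-framework", "json-utils"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["string-escape"],
    "json-utils": [],
    "http-parser": [],
    "string-escape": [],
}
VULNERABLE = {"string-escape"}

def paths_to_vulnerable(graph, root, vulnerable):
    """BFS from the root, returning the shortest dependency path to each vulnerable package."""
    found = {}
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in vulnerable:
            found[node] = path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return found

for pkg, path in paths_to_vulnerable(GRAPH, "my-app", VULNERABLE).items():
    kind = "direct" if len(path) == 2 else "transitive"
    print(f"{pkg} ({kind}): {' -> '.join(path)}")
```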
Data & Information Sources
Where a tool’s data comes from, and what sources of information it relies on, can significantly affect the results you see. When we say “data and information sources” here, we’re lumping two things together:
- The tool’s vulnerability and license data
- Where the tool gets information about the application and its structure
A tool’s vulnerability and license data can draw on several different sources, but it typically breaks down into a publicly sourced part (NVD, MITRE, GitHub Advisory Database, etc.) and a proprietary part. The proprietary part is the one we’ll take a closer look at.
When taking a look at results from different tools, one of the first questions to ask is “Does this tool even know this vulnerability or license issue exists?” There are a few things we need to look at when asking about a tool’s data:
- Timeliness: How quickly is a newly disclosed vulnerability added to the tool’s database?
- Sources: Where does the tool get its data from? Is it simply from publicly known sources, or does the vendor have a team of researchers that can disclose and validate findings?
- Quality & Data Enrichment: What is the tool providing in addition to vulnerability and license information? Is there extra context around the vulnerability (e.g., an EPSS score)? What other risks is the tool surfacing about the open source components being brought in?
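The Python sketch below illustrates the “does this tool even know about it?” question: two hypothetical tools scan the same dependency list, but one tool’s database has ingested an advisory that the other’s hasn’t yet. The advisory IDs, package names, and the simple numeric version comparison are all made up for illustration; real tools must handle each ecosystem’s own version scheme.

```python
from dataclasses import dataclass

def parse_version(v: str) -> tuple[int, ...]:
    # Only handles plain numeric versions like "2.4.1" (an assumption for this sketch).
    return tuple(int(part) for part in v.split("."))

@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    package: str
    introduced: str  # first affected version (inclusive)
    fixed: str       # first fixed version (exclusive upper bound)

    def affects(self, package: str, version: str) -> bool:
        return (package == self.package
                and parse_version(self.introduced) <= parse_version(version) < parse_version(self.fixed))

def scan(dependencies: dict[str, str], database: list[Advisory]) -> set[str]:
    """Return the IDs of every advisory in the database that matches a dependency."""
    return {adv.advisory_id
            for name, version in dependencies.items()
            for adv in database if adv.affects(name, version)}

DEPENDENCIES = {"http-parser": "2.4.1", "string-escape": "1.0.3"}

OLDER = Advisory("EXAMPLE-2024-0001", "http-parser", "2.0.0", "2.4.2")
NEWER = Advisory("EXAMPLE-2024-0002", "string-escape", "1.0.0", "1.0.4")

tool_a_db = [OLDER, NEWER]  # has already ingested the newer advisory
tool_b_db = [OLDER]         # has not ingested it yet

a, b = scan(DEPENDENCIES, tool_a_db), scan(DEPENDENCIES, tool_b_db)
print("both tools:", sorted(a & b))
print("only tool A:", sorted(a - b))
```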
Now that we understand the data a tool can provide, we need to think about what the tool is looking at and what else it uses to produce results. Some tools leverage a language’s or package manager’s built-in commands to resolve dependencies. Others rely on a specific file (e.g., a lock file) or a wrapper to accurately determine which components are actually in use. Here are some language-specific examples:
- Utilizing Gradle wrapper files to build packages and resolve dependencies
- Leveraging Go’s built-in commands to help replicate the way a package manager would install the dependencies
- For Maven projects, the local Maven cache in the .m2 directory of the file system can be leveraged to resolve a package’s dependencies
- For .NET applications, a packages.lock.json file can be generated automatically; NuGet uses it to ensure consistent package installations across different environments (check out our blog post about this!)
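As an illustration of why a lock file helps, here is a Python sketch that reads resolved versions out of a NuGet lock file (generation is typically enabled with the `RestorePackagesWithLockFile` MSBuild property). It assumes the layout shown below: a top-level `dependencies` object keyed by target framework, with per-package `type` and `resolved` fields. The package names are made up, and you should check the layout against a lock file generated in your own environment.

```python
import json

# Abridged, hypothetical packages.lock.json content (package names are made up).
LOCK_FILE = """
{
  "version": 1,
  "dependencies": {
    "net8.0": {
      "Contoso.Http": {
        "type": "Direct",
        "requested": "[3.2.0, )",
        "resolved": "3.2.0",
        "contentHash": "...",
        "dependencies": {"Contoso.Text": "2.1.0"}
      },
      "Contoso.Text": {
        "type": "Transitive",
        "resolved": "2.1.0",
        "contentHash": "..."
      }
    }
  }
}
"""

def resolved_packages(lock_text: str) -> list[tuple[str, str, str, str]]:
    """Return (target framework, package, resolved version, type) rows."""
    lock = json.loads(lock_text)
    rows = []
    for framework, packages in lock["dependencies"].items():
        for name, entry in packages.items():
            # Project-reference entries may not carry a resolved version,
            # so fall back to an empty string instead of failing.
            rows.append((framework, name, entry.get("resolved", ""), entry["type"]))
    return rows

for row in resolved_packages(LOCK_FILE):
    print(*row)
```

Unlike a manifest, the lock file records the exact resolved version of transitive packages too, so a tool reading it has less to infer.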
In-house Scanning vs. Open Source Scanning
To build, or not to build - that is the question. There’s always the whole “build vs. buy” argument, but when it comes to a scanning engine, what’s the best route? A very Solutions Architecture-y answer is “well, it depends”. The benefits of building your own scanning engine are pretty obvious, but what about using one that’s already available and open source? There are even some tools out there that take one of the many available open source scanners and repackage it as the engine behind their SCA tool.
The benefits of using an open source scanner are pretty clear - it’s faster and, obviously, less of an expense. But, as the saying goes, you get what you pay for. Some of the drawbacks of taking this route include:
- Results can be inaccurate (higher false positives, missed vulnerabilities)
- Difficulty setting up in a specific or large environment
- Quality, timeliness, and enrichment of data
- Lack of key capabilities like reporting, filtering, integrations with popular tools
Using an open source scanner definitely has its use cases, but as the number of applications and projects grows, there comes a point where you need a tool with more robust capabilities, backed by dedicated resources.
So, what can we do about different results, and how should we interpret them? There are a few key areas to look at when validating the different results from different tools:
- Make sure that the programming languages you work with are supported for your situation. While many tools support the most commonly used languages, some languages come with caveats and limitations. Be sure to understand which programming languages a tool supports, and how extensive that support is.
- There are plenty of places to implement an SCA tool across your SDLC (software development lifecycle), mainly in the IDE, in source code management, and in CI/CD pipelines. Depending on where in the SDLC the SCA tool runs, you’ll see different results; for example, a scan of a source repository may not see dependencies that only get resolved during a CI build. Place your SCA tool where you’ll get the most accurate and most actionable results.
- Once you have reliable and actionable results, set up policies to take effective action on them. Narrow the results down to the highest-priority issues by looking at factors like fixability, CVSS, EPSS, whether a dependency is test-only, whether the vulnerable dependency is actually used, and whether the vulnerable function within a dependency is actually reachable.
- Each programming language is different in its own way, and each comes with its own caveats in how it handles open source dependencies. Learn these caveats, along with the specifics of how each language’s package manager works; this will help you spot anomalies in results.
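A prioritization policy like the one described in the list above can be sketched as a simple filter. This is one possible policy, not a recommendation: the `Finding` fields, advisory IDs, and thresholds (`min_cvss=7.0`, `min_epss=0.1`) are hypothetical, and many teams track unfixable or unreachable findings separately rather than dropping them.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    package: str
    advisory: str
    cvss: float          # CVSS base score, 0.0-10.0
    epss: float          # EPSS probability, 0.0-1.0
    fix_available: bool
    scope: str           # e.g. "runtime" or "test"
    reachable: bool      # is the vulnerable function reachable from your code?

def prioritize(findings, min_cvss=7.0, min_epss=0.1):
    """Keep actionable findings, ordered by exploit likelihood, then severity."""
    actionable = [
        f for f in findings
        if f.scope != "test"
        and f.fix_available
        and f.reachable
        and (f.cvss >= min_cvss or f.epss >= min_epss)
    ]
    return sorted(actionable, key=lambda f: (f.epss, f.cvss), reverse=True)

FINDINGS = [
    Finding("http-parser", "EXAMPLE-1", 9.8, 0.42, True, "runtime", True),
    Finding("string-escape", "EXAMPLE-2", 7.5, 0.02, True, "runtime", True),
    Finding("mock-server", "EXAMPLE-3", 9.1, 0.30, True, "test", True),         # test-only
    Finding("json-utils", "EXAMPLE-4", 8.2, 0.15, False, "runtime", True),      # no fix yet
    Finding("template-engine", "EXAMPLE-5", 6.1, 0.01, True, "runtime", True),  # low scores
    Finding("xml-reader", "EXAMPLE-6", 9.0, 0.50, True, "runtime", False),      # unreachable
]

for f in prioritize(FINDINGS):
    print(f.advisory, f.package, f.cvss, f.epss)
```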
We hope this helps clear up some of the information you may see on a day-to-day basis when it comes to interpreting the results from your SCA tools! If you have any questions or if there’s anything we can help you with, please don’t hesitate to reach out. Thanks for reading!