This article was written by Herman Venter and Georgios Gousios
A fundamental step in all engineering processes is the validation and verification of designs. A structural engineer designing a building can (and will) run a simulation of the building's behavior in case of an earthquake. This is actually possible because a building is a physical object that operates within a frame of reference (nature) for which we have deduced precise laws that govern it (physics).
It is natural for software engineers to want to do the same for software. With software, however, the frame of reference needed to perform a design analysis is part of the design itself, and usually not very well specified either. Checking if a program will work correctly means to check if the program’s behavior satisfies a desirable property (for example, the program does not ever dereference a null pointer). The set of techniques used to prove whether programs satisfy properties are called program analysis techniques.
The analysis is performed on the source or a derived representation of a program. Static analysis is a method of examining software code without actually executing it. A variant of static analysis, used where compliance requirements are stricter, is to construct a program model (a simplified view of the desired behavior), and then exhaustively checking properties against all states of the model (much like fuzz testing, but done at compile time).
The program is instrumented with specific probes, trace data is collected while it is running and then analysis is performed on the traces. Dynamic analysis is a technique used for analyzing and evaluating the behavior of a program during runtime.
How is program analysis used?
Program analysis techniques are in everyday use by software engineers. Perhaps the most well known tool that uses program analysis is the compiler. A compiler will use techniques such as constant folding and dead code elimination to make any program execution as efficient as possible. In a compiler, the laws that the compiler needs to adhere to are dictated by the semantics of the programming language it compiles.
Given some property of interest, an analysis tool can either guarantee that the property holds for all possible executions of a given program or just make a best effort. If it provides a guarantee, we call the analysis “Sound”.
Here’s a helpful analogy to explain this concept. Imagine that you’re going through security at an airport. In this example, your bag is the code, and the security checkpoint is the scanner, or analysis tool. If you go through a simple metal detector, it can alert you when it detects metal, but for the purpose of finding specifically dangerous things in your bag, it can’t give a 100% guarantee there’s nothing there. The analysis the metal detector does is not sound.
An X-Ray machine, on the other hand, can give a more accurate representation of everything inside the bag. While the X-Ray doesn’t know if a firearm is actually a toy, or if a container of liquid contains something explosive, it does know that there’s something that looks like a firearm or a liquid container in the bag. Still the X-Ray analysis is not sound, as it cannot give any guarantees, but it is much better than a metal detector.
To achieve the highest degree of analysis quality, we need to have the bag contents analyzed by a human agent, who will (hopefully!) be able to tell a toy from a firearm. This analysis approaches perfect soundness, but it is tedious and costly; this is why we often have to rely on imperfect analyses (metal detectors or X-Rays). As in this example, software analysers also sometimes sacrifice soundness for practicality and efficiency. The cost we have to pay in this case is either false or missing findings.
False Positives and Negatives
By “Positive” we mean a diagnostic message issued by the analyzer. By “False Positive” we mean the diagnostic is incorrect. For example, if the analyzer says “this object dereference may result in a null pointer exception” and it turns out that it is in fact possible to prove that the object is never null in any execution, then the positive (the diagnostic) is false.
In our airport example, this is when the X-Ray sees something that looks like a gun, but it ends up being a toy, and we can prove to the TSA agent that this toy gun is harmless, and they’re just holding up the line, and I’ve got a plane to catch!
On the other hand, if this diagnostic is issued whenever the analyzer cannot prove there won't be a null dereference, that means that it can prove there will be no null dereference whenever it does not issue a diagnostic. As long as it can sometimes come up with such proofs, the analysis is non-trivial and actually useful. In short, if no diagnostic is produced (also known as a “Negative” diagnostic) we can sleep soundly because there is a proof that there will never be a null dereference.
Going back once again to the airport, this is a case in which the X-Ray can guarantee that there isn’t a gun in your bag. It didn’t detect anything that even remotely looks like a gun, and therefore we have proof that there’s no gun in there.
So the process would be: scan the bag, if there’s something that looks like a gun, ALERT! It might be a false positive and turn out to be a toy, but you can’t be too safe. If there isn’t anything that looks like a gun (a negative diagnostic), we have proof that there’s no gun in the bag and you’re good to go. If there’s something that looks like a toy, but is actually a cleverly designed gun…that’s a false negative, and we’re all in trouble.
In practice, however, the price of such soundness is that we get swamped with diagnostics that are not actually true (false positives) and we sooner or later throw up our hands in despair and stop using the analyzer (that’s why we all throw away our water before going through airport security!).
Anyone using a static type system will be familiar both with false positives and with the fact that type systems always provide loopholes (often expressed as runtime checks). When those loopholes are inevitably used, we open up the possibility of runtime errors, which means that sometimes the negative diagnostic (no type error) turns out to be false.
The price of soundness tends to be lots of False Positives, and the price of reducing the number False Positives to something we can cope with is to put up with False Negatives.
If an analysis has no false positives, it is called “Complete”. This is not a very intuitive name (blame the Mathematicians), but the way to remember it is Sound = no false negatives and Complete = no false positives.
No non-trivial analysis can be both Sound and Complete (trust the Mathematicians). The best bargain to be had is a static analyzer that reduces both False Positives and False Negatives to levels that we are prepared to put with. Such bargains are hard to find, but at Endor Labs we are working hard on changing that.
in the next blog post we will examine what a call graph is and why they matter.
Endor Labs partners with Zinfinity to help enterprise safely adopt Open Source Software
We’re excited to announce our latest partnership with Zinfinity as a strategic partner. Zinfinity is a global provider of technology solutions and services with a focus on Cyber Security, Cloud and Digital Infrastructure.