Accurate and Efficient SBOM Generation for Software Supply Chain Security
Abstract
Modern software development increasingly relies on software supply chains, with third-party libraries constituting a significant portion of many projects. However, the complexity of dependency relationships and the lack of transparency in software make identifying and fixing vulnerabilities challenging and costly. For example, the average cost of a Log4j incident response has reached $90,000, and nearly 40% of applications still use a vulnerable version of Log4j two years after the vulnerability was disclosed. A Software Bill of Materials (SBOM), which lists the dependencies used to build software, has been proposed to enhance software visibility and aid in vulnerability detection. Despite this, there is not yet an accurate SBOM generation solution for both source code and binaries. Current SBOM generators focus solely on metadata and produce inconsistent SBOM files, while the existing SBOM generators for binary files are either too slow or inaccurate.
In this thesis, we propose an accurate and fast SBOM generation approach for both source code and binaries. First, to improve SBOM generation for source code, we conducted a differential analysis to compare and understand how the existing SBOM generators work and why they behave so differently. We found that these generators support only a subset of common metadata, and that their self-implemented metadata parsers have incomplete syntax support, leading to erroneous SBOM results. We propose using package managers to simulate dependency installation for metadata-based SBOM generation. Second, to improve SBOM generation for binaries, we introduce DeepDi, a novel graph neural network-based disassembler that is both accurate and efficient. Our study showed that disassembly is often the bottleneck of binary analysis tasks, consuming up to 90% of processing time. DeepDi is hundreds of times more efficient than commercial disassemblers and is as accurate as or more accurate than them. Third, to further improve the accuracy of SBOM generation from binaries, we propose GrassDiff, a novel learning-free graph-matching algorithm that effectively and efficiently identifies statically linked libraries in large binaries.