Researchers at Harvard University’s Laboratory for Innovation Science (LISH) have conducted an extensive survey of free and open source software (FOSS) packages, revealing the most commonly used ones across industries. This unprecedented census aims to help the tech community better address the security risks associated with open source software, which high-profile vulnerabilities such as Heartbleed and Log4Shell have brought into focus. Both affected widely deployed open source projects and prompted calls for better protection and risk management across the open source ecosystem.
The timing of the release is significant: the technology sector is increasingly dependent on open source software, and many critical enterprise and public sector applications are built on open source components. That widespread use has raised concerns about the security and sustainability of those components. The Harvard census focuses on the application library level, aggregating more than half a million observations of FOSS libraries used in production applications at thousands of companies in 2020 to show which packages are relied upon most.
The census highlights the challenges of understanding the health, security, and economic impact of FOSS, particularly since these projects are produced in a decentralized, distributed manner. According to the authors, while tens of millions of FOSS projects are integral to the software we use daily, it’s difficult to fully assess their overall status. The report underscores the need for a more coordinated approach to evaluating the risks and vulnerabilities inherent in these open source packages.
The report is organized into eight ranked lists, divided between those that include version numbers and those that are version-agnostic. It separates packages distributed through npm, the default package manager for JavaScript, from non-npm packages, offering a clearer breakdown of usage patterns. The lists also distinguish between packages that developers call directly and those pulled in indirectly as dependencies, highlighting the hidden complexity of software dependency chains. The census does not identify the riskiest open source packages, but by establishing which software is most widely used it sets the stage for future efforts to assess risk profiles. For organizations managing software bills of materials, the lists offer a useful reference for prioritizing security efforts on the most commonly used packages in their environments.
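To make the list structure and the bill-of-materials use case concrete, here is a minimal Python sketch. It assumes the eight lists arise from three two-way splits (versioned or not, npm or not, direct or indirect), which matches the count given above but is not spelled out in the report summary. The function names and the package names (`pkg-a`, `pkg-b` and so on) are hypothetical placeholders, not entries from the census.

```python
from itertools import product

# Assumption: the eight lists are the combinations of three two-way splits.
VERSIONING = ("version-specific", "version-agnostic")
ECOSYSTEM = ("npm", "non-npm")
CALL_TYPE = ("direct", "indirect")


def census_list_names():
    """Enumerate one label per combination of the three splits."""
    return [
        f"{versioning} / {ecosystem} / {call_type}"
        for versioning, ecosystem, call_type in product(
            VERSIONING, ECOSYSTEM, CALL_TYPE
        )
    ]


def prioritize(sbom_components, ranked_packages):
    """Return the SBOM components found in a ranked list, best rank first.

    sbom_components: package names taken from a software bill of materials.
    ranked_packages: package names ordered from most to least widely used.
    """
    # Map each ranked package to its 1-based position, ignoring case.
    rank = {name.lower(): pos for pos, name in enumerate(ranked_packages, start=1)}
    hits = [
        (rank[name.lower()], name)
        for name in sbom_components
        if name.lower() in rank
    ]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    print(len(census_list_names()))  # 2 * 2 * 2 combinations

    # Hypothetical data: a ranked list and an organization's SBOM.
    ranked = ["pkg-a", "pkg-b", "pkg-c"]
    sbom = ["pkg-c", "pkg-x", "pkg-a"]
    print(prioritize(sbom, ranked))
```

In practice an organization would load the component names from its own SBOM and the ranking from whichever of the eight lists matches its ecosystem, then review the matches in rank order.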