The hype around generative AI is beginning to meet reality. During a recent earnings call, Alphabet CEO Sundar Pichai highlighted the growing adoption of Google Cloud’s generative AI solutions but tempered his optimism with a crucial caveat: “These things take time.” Enthusiasm and experimentation around generative AI are plentiful, but uptake for serious, revenue-generating applications remains relatively low. The acknowledgment suggests that the technology, however promising, is not yet ready for widespread commercial use.
This slower pace of adoption could work in the industry’s favor. It leaves more time to reflect on the complexities of AI, particularly where open-source models are concerned. Mark Zuckerberg and others in the industry have made bold claims about the future dominance of open-source AI, especially in the development of large language models (LLMs). However, the concept of “open source” in the AI space is becoming increasingly muddled. Organizations like Meta release models and label them open source, but those releases don’t always adhere to the traditional principles of open-source software, which has led to debates about whether the label is deserved. The term is being stretched to fit new definitions, raising the question of what “open source” really means in the context of AI.
Does it really matter whether an AI model is truly open source? For some, the answer is a resounding yes. As Open Source Initiative (OSI) executive director Stefano Maffulli points out, it’s not just about having access to code. True open-source AI requires access to the full ecosystem surrounding the model: training data, preprocessing code, training process code, and the model’s underlying architecture. Without these components, calling a model “open” is misleading, because the model’s value and functionality are driven by the data it’s trained on. This is a fundamental issue in AI development that cannot be ignored.
The argument over what constitutes “open” AI ultimately revolves around data. As Julia Ferraioli, a key participant in the OSI’s open-source AI committee, asserts, if the training data isn’t open, the AI model can’t truly be open. This perspective highlights how tightly code and data are intertwined in AI. However, the debate also exposes an underlying irony: many of the voices championing data openness come from companies such as AWS that have their own motivations for controlling access to data. These companies have little incentive to relinquish that control, just as cloud providers are reluctant to open-source their infrastructure. Meanwhile, developers themselves may care less about the intricacies of open-source definitions than about getting AI models that work effectively. The industry’s emphasis on “open” may not align with what developers actually want, and this gap could drive the conversation in unforeseen directions.