Before ChatGPT, natural language AI researchers shared their work openly, and that openness fueled both innovation and scrutiny. The rise of large language models (LLMs) such as OpenAI’s GPT-4 changed that, pushing research labs to guard their discoveries as proprietary assets. The Allen Institute for AI (AI2), a Seattle-based nonprofit founded in 2014 by Microsoft cofounder Paul Allen, is challenging this trend.
AI2 has released OLMo 7B, a large language model, and it has gone beyond releasing the model alone: all associated software components and the training data are available on GitHub and Hugging Face. Hanna Hajishirzi, AI2’s Senior Director of Research, who leads the OLMo project, emphasizes the commitment to transparency. “During this process, we actually want to open up everything: the training data, the pretraining data, the source code, the details of the parameters, and so on.”
The initiative aims to give the AI research community complete visibility into a state-of-the-art LLM, so that open problems with LLMs can be studied scientifically and natural language processing can advance.
Sophie Lebrecht, AI2’s COO, stresses the need for a comprehensive methodology to evaluate LLMs. Full access to the data lets researchers trace a model’s behavior back to what it was trained on. Today, AI researchers struggle to attribute an LLM’s outputs to specific training data. Because OLMo 7B’s training data is public, researchers can follow the path from training data to model output, which may help in addressing challenges such as hallucinations and bias.
Large closed models dominate, in part because models of this size are costly to build, and AI2’s open-source approach aims to lower that barrier. Researchers often resort to closed models from industry giants like OpenAI or Google, accepting their outputs without insight into the ‘why’ and ‘how.’ Hajishirzi compares this limitation to an astronomer studying the Solar System only through pictures in the newspaper.
The OLMo announcement quotes Yann LeCun, Meta’s Chief AI Scientist, who advocates open-sourcing AI models. Hajishirzi acknowledges Meta’s contributions but notes that even its Llama models are not fully open. “They have made the model open, but still, the data is not available, we don’t understand the connections starting from the data all the way to capabilities.”
OLMo is a midsize model, with seven billion parameters, trained on two trillion tokens. Lebrecht also points to the environmental benefit: when research is open, other labs do not have to repeat the same training runs, which reduces the energy spent on redundant work.
With OLMo 7B, AI2 is challenging the trend toward secrecy in AI research and making the case for a more collaborative, transparent future for large language models.