The PyTorch Foundation has introduced torchao, a new native PyTorch library designed to make machine learning models both faster and smaller. By leveraging low-bit data types, sparsity, and quantization, torchao enhances the efficiency of models across both training and inference. According to Team PyTorch, this library provides a comprehensive set of techniques that help optimize model performance without requiring significant changes to existing workflows.
Officially unveiled on September 26, torchao integrates natively with torch.compile() and FSDP2, allowing it to work efficiently with most PyTorch models hosted on Hugging Face. As a specialized library for custom data types and optimizations, torchao makes models more compact and computationally efficient out of the box. It provides functionality to quantize and sparsify weights, gradients, optimizer state, and activations, improving both inference speed and training efficiency. One of its standout features is torchao.float8, which enables faster training by using float8 precision directly within native PyTorch.
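As a concrete illustration, the sketch below applies int8 weight-only quantization to a toy model using the quantize_ API that torchao documented at release; the model itself is a placeholder, and the exact entry points may differ in later versions of the library.

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Placeholder model; any torch.nn.Module containing nn.Linear layers works.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# Replace the Linear weights with int8 weight-only quantized tensors in place,
# shrinking the model's memory footprint for inference.
quantize_(model, int8_weight_only())

# The quantized model composes with torch.compile() as usual.
model = torch.compile(model, mode="max-autotune")
```

Because quantize_ rewrites the model in place, the surrounding inference code does not need to change, which is the out-of-the-box behavior the announcement emphasizes.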
The torchao library is designed to be accessible and easy to use, with many of its techniques written in straightforward PyTorch code. This means developers can apply these optimizations without needing deep expertise in low-level hardware operations. Whether applied to model training or inference, torchao simplifies the process of reducing memory footprint and improving computational performance, making it a valuable tool for researchers and developers alike.
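On the training side, the float8 workflow follows the same in-place pattern. The sketch below uses convert_to_float8_training, the entry point named in torchao's release materials; the model is again a placeholder, and float8 training additionally assumes a recent GPU with hardware float8 support.

```python
import torch
from torchao.float8 import convert_to_float8_training

# Placeholder model; float8 training targets the nn.Linear layers.
model = torch.nn.Sequential(
    torch.nn.Linear(2048, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 2048),
).cuda()

# Swap nn.Linear modules for float8 variants in place; the existing
# training loop and optimizer remain unchanged.
convert_to_float8_training(model)

# torch.compile is recommended to get the full float8 speedup.
model = torch.compile(model)
```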
Licensed under the BSD 3-Clause License, torchao takes full advantage of PyTorch’s latest features and is recommended for use with the current nightly or latest stable release of PyTorch. By streamlining model optimization and offering native support for advanced quantization techniques, torchao represents a significant step forward in making machine learning models more efficient, scalable, and accessible to a broader audience.
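For readers who want to try it, torchao is distributed on PyPI under the package name torchao, so installation is a single command, paired with a current PyTorch build per the recommendation above:

```
pip install torchao
```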