The history of modern software development has been a dance between what hardware can provide and what software demands. Over the decades, the steps of this dance have taken us from the original Intel 8086, with what we now consider very basic functionality, to today’s versatile processors, which provide virtualization support, encrypted memory, and expanded instruction sets that power the most demanding application stacks.
The dance swings both ways. Sometimes our software has to stretch to take advantage of the capabilities of the next generation of silicon, and sometimes it has to squeeze every last ounce of available performance from what it already has. We’re now seeing the arrival of a new generation of hardware that combines familiar CPUs with system-level accelerators capable of running complex AI models on both client hardware and servers, on-premises and in the public cloud.
You’ll find AI accelerators not only in familiar Intel and AMD processors, but also in Arm’s latest-generation Neoverse server-class designs, which combine these features with low power demands (as do Qualcomm’s mobile and laptop offerings). It’s an attractive combination of features for hyperscale clouds like Azure, where low power and high density can help keep costs low while allowing growth to continue.
At the same time, system-level accelerators promise an interesting future for Windows, making it possible to run built-in AI assistants on local hardware instead of in the cloud as Microsoft continues to improve the performance of its Phi series of small language models.
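As a rough illustration of what running a small language model locally looks like, here’s a minimal sketch using the Hugging Face transformers library and Microsoft’s publicly available phi-2 checkpoint as a stand-in. It runs on whatever device PyTorch picks and doesn’t target any particular NPU, so treat it as illustrative rather than as the actual Windows integration.

```python
# Minimal sketch: running a small language model locally instead of calling a cloud API.
# Assumes the transformers and torch packages and the public microsoft/phi-2 checkpoint;
# this is illustrative only, not Microsoft's Windows AI assistant plumbing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # a small language model from Microsoft's Phi series
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # downloads the model weights

prompt = "Summarize why on-device AI accelerators matter:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```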
Azure Boost: Silicon for virtualization offload
Ignite 2023 saw Microsoft announce its own custom silicon for Azure, hardware expected to begin shipping to customers in 2024. Microsoft has been using custom silicon and FPGAs in its own services for some time; its Zipline hardware compression and Project Brainwave FPGA-based AI accelerators are good examples. The newest addition is Azure Boost, which offloads virtualization processes from the hypervisor and host operating system to speed up storage and networking for Azure VMs. Azure Boost also includes the Cerberus built-in supply chain security chipset.
Azure Boost aims to give your virtual machine workloads access to as much of the available CPU as possible. Instead of spending CPU cycles compressing data or managing security, dedicated hardware takes over those tasks, allowing Azure to run more customer workloads on the same hardware. Running systems at high utilization is key to public cloud economics, and any hardware investment that helps do that will pay off quickly.
Maia 100: Silicon for large language models
Large language models (and generative AI in general) are enormously compute-intensive: OpenAI used Microsoft’s GPU-based supercomputer to train its GPT models. Even on a system like Microsoft’s, large foundation models such as GPT-4, with over a trillion parameters, require months of training. The next generation of LLMs will need even more compute, for both training and inference. And if we’re building applications around these LLMs using retrieval-augmented generation (RAG), we’ll need additional capacity to create embeddings for our source content and to provide the underlying vector-based search.
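To make that extra capacity concrete, here’s a minimal sketch of the two RAG steps just mentioned: creating embeddings for source content and running a vector search over them. It assumes the sentence-transformers package and the small public all-MiniLM-L6-v2 model purely as stand-ins for whatever embedding model a production system would use.

```python
# Minimal RAG sketch: embed source documents, then answer a query with vector search.
# Assumes the sentence-transformers and numpy packages; the model named here is a
# small public embedding model used only as a stand-in.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Azure Boost offloads storage and networking work from the host CPU.",
    "Maia 100 is Microsoft's custom accelerator for AI training and inference.",
    "Project Brainwave used FPGAs to accelerate specific machine learning models.",
]

# Step 1: create embeddings for the source content (done once, then kept in a vector index).
doc_vectors = model.encode(documents, normalize_embeddings=True)

# Step 2: embed the query and find the nearest document by cosine similarity.
query_vector = model.encode(["Which hardware speeds up VM networking?"], normalize_embeddings=True)
scores = doc_vectors @ query_vector.T  # cosine similarity, since the vectors are normalized
print(documents[int(np.argmax(scores))])
```

Every document in the corpus has to pass through the embedding model at least once, and every query passes through it again at run time, which is exactly the kind of sustained inference load that calls for dedicated accelerators rather than spare CPU cycles.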
GPU-based supercomputers are a significant investment, even if Microsoft can recoup some of the capital costs from subscribers. Operating costs are also high, driven by cooling, power, bandwidth, and storage requirements. We can therefore expect these resources to be limited to a small number of data centers where sufficient space, power, and cooling are available.
But if large-scale AI is to be a successful differentiator for Azure against competitors like AWS and Google Cloud, it will need to be ubiquitous and cost-effective. This will require new silicon (for both training and inference) that can be run at higher densities and lower power than today’s GPUs.
Azure’s Project Brainwave FPGAs use programmable silicon to implement key algorithms. While they worked well, they were single-purpose devices that served as accelerators for specific machine learning models. You could develop a variant that supports complex neural networks, but it would require implementing large arrays of simple processors to handle the multidimensional vector arithmetic that drives these semantic models, and that is beyond the capabilities of most FPGA technologies.
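To give a sense of the scale of that vector arithmetic, here’s a small sketch of the kind of matrix multiplication a single transformer layer performs over and over. The dimensions are made-up round numbers rather than any specific model’s, but they show why general-purpose matrix hardware is needed.

```python
# Sketch of the core arithmetic in a transformer layer: large matrix multiplications.
# The dimensions below are illustrative round numbers, not those of any particular model.
import numpy as np

seq_len, d_model = 2048, 4096            # tokens in context, model width
x = np.random.rand(seq_len, d_model)     # activations flowing into one layer
w = np.random.rand(d_model, d_model)     # a single projection weight matrix

y = x @ w                                # one of several matmuls per layer

# Rough FLOP count for just this one multiplication: 2 * seq_len * d_model * d_model
flops = 2 * seq_len * d_model * d_model
print(f"{flops / 1e9:.1f} GFLOPs for a single projection")  # ~68.7 GFLOPs
```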