AWS has introduced a new feature, cross-region inference, to its Amazon Bedrock generative AI service. The addition helps developers automate the routing of inference requests, particularly during traffic spikes in AI workloads, improving the service’s scalability and performance so that high-demand periods do not lead to slowdowns or reduced availability.
Cross-region inference, now generally available at no additional cost for developers using Bedrock’s on-demand mode, dynamically routes traffic across multiple AWS regions. During peak usage, applications powered by Amazon Bedrock can maintain performance by spreading the load across regions, giving developers better reliability and faster response times even when requests surge.
The on-demand mode in Amazon Bedrock offers pay-as-you-go pricing: developers pay only for what they use, with no long-term commitments. This contrasts with batch mode, where developers submit a set of prompts and receive responses in bulk, which suits large-scale predictions. With cross-region inference, developers no longer need to predict demand fluctuations; the service routes traffic automatically based on current load, improving both performance and reliability.
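The difference between the two modes shows up in which API a developer calls. The sketch below contrasts the shape of an on-demand request with a batch job submission; the model ID, bucket paths, role ARN, and job name are all placeholders, and the actual calls (which require AWS credentials and model access) are shown only as comments:

```python
# On-demand: pay per request, one prompt in, one response out.
# Structure follows the Bedrock Converse API; the model ID is illustrative.
on_demand_request = {
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "messages": [{"role": "user", "content": [{"text": "Hello"}]}],
}

# Batch: submit many prompts at once via S3 and collect responses in bulk.
# All names below are placeholders for illustration.
batch_job = {
    "jobName": "nightly-predictions",
    "roleArn": "arn:aws:iam::123456789012:role/BedrockBatchRole",
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "inputDataConfig": {"s3InputDataConfig": {"s3Uri": "s3://my-bucket/prompts/"}},
    "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/results/"}},
}

# With credentials configured, these would be submitted roughly as:
# import boto3
# boto3.client("bedrock-runtime").converse(**on_demand_request)
# boto3.client("bedrock").create_model_invocation_job(**batch_job)

print(on_demand_request["modelId"])
```

The key operational difference: on-demand returns a response synchronously per call, while a batch job runs asynchronously and writes all results to the configured S3 output location.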
To use cross-region inference, developers configure the feature through the Amazon Bedrock API or the AWS console, defining a primary region and the secondary regions to which requests can be routed during spikes. With this launch, developers can also choose models based in either the U.S. or the EU, each offering two to three preset regions, giving them flexibility to pick the infrastructure best suited to their applications.
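In the API, cross-region routing is expressed through inference profiles: rather than a region-specific model ID, the request carries a profile ID whose geography prefix (`us.` or `eu.`) selects the preset group of regions Bedrock may route to. A minimal sketch is below; the model name is illustrative, and the invocation itself needs AWS credentials, so it is left as a comment:

```python
# A cross-region inference profile ID is the base model ID prefixed
# with a geography: "us." or "eu." picks the preset region group.
base_model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative
geo = "us"  # or "eu" for the EU preset regions
inference_profile_id = f"{geo}.{base_model_id}"

# Converse API request using the profile ID in place of a model ID.
request = {
    "modelId": inference_profile_id,
    "messages": [
        {"role": "user", "content": [{"text": "Summarize cross-region inference."}]}
    ],
}

# Invoking the model requires AWS credentials and granted model access:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])

print(inference_profile_id)
```

Because the profile ID is a drop-in replacement for the model ID, enabling cross-region routing for an existing on-demand application can be as small as changing that one string.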