AWS Partners with NVIDIA for Advanced AI Infrastructure via NVLink Fusion



Iris Coleman
Dec 02, 2025 16:50

AWS is collaborating with NVIDIA to integrate NVLink Fusion into its AI infrastructure alongside the new Trainium4 AI chips, aiming to boost performance and reduce deployment risks.




Amazon Web Services (AWS) has announced a strategic collaboration with NVIDIA to integrate NVIDIA NVLink Fusion into its AI infrastructure, as revealed at the AWS re:Invent conference. This integration is set to enhance the deployment of AI technologies, particularly focusing on the new Trainium4 AI chips, Graviton CPUs, Elastic Fabric Adapters (EFAs), and the Nitro System virtualization infrastructure, according to the official NVIDIA blog.

Enhancing AI Infrastructure with NVLink Fusion

NVLink Fusion is a rack-scale platform that lets industries build custom AI rack infrastructure on NVIDIA’s scale-up interconnect technology. The integration is part of a broader collaboration between AWS and NVIDIA that leverages NVLink 6 and NVIDIA’s MGX rack architecture, and it aims to boost performance, increase return on investment, and reduce the deployment risks associated with custom AI silicon.

Addressing Deployment Challenges

As AI workloads become increasingly complex, the demand for robust compute infrastructure grows. NVLink Fusion addresses these challenges by providing a high-bandwidth, low-latency interconnect to connect entire racks of accelerators. This approach is crucial for handling emerging workloads like planning, reasoning, and agentic AI, which require sophisticated models and systems working in parallel.

Hyperscalers face significant hurdles, such as long development cycles and managing a complex supplier ecosystem. Developing a complete rack-scale architecture involves coordinating multiple components, from CPUs and GPUs to cooling systems and power management. NVLink Fusion mitigates these challenges, streamlining the process and reducing risks.

Technological Advancements with NVLink 6

At the core of NVLink Fusion is the NVLink Fusion chiplet, which hyperscalers can integrate into their custom ASIC designs. The chiplet connects to the NVLink scale-up interconnect and the NVLink Switch, linking up to 72 custom ASICs at 3.6 TB/s per ASIC. The NVLink Switch provides peer-to-peer memory access and supports advanced protocols such as NVIDIA SHARP for in-network reductions.
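To put those figures in perspective, a quick back-of-the-envelope calculation multiplies the per-ASIC bandwidth by the maximum rack scale quoted above. This is illustrative arithmetic based solely on the numbers in this article, not an NVIDIA-published aggregate specification:

```python
# Aggregate scale-up bandwidth of a fully populated NVLink Fusion rack,
# using the figures quoted in the article: up to 72 custom ASICs,
# each with 3.6 TB/s of NVLink scale-up bandwidth.
ASICS_PER_RACK = 72   # maximum custom ASICs per NVLink domain (per article)
TBPS_PER_ASIC = 3.6   # TB/s of scale-up bandwidth per ASIC (per article)

aggregate_tbps = ASICS_PER_RACK * TBPS_PER_ASIC
print(f"Aggregate scale-up bandwidth: {aggregate_tbps:.1f} TB/s")
```

Running this prints roughly 259.2 TB/s of aggregate scale-up bandwidth across the rack, which underscores why a switched, rack-scale interconnect matters for models that span many accelerators.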

Reducing Costs and Accelerating Time-to-Market

NVLink Fusion offers a modular portfolio of AI factory technology, including NVIDIA MGX rack architecture and a comprehensive ecosystem of partners. This setup allows hyperscalers to significantly cut development costs and accelerate time-to-market compared to assembling their own technology stacks. AWS benefits from this ecosystem, eliminating many risks associated with rack-scale deployments.

Heterogeneous AI Silicon Integration

NVLink Fusion also enables AWS to maintain a heterogeneous silicon offering within a unified infrastructure. This flexibility allows for rapid scaling to meet the demands of intensive AI model training and inference workloads. By adopting NVLink Fusion, AWS is poised to drive faster innovation cycles and bring custom AI chips to market more efficiently.

For further details, visit the NVIDIA blog.

Image source: Shutterstock


Source: https://blockchain.news/news/aws-nvidia-advanced-ai-infrastructure-nvlink-fusion