Introduction to New AI Infrastructure Collaboration
During the recent Google Cloud Next conference, tech giants Google and NVIDIA revealed a collaborative hardware roadmap aimed at significantly reducing the cost of AI inference at scale. The initiative centers on the introduction of A5X bare-metal instances powered by NVIDIA’s Vera Rubin NVL72 rack-scale systems.
Cutting Costs and Boosting Performance
Through a sophisticated co-design of hardware and software, the new architecture targets up to a tenfold reduction in inference cost per token compared to previous generations. Simultaneously, it aims to deliver ten times higher token throughput per megawatt, striking a balance between performance and energy efficiency.
High-Bandwidth Connectivity for Massive Processing
To enable seamless processing across thousands of GPUs, the A5X instances integrate NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology. This high-bandwidth solution supports scaling up to 80,000 NVIDIA Rubin GPUs within a single cluster and extends to 960,000 GPUs across multisite deployments. Precise synchronization and workload management are critical at this scale to prevent idle compute time and ensure optimal throughput.
Addressing Data Governance and Security
Data sovereignty and security pose significant challenges, especially in regulated industries like finance and healthcare. To meet these requirements, Google Gemini models running on NVIDIA Blackwell GPUs are now available in preview on Google Distributed Cloud, enabling organizations to keep their most sensitive data and models within controlled environments.
The infrastructure incorporates NVIDIA Confidential Computing, a hardware-based security technology that encrypts training data and prompts, safeguarding them from unauthorized access, including by cloud operators. Additionally, Confidential G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs provide cryptographic protections in multi-tenant public cloud setups, marking the first cloud-based confidential computing offering for Blackwell GPUs.
Simplifying Complex Agentic AI Training
Training multi-step agentic AI systems involves integrating large language models with complex APIs and maintaining continuous data synchronization. NVIDIA Nemotron 3 Super, now available on the Gemini Enterprise Agent Platform, equips developers with tools to customize and deploy reasoning and multimodal models tailored for agentic tasks.
To reduce operational overhead, Google Cloud and NVIDIA introduced Managed Training Clusters featuring an automated reinforcement learning API built with NVIDIA NeMo RL. This system manages cluster sizing, failure recovery, and job execution, allowing data scientists to focus on model improvement rather than infrastructure management.
Integrating Legacy Systems and Enhancing Physical Simulations
Incorporating AI into manufacturing and heavy industry requires bridging digital models with physical factory environments. NVIDIA’s AI infrastructure and physical AI libraries on Google Cloud provide the computational foundation for simulating and automating manufacturing workflows.
Industrial software leaders like Cadence and Siemens use NVIDIA-accelerated solutions on Google Cloud for applications in sectors including aerospace and autonomous vehicles. NVIDIA Omniverse libraries and NVIDIA Isaac Sim help developers create precise digital twins and robotics simulations, overcoming challenges posed by legacy data formats.
Deploying NVIDIA NIM microservices such as Cosmos Reason 2 on Google Vertex AI and Kubernetes Engine enables vision-based agents and robots to interpret and navigate real-world environments, advancing the creation of living industrial digital twins.
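As a rough illustration of what serving a NIM container on Google Kubernetes Engine involves, the sketch below applies a minimal Kubernetes Deployment. The image path, deployment name, and port are illustrative assumptions, not the official Cosmos Reason 2 recipe; a real deployment would also need NGC registry credentials and a GPU-enabled node pool.

```shell
# Hypothetical sketch: serving an NVIDIA NIM container on GKE.
# The image path and names below are placeholders, not official values.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nim-vision-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nim-vision-agent
  template:
    metadata:
      labels:
        app: nim-vision-agent
    spec:
      containers:
      - name: nim
        image: nvcr.io/nim/example/model:latest   # placeholder image path
        ports:
        - containerPort: 8000                     # NIM HTTP serving port
        resources:
          limits:
            nvidia.com/gpu: 1                     # request one GPU from the node pool
EOF
```

The `nvidia.com/gpu` resource limit is how Kubernetes schedules the pod onto a node with an available GPU; on GKE this presupposes a node pool created with an accelerator attached.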
Real-World Impact and Developer Ecosystem Growth
The new infrastructure accommodates a broad range of acceleration needs, from full NVL72 racks to fractional GPU VMs, allowing precise resource provisioning for diverse AI workloads. Early adopters include OpenAI, which runs large-scale inference for ChatGPT; Snap, which uses GPU-accelerated Spark for cost-effective data pipelines; and Schrödinger, which accelerates drug discovery simulations.
The developer community has rapidly expanded, with over 90,000 members joining the joint NVIDIA and Google Cloud ecosystem within a year. Startups like CodeRabbit and Factory leverage NVIDIA Nemotron-based models for autonomous software development, while companies such as Aible, Mantis AI, Photoroom, and Baseten build enterprise solutions across data, video intelligence, and generative imagery using this platform.
Conclusion
Google and NVIDIA’s collaboration delivers a comprehensive AI computing foundation designed to push experimental AI agents and simulations into production environments, optimizing operations in sectors ranging from cybersecurity to manufacturing. This partnership exemplifies how advanced infrastructure can reduce costs while enhancing performance and security in AI deployments.