Introduction to the New AI Infrastructure
At the Google Cloud Next conference, Google and NVIDIA revealed a collaborative hardware roadmap aimed at significantly reducing the cost of AI inference at scale. Central to the announcement was the introduction of A5X bare-metal instances, which leverage NVIDIA Vera Rubin NVL72 rack-scale systems.
This co-designed hardware and software solution aims to achieve up to a tenfold decrease in inference cost per token, while simultaneously delivering ten times higher token throughput per megawatt compared to previous generations of AI infrastructure.
Overcoming Scalability Challenges
One major hurdle in scaling AI inference is the need for immense bandwidth to connect thousands of processors without causing delays. The A5X instances tackle this challenge by combining NVIDIA ConnectX-9 SuperNICs with Google’s proprietary Virgo networking technology.
This setup supports scaling up to 80,000 NVIDIA Rubin GPUs within a single cluster and as many as 960,000 GPUs across multiple sites. Managing such vast parallel processing requires precise synchronization to prevent idle compute time, highlighting the sophistication of the workload management systems implemented.
Enhancing Data Governance and Security
Beyond raw computing power, data governance remains a key concern, especially for regulated sectors like finance and healthcare that face strict data sovereignty requirements. To meet these compliance needs, Google is previewing Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs within Google Distributed Cloud.
This approach allows organizations to keep advanced AI models fully within their controlled environments, alongside sensitive data, utilizing NVIDIA’s Confidential Computing technology. This hardware-level security ensures that training data and prompts remain encrypted and inaccessible even to cloud operators.
Additionally, Confidential G4 virtual machines equipped with NVIDIA RTX PRO 6000 Blackwell GPUs are being introduced for multi-tenant public cloud environments, offering cryptographic protections that uphold privacy standards for regulated industries. This is the first cloud-based confidential computing offering for NVIDIA Blackwell GPUs.
Streamlining Agentic AI Training
Developing complex agentic AI systems involves integrating large language models with APIs, keeping vector databases synchronized, and mitigating model hallucinations. To reduce this operational overhead, NVIDIA Nemotron 3 Super is now available on the Gemini Enterprise Agent Platform, giving developers tools to customize and deploy reasoning and multimodal models for such tasks.
Google and NVIDIA also launched Managed Training Clusters on this platform, featuring a managed reinforcement learning API powered by NVIDIA NeMo RL. This automates cluster sizing, failure recovery, and job execution, enabling data science teams to focus on model quality over infrastructure management.
Companies like CrowdStrike leverage these tools to enhance cybersecurity by generating synthetic data and fine-tuning models for domain-specific threat detection, accelerating automated response capabilities on Blackwell GPUs.
Integrating AI with Legacy Systems and Physical Simulations
Applying AI in heavy industry and manufacturing introduces unique challenges, including the need for accurate physical simulations and compatibility with legacy data formats. NVIDIA’s AI infrastructure and physical AI libraries, now available on Google Cloud, enable organizations to simulate and automate manufacturing workflows.
Leading industrial software providers such as Cadence and Siemens have adopted these capabilities, powering engineering and manufacturing processes for heavy machinery, aerospace, and autonomous vehicles.
To address the data-translation issues inherent in long-standing product lifecycle management systems, developers can use NVIDIA Omniverse libraries and the open-source NVIDIA Isaac Sim framework via Google Cloud Marketplace. These tools support the creation of precise digital twins and robotics simulation pipelines prior to deployment.
Furthermore, NVIDIA NIM microservices, including the Cosmos Reason 2 model, deployed on Google Vertex AI and Kubernetes Engine, empower vision-based agents and robots to interpret and navigate physical environments, advancing the development of industrial digital twins.
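NIM microservices expose an OpenAI-compatible HTTP interface, so a vision-based agent deployed on GKE or Vertex AI can query a Cosmos model with an ordinary chat-completions request. The sketch below builds such a request in Python; the in-cluster endpoint address and the model identifier are illustrative assumptions, not documented values.

```python
import json

# Hypothetical in-cluster address of a deployed NIM microservice
# (assumed for illustration; the real service name depends on your deployment).
NIM_ENDPOINT = "http://cosmos-nim.default.svc.cluster.local:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "nvidia/cosmos-reason-2") -> str:
    """Serialize an OpenAI-style chat-completions payload for a NIM endpoint.

    The default model identifier is an assumed name for illustration.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(payload)

body = build_chat_request("List the obstacles between the forklift and the loading dock.")
print(body)
```

The payload would be POSTed to the endpoint with any HTTP client; since NIM follows the OpenAI wire format, existing agent frameworks can target it without code changes.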
Real-World Impact and Ecosystem Growth
The infrastructure portfolio offers a spectrum of options, from full NVL72 racks to fractional G4 VMs, enabling tailored acceleration for diverse AI workloads.
Notable users include Thinking Machines Lab, which accelerates training with its Tinker API on A4X Max VMs, and OpenAI, which uses NVIDIA GB300 and GB200 NVL72 systems on Google Cloud to support ChatGPT operations. Snap has moved its data pipelines to GPU-accelerated Spark to cut large-scale testing costs, while Schrödinger employs NVIDIA-accelerated computing to compress drug discovery simulations from weeks to hours.
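Migrating a Spark pipeline to GPUs, as Snap did, largely comes down to loading NVIDIA's RAPIDS Accelerator plugin and declaring GPU resources. The sketch below renders the relevant settings as spark-submit flags; the plugin class and config keys are real RAPIDS Accelerator settings, while the specific resource amounts are illustrative assumptions.

```python
# Core settings for the RAPIDS Accelerator for Apache Spark, the plugin
# behind GPU-accelerated Spark SQL. Resource amounts below are example
# values, not recommendations for any particular cluster.
rapids_conf = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    "spark.executor.resource.gpu.amount": "1",     # one GPU per executor (assumed)
    "spark.task.resource.gpu.amount": "0.25",      # four tasks share a GPU (assumed)
}

def to_submit_args(conf: dict) -> list[str]:
    """Render a config dict as spark-submit --conf flags."""
    return [f"--conf {k}={v}" for k, v in sorted(conf.items())]

for arg in to_submit_args(rapids_conf):
    print(arg)
```

Because the plugin replaces supported SQL operators transparently, existing DataFrame code typically runs unmodified; unsupported operations fall back to the CPU.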
The developer community has rapidly expanded, with over 90,000 members joining the NVIDIA and Google Cloud collaborative ecosystem within a year. Startups such as CodeRabbit and Factory use NVIDIA Nemotron-based models for code reviews and autonomous software development, while companies like Aible, Mantis AI, Photoroom, and Baseten build enterprise data, video intelligence, and generative imagery solutions on this platform.
Together, NVIDIA and Google Cloud are building a foundational computing infrastructure designed to bring experimental AI agents and simulations into real-world production systems that enhance security and optimize industrial operations.
Source: see the original article
