Introduction to the New AI Infrastructure Partnership
During the recent Google Cloud Next conference, technology giants Google and NVIDIA announced a significant advancement in AI infrastructure designed to tackle the rising costs of AI inference at large scale. The collaboration focuses on delivering a more efficient and cost-effective platform for running demanding AI workloads.
Breakthrough Hardware: A5X Bare-Metal Instances
The centerpiece of this initiative is the introduction of the A5X bare-metal instances operating on NVIDIA’s Vera Rubin NVL72 rack-scale systems. This new architecture integrates hardware and software codesign to achieve remarkable performance gains.
Specifically, the A5X instances are claimed to offer up to ten times lower inference cost per token than the previous generation, alongside a tenfold increase in token throughput per megawatt of power consumed. This represents a substantial leap in AI processing efficiency, addressing both performance and sustainability.
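To make those two multipliers concrete, here is a minimal back-of-envelope sketch. Only the 10x factors come from the announcement; the baseline dollar and throughput figures below are invented for illustration and do not reflect actual pricing.

```python
def next_gen_metrics(cost_per_million_tokens: float,
                     tokens_per_megawatt_hour: float,
                     cost_factor: float = 10.0,
                     throughput_factor: float = 10.0) -> tuple[float, float]:
    """Apply the announced 10x cost reduction and 10x throughput-per-MW gain.

    All baseline inputs are hypothetical; only the factors are sourced.
    """
    return (cost_per_million_tokens / cost_factor,
            tokens_per_megawatt_hour * throughput_factor)

# Hypothetical previous-generation baseline (illustrative numbers only):
baseline_cost = 2.00          # dollars per million tokens (assumed)
baseline_throughput = 1e9     # tokens per megawatt-hour (assumed)

new_cost, new_throughput = next_gen_metrics(baseline_cost, baseline_throughput)
print(new_cost)        # 0.2 dollars per million tokens
print(new_throughput)  # 1e10 tokens per megawatt-hour
```

The point of the sketch is simply that the two claims compound in different directions: cost per token falls while tokens served per unit of power rises, so the same power budget yields both cheaper and more plentiful inference.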
Overcoming Bandwidth Challenges
One of the major technical challenges in scaling AI inference is providing enough interconnect bandwidth to avoid communication bottlenecks when linking thousands of processors. To address this, NVIDIA’s ConnectX-9 SuperNICs are combined with Google’s Virgo networking technology, allowing the system to scale up to 80,000 GPUs in a single cluster within one site and to nearly one million GPUs across multiple sites.
Security and Data Governance for AI Deployments
Beyond raw computational power, data privacy and sovereignty are critical concerns for enterprises, especially in regulated industries like finance and healthcare. Google and NVIDIA are addressing these through secure deployment options.
Google Gemini models running on NVIDIA Blackwell GPUs are now available on Google Distributed Cloud, enabling organizations to keep sensitive data and AI models within their own controlled environments. This is supported by NVIDIA Confidential Computing technology, which encrypts training data and prompts at the hardware level to prevent unauthorized access, even from cloud operators.
Additionally, the introduction of Confidential G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs offers cryptographic protections for multi-tenant public cloud environments, marking the first cloud confidential computing offering for these GPUs.
Streamlining AI Training and Operational Efficiency
Developing complex agentic AI systems involves managing multi-step workflows and mitigating issues such as model hallucinations. To support this, NVIDIA Nemotron 3 Super is integrated into the Gemini Enterprise Agent Platform, providing developers with tools to deploy reasoning and multimodal models optimized for agentic tasks.
Managing infrastructure for training at scale can be challenging. Google Cloud and NVIDIA have introduced Managed Training Clusters equipped with a managed reinforcement learning API, automating cluster sizing, failure recovery, and job execution. This allows data scientists to focus on improving model quality rather than infrastructure management.
Industrial Applications and Legacy System Integration
The partnership also targets industrial sectors, where integrating AI with physical manufacturing processes poses unique challenges. NVIDIA’s AI infrastructure and physical AI libraries are now accessible via Google Cloud, enabling precise simulations and automation of manufacturing workflows.
Leading industrial software companies like Cadence and Siemens utilize this infrastructure to enhance engineering and manufacturing for sectors including aerospace and autonomous vehicles.
NVIDIA Omniverse libraries and the Isaac Sim framework help developers build physically accurate digital twins and robotic simulations, overcoming the difficulties of working with legacy product lifecycle management systems.
Real-World Impact and Developer Ecosystem Growth
The new infrastructure supports a wide range of scales, from full rack deployments to fractional GPU instances, allowing flexible provisioning for diverse AI workloads.
Notable users include OpenAI, which runs ChatGPT inference on NVIDIA GPUs in Google Cloud; Snap, which cut costs by GPU-accelerating its data pipelines; and Schrödinger, which accelerated its drug discovery simulations.
The developer community has rapidly expanded, with over 90,000 members joining the collaborative NVIDIA and Google Cloud ecosystem in just one year. Startups such as CodeRabbit and Factory use the platform for autonomous software development, while others focus on enterprise data intelligence and generative AI applications.
Conclusion: Advancing AI at Scale with Efficiency and Security
Google and NVIDIA’s joint effort represents a major step forward in enabling scalable, efficient, and secure AI deployments across industries. By combining cutting-edge hardware, advanced networking, and comprehensive software tools, the partnership aims to empower organizations to harness AI’s full potential while managing costs and compliance challenges.