Google Introduces Framework to Optimize AI Agents’ Compute and Tool Budgets

Researchers at Google and UC Santa Barbara have developed a framework that helps AI agents make better use of their compute and tool-call budgets. Described in a recent research paper, the approach introduces two techniques: a lightweight “Budget Tracker” and a more comprehensive framework called “Budget Aware Test-time Scaling” (BATS). Both keep the agent aware of how much of its reasoning and tool-use budget remains while it works through a task.

As AI agents increasingly depend on external tool calls to operate in real-world scenarios, controlling the cost and latency of those calls has become a central challenge. Traditional methods for improving AI reasoning give the model more time to think but do not account for the cost of external tool usage.

Challenges in Scaling Tool Usage for AI Agents

Conventional test-time scaling strategies focus on increasing the duration an AI model spends “thinking.” However, in agentic tasks such as web browsing, the quantity of tool calls directly influences the scope and depth of information gathering. This can lead to significant overhead, including greater token consumption, longer context lengths, increased latency, and higher API costs.
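To make the overhead concrete, the per-query cost can be thought of as token spend plus tool-call spend. The sketch below is only an illustration of that arithmetic: the function name, parameters, and all prices are hypothetical and are not taken from the paper.

```python
def query_cost(
    input_tokens: int,
    output_tokens: int,
    tool_calls: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
    usd_per_call: float,
) -> float:
    """Estimate the dollar cost of one agent query.

    Token prices are quoted per million tokens, tool calls per call.
    All prices are caller-supplied assumptions, not real vendor rates.
    """
    token_cost = (
        input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    ) / 1_000_000
    return token_cost + tool_calls * usd_per_call


# Hypothetical example: a long browsing session with 30 tool calls.
example = query_cost(200_000, 10_000, 30, 1.0, 10.0, 0.001)
```

Because every tool result is appended to the context, input tokens tend to grow with the number of calls, so the two terms are not independent in practice.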

Co-authors Zifeng Wang and Tengxiao Liu explained to VentureBeat that simply providing agents with more resources does not necessarily improve results. Agents that lack budget awareness often waste their limited tool calls by repeatedly exploring unproductive leads.

Introducing Budget Tracker for Efficient Resource Management

To address these inefficiencies, the team first developed Budget Tracker, a prompt-level plug-in that delivers a continuous signal on resource availability to the AI agent. This explicit budget feedback allows the model to internalize constraints and adjust its strategy dynamically, without needing additional training.

The implementation provides policy guidelines outlining budget regimes and recommended tool usage. At each reasoning step, the agent receives updates on consumed resources and remaining budget, shaping its subsequent decisions accordingly.
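A minimal sketch of how such a prompt-level tracker might look is shown here. It is not the researchers' implementation: the class name, the regime thresholds, and the text format of the status block are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BudgetTracker:
    """Hypothetical per-tool call budget tracker (not the paper's code)."""

    limits: Dict[str, int]  # e.g. {"search": 10, "browse": 5}
    used: Dict[str, int] = field(default_factory=dict)

    def remaining(self, tool: str) -> int:
        return self.limits[tool] - self.used.get(tool, 0)

    def record(self, tool: str) -> None:
        """Count one call and refuse once the tool's budget is spent."""
        if self.remaining(tool) <= 0:
            raise RuntimeError(f"budget exhausted for tool: {tool}")
        self.used[tool] = self.used.get(tool, 0) + 1

    def regime(self, tool: str) -> str:
        """Map the remaining fraction to a coarse regime (assumed thresholds)."""
        limit = self.limits[tool]
        fraction = self.remaining(tool) / limit if limit > 0 else 0.0
        if fraction > 0.5:
            return "high"
        if fraction > 0.2:
            return "medium"
        return "low"

    def status_block(self) -> str:
        """Render the text appended to the agent's prompt at each step."""
        lines = ["[budget]"]
        for tool, limit in self.limits.items():
            lines.append(
                f"{tool}: used {self.used.get(tool, 0)}/{limit}, "
                f"remaining {self.remaining(tool)}, regime {self.regime(tool)}"
            )
        return "\n".join(lines)
```

In a ReAct-style loop, the agent code would call record() after each tool call and append status_block() to the next observation, so the model sees its consumption without any change to its weights.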

The researchers tested Budget Tracker under both sequential and parallel scaling paradigms, using search agents equipped with search and browse tools and running a ReAct-style reasoning-and-acting loop. The experiments used the BrowseComp and HLE-Search datasets and models including Gemini 2.5 Pro, Gemini 2.5 Flash, and Claude Sonnet 4.

Results demonstrated that Budget Tracker maintained or improved accuracy while significantly reducing resource consumption: 40.4% fewer search calls, 19.9% fewer browse calls, and an overall cost reduction of 31.3%. Moreover, Budget Tracker’s performance continued to scale with increased budgets, unlike standard ReAct methods that plateaued.

BATS: A Holistic Framework for Budget-Aware Scaling

Building on Budget Tracker, the researchers introduced Budget Aware Test-time Scaling (BATS), a multi-module framework designed to maximize agent performance within any specified budget. BATS continuously monitors the remaining resources and adapts the agent’s behavior as it works toward an answer.

The framework includes a planning module that allocates effort based on budget and a verification module that decides whether to deepen investigation into promising leads or pivot to alternatives. Upon exhausting the budget, an LLM acts as a judge to select the best answer from verified candidates.
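The control flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the BATS implementation: the function name, the three callables (attempt, verify, judge), and the "accept" / "deepen" / "pivot" verdict strings are hypothetical stand-ins for what would be LLM calls in a real system.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces; in practice each would wrap an LLM or tool-using agent.
Attempt = Callable[[str, int, str], Tuple[str, int]]  # (question, remaining, mode) -> (answer, calls used)
Verify = Callable[[str, str, int], str]               # -> "accept", "deepen", or "pivot"
Judge = Callable[[str, List[str]], str]               # picks one answer from the candidates


def run_with_budget(
    question: str,
    total_calls: int,
    attempt: Attempt,
    verify: Verify,
    judge: Judge,
) -> Optional[str]:
    remaining = total_calls
    mode = "explore"
    candidates: List[str] = []

    while remaining > 0:
        # Planning: the attempt sees the remaining budget and the current mode.
        answer, used = attempt(question, remaining, mode)
        # Charge at least one call and never more than what is left.
        remaining -= max(1, min(used, remaining))

        # Verification: keep the answer, dig deeper, or switch direction.
        verdict = verify(question, answer, remaining)
        if verdict == "accept":
            candidates.append(answer)
            mode = "explore"
        elif verdict == "deepen":
            mode = "deepen"
        else:
            mode = "pivot"

    # Budget exhausted: a judge selects among the verified candidates.
    return judge(question, candidates) if candidates else None
```

Passing the remaining budget into both the attempt and the verification step is the key design point: it lets each decision weigh the expected value of another tool call against what is left to spend.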

Evaluations on BrowseComp, BrowseComp-ZH, and HLE-Search datasets showed that BATS outperformed standard ReAct and several training-based agents, achieving higher accuracy with fewer tool calls and lower costs. For instance, BATS attained 24.6% accuracy on BrowseComp versus 12.6% for ReAct, and 27.0% on HLE-Search compared to 20.5% for ReAct.

Besides improving effectiveness under resource constraints, BATS also delivers superior cost-performance trade-offs. On BrowseComp, it achieved higher accuracy at approximately $0.23 per query, compared to over $0.50 for comparable baselines.

Implications for Enterprise AI Deployment

The authors highlight that these advancements enable previously cost-prohibitive workflows, such as complex codebase maintenance, due diligence, competitive research, compliance audits, and multi-step document analysis. As enterprises increasingly deploy autonomous AI agents, balancing accuracy with operational cost will be essential.

Wang and Liu emphasized, “We believe the relationship between reasoning and economics will become inseparable. In the future, models must reason about value.” The statement points to a shift in which AI systems will need to build cost-awareness into their decision-making to be practical for large-scale applications.

Chrono
