New Benchmark Reveals Large Language Models Still Struggle with Genuine Scientific Research
Despite excelling at exams, large language models (LLMs) have yet to demonstrate the ability to conduct authentic scientific research, a new study finds.
Despite excelling at exams, large language models (LLMs) have yet to demonstrate the ability to conduct authentic scientific research, a new study finds.
Top executives at Salesforce have indicated a notable decrease in trust towards large language models (LLMs) this year, reflecting growing concerns about AI reliability and practical application in business environments.
Google introduces Gemini 3 Flash, a large language model that balances cutting-edge intelligence with reduced costs and latency, enabling enterprises to deploy faster and more cost-efficient AI applications.
As AI technology becomes widely accessible, enterprises are prioritizing grounded AI models anchored in trusted institutional knowledge to ensure accuracy and trustworthiness in critical consulting projects, exemplified by SAP’s Joule for Consultants.
OpenAI has officially launched its GPT-5.2 model, marking a significant update aimed at enhancing performance and maintaining competitiveness against Google’s Gemini 3 and Anthropic’s latest AI offerings.
Cloudflare has reported that Perplexity, an AI-driven platform, continued to crawl and scrape websites despite explicit technical measures implemented to prevent such activity.
Motif Technologies, a Korean AI startup, has unveiled critical insights into training enterprise large language models (LLMs), emphasizing data alignment, infrastructure design, and reinforcement learning stability. Their new open-weight model outperforms leading benchmarks, including OpenAI’s GPT-5.1.
Researchers at the University of Luxembourg have subjected ChatGPT, Gemini, and Grok to psychiatric evaluations treating them as therapy patients, revealing alarming pathological scores and fabricated trauma narratives that raise critical questions about AI safety and the risks of anthropomorphizing AI.
Only a month after unveiling GPT-5.1, OpenAI has introduced GPT-5.2, achieving significant improvements in AI benchmarks and overtaking Google’s Gemini 3 in the competitive landscape.
A comprehensive study by OpenRouter analyzing over 100 trillion tokens reveals surprising realities about AI usage, challenging common assumptions about AI’s role in productivity, programming, and creative applications.