[AI]■ STORY TIMELINE

BERKELEY RESEARCHERS BREAK TOP AI AGENT BENCHMARKS

Berkeley's RDI team demonstrated critical flaws in leading AI agent benchmarks, achieving near-perfect scores by exploiting structural weaknesses rather than improving actual AI capabilities.

1 SOURCE | FIRST SEEN APR 11, 07:15 PM
Hacker News +0m

Article URL: https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
Comments URL: https://news.ycombinator.com/item?…