IT之家 reported on November 15 that the research organization Epoch AI has released a new mathematics benchmark for AI models called FrontierMath, designed to evaluate models' mathematical reasoning abilities. Unlike existing test sets such as GSM-8K and MATH, FrontierMath ...
A groundbreaking new benchmark, FrontierMath, is exposing just how far today’s AI is from mastering the complexities of higher mathematics. Developed by the research group Epoch AI, FrontierMath ...
A new benchmark called FrontierMath is exposing how artificial intelligence still has a long way to go when it comes to ...
Epoch AI highlighted that to measure AI's aptitude, benchmarks should be built around creative problem-solving in which the AI has ...
FrontierMath's performance results, revealed in a preprint research paper, paint a stark picture of current AI model ...
Epoch AI, however, had seen enough. Teaming up with more than 60 top mathematicians, it unveiled a big move: FrontierMath, a brand-new mathematical reasoning test built to humble LLMs of every stripe. The results were dismal: LLMs collectively flunked, with accuracy below 2%! 🤡 Here's how Epoch AI did it ...
[ITBEAR] The research organization Epoch AI has recently released a brand-new mathematics benchmark for AI models, named FrontierMath. The test set is designed to comprehensively evaluate AI models' mathematical reasoning abilities, especially their performance on complex mathematical problems. Unlike existing math test sets such as GSM-8K and ...
According to Epoch AI's research report, the six frontier models tested performed strikingly poorly on FrontierMath, with success rates below 2%. OpenAI research scientist Noam Brown praised the benchmark, viewing the low pass rate as evidence of current AI's limitations in handling mathematics. The result echoes a widely held skepticism: although many large language models (LLMs) appear to excel at math problems, their abilities are often overstated.
Meet FrontierMath: a new benchmark composed of a challenging set of mathematical problems spanning most branches of modern mathematics. These problems are crafted by a diverse group of over 60 expert ...