FrontierMath news

AI’s math problem: FrontierMath benchmark shows how far technology still has to go

A groundbreaking new benchmark, FrontierMath, is exposing just how far today’s AI is from mastering the complexities of higher mathematics. Developed by the research group Epoch AI, FrontierMath ...

Ars Technica 6d

New secret math benchmark stumps AI models and PhDs alike

On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that ...

Tencent News (腾讯网) 7d

The AI math myth shattered! FrontierMath leaves LLMs nearly all "handing in blank papers": accuracy no higher than 2%

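However, Epoch AI couldn't stand by any longer. Teaming up with more than 60 top mathematicians, it prepared a big move: FrontierMath, a brand-new mathematical reasoning test built to put overconfident LLMs in their place! The results were dismal, with the LLMs collectively ...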

Tencent News (腾讯网) 3d

LLM math benchmark FrontierMath released: reportedly, every model in the industry failed

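IT Home (IT之家) reported on November 15 that the research organization Epoch AI has released FrontierMath, a new mathematics benchmark for AI models designed to evaluate their mathematical reasoning abilities.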

36Kr (36氪) 8d

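Terence Tao joins 60+ mathematicians to set the problems: the world's top models pass only 2%, an expert-level math benchmark that will keep AI struggling for years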

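The FrontierMath benchmark reveals the limits of AI mathematical reasoning. With success rates below 2%, math AI still needs a breakthrough. [Summary] Epoch AI has launched the math benchmark FrontierMath, and current frontier models all score below 2%!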

36Kr (36氪) 7d

o1 and Claude all stumble: Terence Tao and 60+ top mathematicians jointly propose a new math benchmark, and large models all score below 2%

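So what exactly is this new challenger? On inquiry, the new math benchmark is called FrontierMath; it was proposed by more than 60 top mathematicians, including Terence Tao, convened by the nonprofit research organization Epoch AI.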
