METR AI Benchmark: Clarifying Limitations of Time Horizon

Hacker News

About 1 month ago

AI-generated summary

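This article from Hacker News AI examines the time horizon measurement used in the METR AI benchmark and acknowledges the criticisms and misinterpretations it has attracted. As one of the main authors of the original paper, the author aims to clarify the limitations of the methodology and which conclusions the evidence supports.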

Notes

Rough/unpolished research updates and speculation

Clarifying limitations of time horizon

In the 9 months since the METR time horizon paper was published (during which AI time horizons have increased by ~6x), the paper has generated lots of attention as well as various criticisms. As one of the main authors, I often see misinterpretations of our work. While I still believe in the core results, I think many people both overstate the precision of our time horizon measurements and draw conclusions that the evidence does not fully support.

Therefore, I’d like to clarify some of my beliefs about the limitations of our methodology, and of time horizon more broadly, and then clarify what I think are the key conclusions directly supported by our results.

Despite these limitations, what conclusions do I still stand by?

See e.g. the DeepSeek R1 paper: https://arxiv.org/abs/2501.12948
