newsence
METR AI Benchmark: Clarifying Limitations of Time Horizon

Hacker News

This article from Hacker News AI discusses the METR AI Benchmark's time horizon measurements, acknowledging criticisms and misinterpretations. The author, a lead author of the original paper, aims to clarify the methodology's limitations and the evidence-supported conclusions.

About 1 month ago

Notes

Rough/unpolished research updates and speculation

Clarifying limitations of time horizon

In the 9 months since the METR time horizon paper was published (during which AI time horizons have increased by ~6x), the paper has generated lots of attention as well as various criticisms. As one of the main authors, I often see misinterpretations of our work. While I still believe in the core results, I think many people, to some extent, both overstate the precision of our time horizon measurements and draw conclusions that I don’t think the evidence fully supports.

Therefore, I’d like to clarify some of my beliefs about the limitations of our methodology, and of time horizon more broadly, and then clarify what I think are the key conclusions directly supported by our results.
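For readers who want to turn the "~6x in 9 months" figure above into a doubling time, here is a minimal sketch. It assumes growth was purely exponential over that window, which is a simplification, and the function name `implied_doubling_time` is hypothetical, not something from the paper. Since ~6x is itself an approximate figure, the result should be read as a rough number rather than a precise measurement.

```python
import math


def implied_doubling_time(growth_factor: float, elapsed_months: float) -> float:
    """Doubling time (in months) implied by a total growth factor over a period.

    Assumes constant exponential growth: factor = 2 ** (elapsed / doubling_time),
    so doubling_time = elapsed / log2(factor).
    """
    if growth_factor <= 1.0:
        raise ValueError("growth_factor must be greater than 1")
    if elapsed_months <= 0.0:
        raise ValueError("elapsed_months must be positive")
    return elapsed_months / math.log2(growth_factor)


# Illustrative inputs taken from the text: ~6x growth over 9 months.
doubling_months = implied_doubling_time(6.0, 9.0)
print(f"Implied doubling time: {doubling_months:.1f} months")  # roughly 3.5 months
```

Small changes in the input move the answer noticeably: a 5x or 7x increase over the same 9 months would imply a doubling time of about 3.9 or 3.2 months respectively.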

Despite these limitations, what conclusions do I still stand by?

See e.g. the DeepSeek R1 paper: https://arxiv.org/abs/2501.12948