
Announcing ControlConf 2026

LessWrong

We’re running ControlConf in Berkeley on April 18-19. It's a two-day conference on AI control: the study of reducing risks from misalignment through safeguards that work even when AI models are trying to undermine them.

Since the last ControlConf (Feb 2025), AI agents have gotten way better. We’re approaching the point where control techniques are load-bearing for the safety of real agent deployments. There’s been a lot of progress too: researchers have run control evaluations in more realistic settings, and AI companies have started building initial control measures, with much more to come.

At ControlConf, the people at the frontier of this work will present on current research problems, promising interventions, and the most important research directions going forward. Apply here.

We’re also taking the opportunity to run a one-day workshop on April 17 on AI futurism and threat modeling, aimed at people who want clearer models of catastrophic AI risks and the best strategies for mitigating them. Apply to that here.


