SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via CI
Hacker News
This research introduces SWE-CI, a new benchmark that uses continuous integration environments to evaluate how well AI agents perform software engineering tasks, in particular maintaining codebases.