論文中的指令可操縱AI審稿人，成功率達78%-86%

Hacker News

大約 1 個月前

AI 生成摘要

一項針對學術同行評審的研究發現，隱藏在論文中的指令能夠成功操縱AI審稿人（如ChatGPT和Gemini），在78%至86%的案例中影響其輸出和建議。此研究突顯了AI輔助研究評估中存在的重大漏洞。

How to get your paper accepted by an AI reviewer: indirect prompt injection in peer review | Research Square

Cite

Research Article

How to get your paper accepted by an AI reviewer: indirect prompt injection in peer review

Federico Torrielli, Stefano Locci, Amon Rapp, Luigi Di Caro

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-8432945/v1

This work is licensed under a CC BY 4.0 License

Status:

Posted

Version 1

posted

You are reading this latest preprint version

Abstract

The growing use of large language models to assist or automate academic peer review raises fundamental questions about the validity and robustness of algorithmically mediated research evaluation. This study introduces the Author-Reviewer-Organizer (ARO) framework, which models peer review as a strategic interaction among authors, reviewers, and organizers with distinct incentives and capacities to exploit or constrain AI-based evaluation. Within this framework, we present a large-scale empirical assessment of indirect prompt injection, a vulnerability that allows hidden instructions embedded in a manuscript to influence an AI reviewer's output without the reviewer's awareness. Using 5,600 controlled experiments on manuscripts from NeurIPS and ICLR published before November 2022, prior to the widespread public availability of high-capability LLMs, we evaluate the susceptibility of two widely used, general-purpose LLM-based chatbot systems employed for review assistance under multiple injection strategies. We find that hidden instructions are followed in 78% of cases for ChatGPT and 86% for Gemini, substantially exceeding success rates reported in prior prompt-injection studies. Manipulation can reliably steer review sentiment and acceptance recommendations, while the same mechanism can be repurposed by organizers for defensive purposes, including watermarking and detection of AI-generated reviews. Instruction placement within the document significantly affects outcomes, with early-position payloads consistently exerting greater influence. By situating these results within the ARO framework, we show that AI-assisted peer review introduces document-level structural vulnerabilities that undermine evaluative reliability. The results have direct implications for the use and governance of LLMs in peer review, research assessment, and other gatekeeping processes central to scientometric analysis and science policy.

JEL Classification: O33 , D82 , D83 , L86

MSC Classification: 68M25 , 68T50 , 68T01

AI-assisted peer review

Indirect prompt injection

Large language models

Scientific integrity

AI safety

Full Text

Additional Declarations

No competing interests reported.

Cite

Status:

Posted

Version 1

posted

You are reading this latest preprint version

Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal.

As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing.