a1j9o94/foresight: Research prototype exploring AI video prediction for improved reasoning
Foresight
Can AI benefit from "imagining" the future before making decisions?
This research project explored whether AI systems could improve their reasoning by generating video predictions and checking them against reality—like how humans often visualize outcomes before acting.
Result: not yet. Current models aren't capable of this, but we document what works, what doesn't, and propose tracking this capability as a benchmark for future AI systems.
The Idea (Plain English)
When you're about to pour coffee, you might briefly imagine the liquid filling the cup. If you imagined it overflowing, you'd pour less. This "mental simulation" helps you make better decisions.
We asked: Can AI do something similar?
The plan: have the model generate a short video of what it expects to happen next, act, then compare the imagined video against what actually happened.
If this worked, AI systems could catch their own mistakes by noticing when their predictions look wrong—just like you'd notice if your mental image of pouring coffee showed it going sideways instead of down.
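The loop described above can be sketched in a few lines. This is a toy illustration, not the project's actual API: `predict_next_frame`, `observe`, and `frame_distance` are hypothetical stand-ins, and the "frames" here are plain numbers.

```python
def imagine_then_act(state, action, predict_next_frame, observe,
                     frame_distance, threshold=0.5):
    """Sketch of a predict-then-verify step: imagine the outcome of an
    action, take the action, and flag a possible mistake when reality
    diverges from the imagined frame."""
    imagined = predict_next_frame(state, action)   # the model's "mental image"
    actual = observe(action)                       # what really happened
    surprise = frame_distance(imagined, actual)    # how far off was the prediction?
    return surprise > threshold                    # True means something looks wrong

# Toy usage: "frames" are plain numbers, distance is absolute difference.
flagged = imagine_then_act(
    state=0,
    action="pour",
    predict_next_frame=lambda s, a: 1.0,   # the model imagines the cup filling
    observe=lambda a: 3.0,                 # reality: it overflows
    frame_distance=lambda p, q: abs(p - q),
)
```

The hard parts, as the results below show, are both halves of this loop: producing a meaningful `imagined` frame, and finding a `frame_distance` that actually correlates with being wrong.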
What We Found
Summary Table
Detailed Results
The Key Failures
- VLMs Can't Predict the Future
We tried seven different approaches to getting a vision-language model (VLM) to predict what happens next in a video.
All of them performed worse than simply copying the current frame as the "prediction." The language model understands what's in an image, but it cannot predict what will change.
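The copy-frame baseline mentioned above is trivial to state: use the current frame, unchanged, as the predicted next frame. A minimal sketch, with frames as lists of pixel rows and per-pixel mean squared error as an illustrative metric (the experiments' actual scoring may differ):

```python
def mse(frame_a, frame_b):
    """Mean squared per-pixel error between two equal-sized grayscale
    frames, given as lists of pixel rows."""
    flat_a = [p for row in frame_a for p in row]
    flat_b = [p for row in frame_b for p in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

def copy_frame_baseline(current_frame):
    # The entire baseline: predict that nothing changes.
    return [row[:] for row in current_frame]

# Toy example: a bright pixel (the "object") moves one step to the right.
current = [[0, 1, 0]]
actual_next = [[0, 0, 1]]
baseline_error = mse(copy_frame_baseline(current), actual_next)
```

Because most of a frame usually doesn't change between timesteps, this do-nothing predictor is surprisingly hard to beat, and every VLM prompting approach we tried scored below it.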
- Visual Similarity ≠ Semantic Correctness
Even when we used a video model to generate predictions (which looked reasonable), comparing them to reality using perceptual metrics (LPIPS) didn't help. Surprisingly, wrong predictions often looked MORE similar to reality than correct ones.
This means you can't use "does it look right?" to catch mistakes—the visual appearance doesn't indicate whether the prediction is semantically correct.
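A contrived toy makes the failure mode concrete. The experiments used LPIPS, but plain pixel MSE shows the same effect: a generated frame that gets the motion right but renders the scene with a slight global brightness shift can score worse than a copied frame that gets the motion wrong. All numbers below are invented for illustration.

```python
def mse(frame_a, frame_b):
    """Mean squared per-pixel error between two equal-sized frames."""
    flat_a = [p for row in frame_a for p in row]
    flat_b = [p for row in frame_b for p in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

# Reality: a bright object (value 9) has moved to the center of a dim row.
actual = [[2, 2, 9, 2, 2]]

# Semantically WRONG: a copied frame where the object never moved,
# but the background matches reality pixel-for-pixel.
wrong_static = [[2, 9, 2, 2, 2]]

# Semantically RIGHT: a generated frame with the object in the correct
# place, but the whole scene rendered a bit brighter.
right_shifted = [[7, 7, 9, 7, 7]]

wrong_score = mse(wrong_static, actual)    # lower = "more similar"
right_score = mse(right_shifted, actual)
# Here wrong_score < right_score: the wrong prediction looks closer to
# reality, so no similarity threshold can separate right from wrong.
```

This is the pattern we observed with LPIPS as well: low-level appearance dominates the metric, while the semantic content of the change barely registers.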
What Did Work
Despite the negative results, we made useful discoveries:
Benchmark Proposal: VideoReason
We're releasing this as a benchmark to track when this approach becomes viable. As video models improve, these capabilities may emerge.
Tasks to track:
Why track this? Video generation is improving rapidly. The capabilities we found lacking in 2026 may emerge in future systems. A standardized benchmark helps identify when "visual imagination" becomes useful for AI reasoning.
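One way to make tracking concrete is a per-task result record that compares each model against the copy-frame baseline. The field names below are illustrative, not a released schema:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    """Illustrative tracking record for one model on one VideoReason
    task; field names are hypothetical, not a released schema."""
    model: str
    task: str                    # e.g. "action-prediction"
    score: float                 # task metric, higher is better
    copy_frame_baseline: float   # same metric for the trivial baseline

    def beats_baseline(self) -> bool:
        # The milestone to watch: a model first beating "predict no change".
        return self.score > self.copy_frame_baseline

result = BenchmarkResult(model="example-vlm", task="action-prediction",
                         score=0.31, copy_frame_baseline=0.42)
```

Reporting scores relative to the trivial baseline, rather than in isolation, makes the moment of emergence easy to spot across model generations.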
Prerequisites
To reproduce the experiments, you'll need accounts with these services:
Dataset: Something-Something v2
The experiments use the Something-Something v2 dataset for action prediction. This must be downloaded manually:
The dataset contains ~220K short videos of humans performing 174 classes of basic actions (pushing, pulling, dropping, etc.).
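Once downloaded, the annotations can be inspected with the standard library. This sketch assumes the labels ship as a JSON list of records with a `template` field; verify against the actual Something-Something v2 download before relying on it:

```python
import json
from collections import Counter

def count_action_classes(labels_path):
    """Count clips per action template, assuming a JSON list of records
    with a 'template' field (an assumption about the download's layout,
    not a documented guarantee)."""
    with open(labels_path) as f:
        records = json.load(f)
    return Counter(r["template"] for r in records)

# Toy in-memory example instead of the real ~220K-record file:
sample = [
    {"id": "1", "template": "Pushing [something] from left to right"},
    {"id": "2", "template": "Dropping [something]"},
    {"id": "3", "template": "Pushing [something] from left to right"},
]
counts = Counter(r["template"] for r in sample)
```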
Setup
Quick Start
Live Demo
Project Structure
Tools & Models Used
Documentation
Citation
If you use this work, please cite:
License
Research prototype - released for academic use. See paper for full methodology.