a1j9o94/foresight: Research prototype exploring AI video prediction for improved reasoning
Foresight
Can AI benefit from "imagining" the future before making decisions?
This research project explored whether AI systems could improve their reasoning by generating video predictions and checking them against reality—like how humans often visualize outcomes before acting.
Result: not yet. Current models aren't capable of this, but we document what works, what doesn't, and propose tracking this capability as a benchmark for future AI systems.
The Idea (Plain English)
When you're about to pour coffee, you might briefly imagine the liquid filling the cup. If you imagined it overflowing, you'd pour less. This "mental simulation" helps you make better decisions.
We asked: Can AI do something similar?
The plan: have the model generate a short video of what it expects to happen next, act, then compare the imagined video against what actually happened.
If this worked, AI systems could catch their own mistakes by noticing when their predictions look wrong—just like you'd notice if your mental image of pouring coffee showed it going sideways instead of down.
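The loop described above can be sketched in a few lines. This is a toy illustration, not the project's actual API: `predict_next_frame`, `observe`, and `frame_distance` are hypothetical stand-ins, and the "frames" here are plain numbers.

```python
def imagine_then_act(state, action, predict_next_frame, observe,
                     frame_distance, threshold=0.5):
    """Sketch of a predict-then-verify step: imagine the outcome of an
    action, take the action, and flag a possible mistake when reality
    diverges from the imagined frame."""
    imagined = predict_next_frame(state, action)   # the model's "mental image"
    actual = observe(action)                       # what really happened
    surprise = frame_distance(imagined, actual)    # how far off was the prediction?
    return surprise > threshold                    # True means something looks wrong

# Toy usage: "frames" are plain numbers, distance is absolute difference.
flagged = imagine_then_act(
    state=0,
    action="pour",
    predict_next_frame=lambda s, a: 1.0,   # the model imagines the cup filling
    observe=lambda a: 3.0,                 # reality: it overflows
    frame_distance=lambda p, q: abs(p - q),
)
```

The hard parts, as the results below show, are both halves of this loop: producing a meaningful `imagined` frame, and finding a `frame_distance` that actually correlates with being wrong.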
What We Found
Summary Table
Detailed Results
The Key Failures
- VLMs Can't Predict the Future
We tried seven different approaches to getting a vision-language model (VLM) to predict what happens next in a video.
All of them performed worse than simply copying the current frame as the "prediction." The language model understands what's in an image, but it cannot predict what will change.
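The copy-frame baseline mentioned above is trivial to state: use the current frame, unchanged, as the predicted next frame. A minimal sketch, with frames as lists of pixel rows and per-pixel mean squared error as an illustrative metric (the experiments' actual scoring may differ):

```python
def mse(frame_a, frame_b):
    """Mean squared per-pixel error between two equal-sized grayscale
    frames, given as lists of pixel rows."""
    flat_a = [p for row in frame_a for p in row]
    flat_b = [p for row in frame_b for p in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

def copy_frame_baseline(current_frame):
    # The entire baseline: predict that nothing changes.
    return [row[:] for row in current_frame]

# Toy example: a bright pixel (the "object") moves one step to the right.
current = [[0, 1, 0]]
actual_next = [[0, 0, 1]]
baseline_error = mse(copy_frame_baseline(current), actual_next)
```

Because most of a frame usually doesn't change between timesteps, this do-nothing predictor is surprisingly hard to beat, and every VLM prompting approach we tried scored below it.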
- Visual Similarity ≠ Semantic Correctness
Even when we used a video model to generate predictions (which looked reasonable), comparing them to reality using perceptual metrics (LPIPS) didn't help. Surprisingly, wrong predictions often looked MORE similar to reality than correct ones.
This means you can't use "does it look right?" to catch mistakes—the visual appearance doesn't indicate whether the prediction is semantically correct.
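A contrived toy makes the failure mode concrete. The experiments used LPIPS, but plain pixel MSE shows the same effect: a generated frame that gets the motion right but renders the scene with a slight global brightness shift can score worse than a copied frame that gets the motion wrong. All numbers below are invented for illustration.

```python
def mse(frame_a, frame_b):
    """Mean squared per-pixel error between two equal-sized frames."""
    flat_a = [p for row in frame_a for p in row]
    flat_b = [p for row in frame_b for p in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

# Reality: a bright object (value 9) has moved to the center of a dim row.
actual = [[2, 2, 9, 2, 2]]

# Semantically WRONG: a copied frame where the object never moved,
# but the background matches reality pixel-for-pixel.
wrong_static = [[2, 9, 2, 2, 2]]

# Semantically RIGHT: a generated frame with the object in the correct
# place, but the whole scene rendered a bit brighter.
right_shifted = [[7, 7, 9, 7, 7]]

wrong_score = mse(wrong_static, actual)    # lower = "more similar"
right_score = mse(right_shifted, actual)
# Here wrong_score < right_score: the wrong prediction looks closer to
# reality, so no similarity threshold can separate right from wrong.
```

This is the pattern we observed with LPIPS as well: low-level appearance dominates the metric, while the semantic content of the change barely registers.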
What Did Work
Despite the negative results, we made useful discoveries:
Benchmark Proposal: VideoReason
We're releasing this as a benchmark to track when this approach becomes viable. As video models improve, these capabilities may emerge.
Tasks to track:
Why track this? Video generation is improving rapidly. The capabilities we found lacking in 2026 may emerge in future systems. A standardized benchmark helps identify when "visual imagination" becomes useful for AI reasoning.
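One way to make tracking concrete is a per-task result record that compares each model against the copy-frame baseline. The field names below are illustrative, not a released schema:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    """Illustrative tracking record for one model on one VideoReason
    task; field names are hypothetical, not a released schema."""
    model: str
    task: str                    # e.g. "action-prediction"
    score: float                 # task metric, higher is better
    copy_frame_baseline: float   # same metric for the trivial baseline

    def beats_baseline(self) -> bool:
        # The milestone to watch: a model first beating "predict no change".
        return self.score > self.copy_frame_baseline

result = BenchmarkResult(model="example-vlm", task="action-prediction",
                         score=0.31, copy_frame_baseline=0.42)
```

Reporting scores relative to the trivial baseline, rather than in isolation, makes the moment of emergence easy to spot across model generations.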
Prerequisites
To reproduce the experiments, you'll need accounts with these services:
Dataset: Something-Something v2
The experiments use the Something-Something v2 dataset for action prediction. This must be downloaded manually:
The dataset contains ~220K short videos of humans performing 174 classes of basic actions (pushing, pulling, dropping, etc.).
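Once downloaded, the annotations can be inspected with the standard library. This sketch assumes the labels ship as a JSON list of records with a `template` field; verify against the actual Something-Something v2 download before relying on it:

```python
import json
from collections import Counter

def count_action_classes(labels_path):
    """Count clips per action template, assuming a JSON list of records
    with a 'template' field (an assumption about the download's layout,
    not a documented guarantee)."""
    with open(labels_path) as f:
        records = json.load(f)
    return Counter(r["template"] for r in records)

# Toy in-memory example instead of the real ~220K-record file:
sample = [
    {"id": "1", "template": "Pushing [something] from left to right"},
    {"id": "2", "template": "Dropping [something]"},
    {"id": "3", "template": "Pushing [something] from left to right"},
]
counts = Counter(r["template"] for r in sample)
```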
Setup
Quick Start
Live Demo
Project Structure
Tools & Models Used
Documentation
Citation
If you use this work, please cite:
License
Research prototype - released for academic use. See paper for full methodology.