The AI Physicist reaches its first autonomy milestone (ahead of schedule)
- FirstPrinciples
- Dec 9
- 5 min read
The team at FirstPrinciples has reached the AI Physicist’s first autonomy milestone ahead of schedule, assembling the early loops of a system that will guide its own scientific reasoning from research to hypothesis and beyond.
Every scientist knows how reading a single paper can set off a chain reaction of new questions. One question sends you to another paper, then another. You start to see a pattern, and you form a hunch. You test it, and the test reveals a gap you hadn't noticed, so you go back to reading. This recursive dance between learning and hypothesizing is the engine of discovery. It's also what we've been working to build into the AI Physicist.
November was a milestone month at FirstPrinciples. With several new hires joining both the technical and fundraising teams, we’ve been moving quickly toward a goal we set earlier this year: to develop a minimum viable autonomous AI Physicist by the end of 2025. The good news is, we’re ahead of schedule.
It’s important to note that this first version of the AI Physicist isn’t a polished product with profound scientific outputs. We were aiming for a rough but fully connected system that can move through the arc of scientific reasoning on its own, building for breadth instead of depth. Think of it as the first flicker of a feedback loop: the earliest trace of a system that can begin to perform that recursive dance without us. Over time, these loops will grow into something far more capable.
What early autonomy looks like for the AI Physicist
At this stage, autonomy isn’t a single capability. It’s a chain of self-directed loops that, when connected, form the scaffolding of scientific reasoning.

The path the AI Physicist takes is not an unfamiliar one. Observation leads to research, which leads to question formulation, then evaluation, tool routing, and hypothesis formation, with frequent loops back when needed. Each part is straightforward on its own. The autonomy emerges from how they connect.
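To make that chain concrete, here is a minimal sketch of how the stages might connect. Every function body here is an illustrative stub, not the actual FirstPrinciples implementation; only the stage names and the loop-back behavior come from the description above.

```python
# Illustrative stubs for each stage; the real modules are far richer.
def research(topic):
    return f"literature on {topic}"            # Observation & Research

def formulate_question(context):
    return f"what explains {context}?"         # Question Formulation

def evaluate(question):
    # Placeholder gate: a real Evaluator scores the question on several axes.
    return "proceed" if "literature" in question else "loop_back"

def route_to_tool(question):
    return f"tool answer for: {question}"      # Tool Routing

def form_hypothesis(answer):
    return f"hypothesis from {answer}"         # Hypothesis Formation

def run_pipeline(prompt, max_loops=5):
    """One run: observe, question, evaluate, route, hypothesize,
    looping back to research whenever the Evaluator says to."""
    context = research(prompt)
    for _ in range(max_loops):
        question = formulate_question(context)
        if evaluate(question) == "loop_back":
            context = research(question)       # retreat: back to reading
            continue
        answer = route_to_tool(question)
        return form_hypothesis(answer)         # press forward
    return None
```

The point of the sketch is the control flow, not the stubs: autonomy lives in the `loop_back` branch, where the system sends itself back to research without being told to.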
The scientific process begins
Every run starts where good science always does, by looking and learning in the Observation and Research stage. Given an initial prompt, the system launches a Deep Literature Search across the web and our internal knowledge graph of roughly 2.5 million academic papers. The Question Formulator then composes questions from the returned results, carrying the run forward through the scientific process.
The goal at this stage isn't depth. It's surveying the terrain and trying to understand what is known, what remains unclear, and what might matter next.
The evaluator: building the soul of science into the AI Physicist
The Evaluator module sits at the center of the AI Physicist, governing whether to continue a line of research or loop back for deeper investigation. It scores questions for context alignment, completeness, and effectiveness. Over time, it will evolve to contain increasingly complex determination logic (including gap analyses, anomaly detection, and novelty assessment) to steer each run with increasing sophistication.
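A hedged sketch of what an Evaluator-style gate could look like. The three criteria names come from the description above; the weights, the threshold, and the decision rule are assumptions for illustration.

```python
# Hypothetical scoring gate; threshold and equal weighting are assumptions.
CRITERIA = ("context_alignment", "completeness", "effectiveness")

def evaluate_question(scores, threshold=0.7):
    """Average per-criterion scores in [0, 1] and decide whether the run
    proceeds or loops back for deeper investigation."""
    mean = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return ("proceed" if mean >= threshold else "loop_back", mean)

decision, score = evaluate_question(
    {"context_alignment": 0.9, "completeness": 0.6, "effectiveness": 0.8}
)
# mean ≈ 0.77, above the 0.7 threshold, so the run proceeds
```

In practice the gap analyses, anomaly detection, and novelty assessment mentioned above would replace this simple average with richer determination logic.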
An interesting takeaway from this implementation was that, because LLMs are fundamentally designed to be helpful, the Evaluator’s initial scores came back with (unsurprisingly) overwhelming optimism. The problem is that science is not built on optimism. Science is built on doubt.
Our emerging solution is the development of Skeptic agents. These are adversarial components whose entire purpose is to push scores down, to poke holes, to ask "are you sure?" They would force the Evaluator to defend its judgments before moving forward.
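One way to picture the Skeptic idea: adversarial checks that can only subtract from a score, never add to it. The specific objections and penalty values below are invented for illustration.

```python
# Hypothetical Skeptics; each check probes one weakness and returns a penalty.
def skeptic_review(question):
    penalties = []
    if not question.endswith("?"):
        penalties.append(0.2)   # objection: not actually phrased as a question
    if len(question.split()) < 5:
        penalties.append(0.3)   # objection: too vague to be testable
    return penalties

def defended_score(question, optimistic_score):
    """The Evaluator's optimistic score must survive every objection."""
    return max(optimistic_score - sum(skeptic_review(question)), 0.0)

defended_score("Why?", 0.95)  # the vagueness Skeptic fires and pulls the score down
```

Because the Skeptics can only lower scores, optimism has to be earned: a question moves forward only if its score survives the objections.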
The right questions require the right tools
Once a research idea is deemed worthy of further investigation, something interesting happens. The system starts making its own decisions about how to proceed.
A musician doesn't think about which instrument to pick up; they reach for the one the song requires. We wanted the AI Physicist to develop something similar: an instinct for which tool fits the problem at hand.
Our master model performs Orchestration behind the scenes, determining which tools and specialized models are best suited to answer the current question. For example, questions specific to quantum mechanics would be routed through our domain-specialized quantum mechanics model.
Today, this routing is simple. But it represents a meaningful shift. The system is no longer waiting for instructions. It's selecting its own instruments based on the problem it's trying to solve.
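Since the routing is simple today, it can be sketched in a few lines. The keyword table and model names below are hypothetical stand-ins, not the actual Orchestration logic; only the quantum-mechanics example comes from the text above.

```python
# Invented routing table for illustration; real Orchestration is model-driven.
ROUTES = {
    "quantum": "quantum_mechanics_model",
    "cosmolog": "cosmology_model",
}

def route(question, default="general_physics_model"):
    """Pick the specialized model whose domain matches the question."""
    q = question.lower()
    for keyword, model in ROUTES.items():
        if keyword in q:
            return model
    return default

route("How does quantum entanglement scale?")  # → "quantum_mechanics_model"
```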
The backbone of scientific autonomy
From the information gathered during research, observation, and evaluation, the AI Physicist moves into hypothesis formation, attempting to synthesize what it has learned into a coherent idea. Hypothesis formation rarely proceeds in a straight line. More often, it exposes new gaps that send the system back to research. A half-formed idea raises a question that the literature search missed. A promising thread dead-ends, and the system pivots. This back-and-forth motion is the backbone of scientific autonomy. It provides both the ability to press forward when the ground is solid and to retreat when it isn't.
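The press-forward-or-retreat behavior can be sketched as a single decision: synthesize if the evidence is complete, otherwise report the gaps that send the system back to research. The data shapes and field names here are assumptions made for illustration.

```python
# Hypothetical shape of the hypothesis-formation step.
def form_hypothesis(findings):
    """Synthesize findings into a hypothesis, or surface the gaps blocking it."""
    gaps = [f["topic"] for f in findings if f["evidence"] is None]
    if gaps:
        return {"status": "loop_back", "gaps": gaps}   # retreat: back to research
    claim = " and ".join(f["evidence"] for f in findings)
    return {"status": "hypothesis", "claim": claim}    # press forward

result = form_hypothesis([
    {"topic": "lensing", "evidence": "mass maps"},
    {"topic": "rotation curves", "evidence": None},
])
# evidence is missing for "rotation curves", so the run loops back
```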
Giving the AI Physicist its own Notebook
Before pushing beyond hypothesis formation, we confronted a question that has no technical shortcut. How do you make an AI system's reasoning legible?
Scientific credibility depends on transparency. It's not enough to produce a result; you have to show your work. Other researchers need to trace your steps, probe your assumptions, and reproduce your findings. This is what separates science from alchemy.
Even in early runs, the AI Physicist's internal process became difficult to follow. Dozens of questions branch into sub-questions. Evaluation scores shift. Tools get called, and results get folded back in. Without a record, the reasoning becomes a black box.
Every scientist who's ever lost a crucial insight knows the sinking feeling. Was it in a notebook somewhere or in a file that didn't get saved? The notebook, whether physical or digital, is how thinking becomes auditable.
So we gave the AI Physicist the tool that every scientist carries: a notebook of its own.
The Notebook serves as a curated stream of system logs that captures questions considered, paths taken, tools used, decisions made, and conclusions reached. This transforms the AI Physicist’s work from an opaque internal process into a human-readable record of the ‘thinking’ behind its outputs.
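To illustrate, a Notebook record might look something like the following. The field names and the append-only list are assumptions; only the kinds of information captured (stage, decision, detail) come from the description above.

```python
import datetime
import json

# Hypothetical record shape; real Notebook entries are richer and curated.
def log_entry(stage, detail, decision):
    """One auditable record: what happened, where, and what was decided."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "stage": stage,
        "detail": detail,
        "decision": decision,
    }

notebook = []  # the curated stream, in the order decisions were made
notebook.append(log_entry("evaluation", "scored question Q-12", "loop_back"))
print(json.dumps(notebook[-1], indent=2))
```

Because every branch, score shift, and tool call lands in the same ordered stream, a human reader can replay the run decision by decision.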

The AI Physicist: Looking forward
If this month’s progress has a takeaway, it's that building the AI Physicist is inherently meta. We are using the scientific process to build a system that uses the scientific process.
The entire experience forces a deeper appreciation for both the structure and curiosity that science demands. The work keeps returning us to the fundamentals of transparency, healthy skepticism, and the simple act of writing things down. These are the same habits scientists have relied on for centuries.
As the AI Physicist grows more capable, those fundamentals will remain the anchor for us, and for the AI Physicist that we’re teaching to reason. It may just turn out that the best way to teach a system to reason like a scientist is to remember what it means to be one.