Automated conjecturing: How machines are exploring mathematical structure
- FirstPrinciples

Before a theorem can be proved, someone has to decide what is worth proving. For decades, the formation of a conjecture remained largely outside the reach of machines. In Automated Conjecturing with TxGraffiti, Randy Davila explores how that boundary has shifted.
In 1948, Alan Turing suggested that digital computers might one day contribute meaningfully to intellectual work, including mathematics. For decades, that vision largely took shape as theorem proving, where machines would check proofs or solve constrained formal problems. Though significant, these achievements address only part of mathematical discovery. Before a theorem can be proved, someone must decide what is worth proving. This creative phase of forming a conjecture remained largely beyond the scope of automation until recently.

In Automated Conjecturing with TxGraffiti, Randy Davila revisits a decades-old question with a modern lens: can a machine systematically generate mathematically meaningful conjectures by operating on structured invariants and explicit evidence? The work sits within the Graffiti lineage of the 1980s and '90s, in which computers generated conjectures by systematically building mathematical expressions out of known invariants, combining them in different ways, and testing the resulting formulas against a database of examples. Conjectures that survived every test would then inspire published proofs.
It’s an elegant idea, but as the space of possible expressions grows, so does the number of candidates. Many are technically true but not especially interesting mathematically, and separating genuine structural insight from accidental patterns becomes increasingly difficult as formulas become more complex.
How TxGraffiti approaches automated conjecturing
TxGraffiti approaches conjecturing differently. Instead of constructing formulas piece by piece and testing them, it begins with a structured collection of known examples. Each object (typically a graph, represented by a mathematical network of nodes and connections) is recorded in a table along with its measurable properties, such as numerical invariants and structural features. Conjecturing then becomes a disciplined search over that table. Given a property we want to understand and a structural condition we want to impose, the system seeks simple relationships consistent with all stored examples. Technically, it does this by solving small optimization problems that identify the tightest bounds of a chosen form.
Put more concretely, instead of inventing formulas and checking them afterward, the system works from what is already known about a collection of examples and asks, “What are the strongest rules that consistently hold across all of them?”
Every returned inequality is "table-true," meaning it holds on every stored instance. Among all such bounds within the chosen hypothesis class, the optimization selects the one that minimizes total "slack": in practice, it identifies the rule that sits as close as possible to the data without violating any example.
One way to picture this is geometrically. If you imagine plotting each graph as a point determined by two of its properties, TxGraffiti searches for the cleanest boundary line that none of those points cross. Cases where the boundary is met exactly, known as sharp instances, often reveal the structural pattern that makes the rule meaningful rather than accidental.
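The bound search described above can be sketched in a few lines of Python. This is a minimal illustration, not TxGraffiti's actual implementation: the invariant table and the candidate slopes below are hypothetical, and the real system solves richer optimization problems over many invariant pairs.

```python
# Hypothetical invariant table: each row is one graph with two invariants.
# (Values are illustrative, not drawn from TxGraffiti's real database.)
rows = [
    {"zero_forcing": 2, "min_degree": 1},
    {"zero_forcing": 3, "min_degree": 2},
    {"zero_forcing": 4, "min_degree": 3},
    {"zero_forcing": 5, "min_degree": 3},
]

def tightest_bound(rows, target, feature, slopes=(0.5, 1.0, 2.0)):
    """For each candidate slope m, pick the smallest intercept b such that
    target <= m*feature + b holds on every row (i.e. the bound is
    table-true), then keep the (m, b) pair with the least total slack."""
    best = None
    for m in slopes:
        # smallest intercept that keeps the bound true on every stored row
        b = max(r[target] - m * r[feature] for r in rows)
        slack = sum(m * r[feature] + b - r[target] for r in rows)
        # "touch": rows where the bound is exactly met (sharp instances)
        touch = sum(1 for r in rows if abs(m * r[feature] + b - r[target]) < 1e-9)
        if best is None or slack < best[2]:
            best = (m, b, slack, touch)
    m, b, _, touch = best
    return m, b, touch

m, b, touch = tightest_bound(rows, "zero_forcing", "min_degree")
print(f"conjecture: zero_forcing <= {m}*min_degree + {b}  (touch={touch})")
# → conjecture: zero_forcing <= 1.0*min_degree + 2.0  (touch=1)
```

Geometrically, each candidate slope defines a family of parallel lines; the chosen intercept slides the line down until it just grazes the highest point, and the sharp instances are the points it touches.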

From expression search to evidence-based conjecturing
Where earlier systems generated conjectures by constructing possible formulas, testing the results, and filtering the survivors, TxGraffiti searches through the constraints implied by a defined body of evidence. This shift from grammar search to optimization over explicit structure changes the entire logic of the system. Importantly, the snapshot table becomes a research object that can be versioned and expanded. When counterexamples are found, the table grows. When invariants are refined, the schema evolves, and conjecturing reruns against the updated evidence base.
This workflow is closer to experimental mathematics than to statistical learning. There is no ‘training set’ and ‘test set’ or notion of probabilistic generalization. Instead, the goal is to compress what is currently known into simpler, more interpretable relationships. A conjecture is empirical but disciplined; it is true relative to the entire curated snapshot and expressed within a hypothesis class chosen for mathematical legibility.
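The rerun-against-updated-evidence loop can be sketched as follows; the function shapes here are assumptions for illustration, not TxGraffiti's actual API.

```python
# Sketch of the snapshot-as-research-object workflow (assumed shapes):
# the table is a list of invariant rows, and conjecture_fn is any search
# that maps a table to the bounds that are table-true on it.
def evidence_loop(table, conjecture_fn, new_examples):
    """Grow the snapshot with newly found examples (e.g. counterexamples
    to an earlier conjecture), then rerun conjecturing against the
    updated evidence base. Returns the new table and new conjectures."""
    table = table + new_examples  # version the snapshot by extending it
    return table, conjecture_fn(table)

# usage: rerun the bound search whenever the evidence base is extended
# table_v2, conjectures_v2 = evidence_loop(table_v1, search_bounds, found)
```

The point of the sketch is that nothing is ever "trained": each pass is a fresh, deterministic search over the full current snapshot.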
The filtering stage reinforces that discipline: even within a restricted hypothesis class, many inequalities may be technically true on the stored examples. TxGraffiti therefore adapts the Dalmatian heuristic into a static form that prioritizes inequalities with substantial "touch", meaning the number of instances on which the bound is exactly met. The system asks not only whether a rule survives every example, but also how often it becomes exact. A bound that is always loose may be technically correct but uninformative, whereas a bound that is frequently tight often signals that it is capturing something structural about the objects themselves.
Bounds that fail to introduce new equality cases are suppressed, since they add little structural insight beyond what has already been captured. The goal, then, is to isolate representative relationships that genuinely organize the data.
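A static, touch-based filter in this spirit could look like the sketch below. The exact acceptance rule used by TxGraffiti may differ, and the candidate bounds are hypothetical; the sketch only shows the core idea of suppressing bounds that add no new equality cases.

```python
# Dalmatian-style filter (assumed semantics): keep a candidate bound
# only if it is sharp on at least one instance not already covered by a
# previously accepted bound.
def dalmatian_filter(candidates):
    """candidates: list of (name, touch_set) pairs, where touch_set is
    the set of instance ids on which the bound is exactly met. Process
    in descending touch order; accept only bounds that introduce a new
    equality case."""
    accepted, covered = [], set()
    for name, touch_set in sorted(candidates, key=lambda c: -len(c[1])):
        if touch_set - covered:  # introduces at least one new sharp instance
            accepted.append(name)
            covered |= touch_set
    return accepted

candidates = [
    ("bound_A", {1, 2, 3}),  # sharp on graphs 1, 2, 3
    ("bound_B", {2, 3}),     # adds no new equality case -> suppressed
    ("bound_C", {4}),        # new equality case -> kept
]
print(dalmatian_filter(candidates))  # → ['bound_A', 'bound_C']
```

Processing in descending touch order means the most frequently tight bounds are considered first, so weaker bounds survive only if they organize instances the stronger ones miss.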
The system is intentionally conservative, reflecting a broader choice in how automated conjecturing is approached: it should prioritize clarity and usefulness over expressive power for its own sake. By default, the system looks for simple relationships that can be written in a single line and understood at a glance, because interpretability matters. It can handle more complicated combinations of variables, but added complexity is treated as a trade-off rather than a goal. As formulas become more elaborate, they are harder to think about and explain, and less likely to suggest a clear path toward proof.
The downstream results are perhaps the strongest argument for the approach. Several conjectures generated by TxGraffiti have since been proved and published, particularly in graph theory and in the study of zero forcing. These are not trivial restatements of known bounds; in multiple cases, resolving the conjecture required nontrivial structural analysis and yielded new theorems. The system did not prove them; it surfaced them. Recent work on Graffiti3 extends this line of research by exploring how machines might organize and expand collections of mathematical structures.
What automated conjecturing reveals about AI in science
Many current discussions center on large language models generating hypotheses from textual corpora or predicting experimental outcomes from data. Those approaches operate in high-dimensional, often opaque spaces. TxGraffiti represents a contrasting model in which AI serves as a structured constraint engine. The system’s reasoning space is explicit, and its optimization problems are transparent, leaving us with symbolic outputs that are traceable to finite evidence.

This connects to a longer-standing debate about machine intelligence in mathematics. Roger Penrose argued that digital systems cannot possess genuine mathematical understanding. In contrast, Siemion Fajtlowicz approached the question empirically through systems like Graffiti, explicitly treating automated conjecture generation as a test of whether machines could participate in mathematical discovery. Even then, the difficulty went beyond generating conjectures to identifying which ones were meaningful, a task long tied to human intuition.
If AI is to become a credible partner in fundamental research, it needs to operate less like a black-box predictor and more like a disciplined participant in a formal workflow. TxGraffiti demonstrates that, within carefully bounded domains, this is already possible. By extending human insight and systematically scanning structural possibilities, it offers a concrete benchmark for scientific AI. Can a machine propose statements that experts deem worth proving? In the case of TxGraffiti, the answer is yes.






