Between tools and theory: Reflections from the Machine Learning and the Physical Sciences workshop, NeurIPS 2025
- FirstPrinciples

A reflection on ML4PS 2025, where researchers in physics and machine learning grappled with the role AI should play in scientific discovery, and what it would take to move from process acceleration toward deeper scientific insight.
Since its first edition in 2017, the Machine Learning and the Physical Sciences (ML4PS) workshop has occupied a distinct place within the NeurIPS ecosystem. It is intended as a space where researchers from physics and machine learning can surface open questions and reflect on how the two fields are evolving together, rather than a venue for settling debates or prescribing a single direction forward. This year’s posters and panels spanned a wide technical range, but what stood out was a community actively examining its own assumptions about the role AI should play in scientific work.

Academia, industry, and the shape of collaboration
That self-examination was especially visible in the workshop’s focus on the evolving relationship between academia and industry. As Garrett Merz, one of the workshop organizers, explained to FirstPrinciples, industry labs are now able to train much larger models than most academic institutions and have become deeply interested in applications across the physical sciences. “The potential for collaboration is immense,” he noted, pointing to areas like biophysics and materials discovery, among others. At the same time, organizers acknowledged real concerns within academia: that physics could become a proving ground for large generalist models before the costs and tradeoffs of automation are fully understood, and that shifting incentives may quietly reshape scientific priorities.
Rather than framing this as a conflict, the ML4PS organizers emphasized the need for ongoing, open discussion. Some researchers are eager to collaborate with industry, whether through frontier language models or jointly developed domain-specific systems. Others are more cautious, worried about value shifts and misalignment. “There are no easy answers here,” Garrett stressed. ML4PS, in his view, is meant to be a place where these disagreements can be explored honestly, without forcing premature consensus.
Scientific rigor in a world that scales
Questions about rigor and interpretability ran in parallel. While machine learning continues to scale rapidly, physical science places strong demands on reproducibility and scientific rigor. Workshop discussions reflected a pragmatic stance. As Garrett put it, “It's not necessarily true that rigor and fast-scaling are in competition. There are many problems where interpretability is secondary to having a working black-box model simply due to the fact that they occur in a challenging statistical regime, while there are other problems where a small-scale, highly-interpretable model is required.”
Navigating these choices, he stated, depends on frequent communication between domain scientists and ML researchers. This is especially true when complex scientific goals are translated into benchmark-style objectives.
AI as a tool, not a participant
Alexander Tessier, Director of Engineering at FirstPrinciples, attended ML4PS 2025 as part of his time at the broader NeurIPS conference. “The workshop highlighted a lot of applications of AI in the physical sciences, from cosmology to control of large physical apparatuses,” he said. Across many of these examples, machine learning clearly functioned as a powerful accelerator for well-defined tasks.
At the same time, Tessier noted that most approaches he encountered were tightly focused on specific problems. “There were no cohesive or unified approaches, beyond perhaps Tess Smidt’s excellent presentation on symmetry and equivariance,” he observed, emphasizing that this was not a criticism of individual contributions but a reflection of how early and exploratory much of the work still is.
Importantly, this early and exploratory character is intentional. ML4PS is designed as a venue for work in progress, and some contributions are beginning to explore more unifying approaches, even as the field continues to sort out how such questions should be addressed. What remains open, in Alex’s view, is how such efforts might eventually connect. “We need to have an ‘AlphaFold moment,’” he said, one driven by new architectures capable of uncovering latent physical structure.

That concern connects to a broader question about understanding versus performance. Tessier pointed out that many systems treated AI primarily as a way to speed up existing processes. “AI seemed to be very much a tool,” he said, “but often did not have a seat at the table when it came to new discoveries and methods.” The distinction between acceleration and insight surfaced repeatedly in informal conversations throughout the workshop.
Where generality breaks
Debates about generality and specialization provided another lens on this issue. Should scientific AI aim to be broadly general, or carefully tailored to specific domains? The organizers of ML4PS emphasized that the answer depends heavily on the problem. “Physics data” can mean waveforms, spectra, images, or point clouds, each with its own structure. In some areas, foundation models are beginning to perform well across tasks; in others, the data remains too specialized for easy transfer. In panel discussions, this diversity consistently shifted the conversation away from model scale alone and toward a more practical concern: knowing when a model is appropriate for a given scientific question.
Tessier found this diversity reinforcing rather than discouraging. The breadth of applications and data types on display, he said, mirrored the reality of physics. While large language models often aim for unification, he noted that transformer-based systems at ML4PS were most often used to solve specific subproblems, not to replace domain expertise wholesale. To him, this supported the idea that domain specialization is not a temporary limitation, but a necessary feature of trustworthy scientific AI.
Shared bottlenecks, shared work
Several research directions discussed at ML4PS pointed toward shared technical bottlenecks. Tessier highlighted posters on agent-based approaches, symbolic regression, and reduced-order modelling, as well as recurring challenges around tool-informed reasoning and long-horizon learning in mathematically demanding settings. “Most of these topics would benefit from collaboration,” he said, “which we need to explore as a community more holistically.”
Openness, reproducibility, and scientific trust
Questions of openness and reproducibility formed another consistent thread. The organizers expressed strong support for open-source code and data, and for incentives that reward reproducible work. This emphasis aligns with how FirstPrinciples thinks about scientific infrastructure: openness is essential to building trustworthy scientific systems.
If there was a recurring theme across ML4PS 2025, it was an awareness of uncertainty about what progress should look like. In one panel, sponsored by FirstPrinciples, organizers reflected that an eventual “AlphaFold moment” in fundamental physics may depend less on automation than on learning how to ask better questions in the first place.
Workshops like ML4PS have become increasingly important spaces for this kind of reflection. As funding pressures grow and institutional boundaries blur, they offer a venue for careful discussion about how new tools are changing scientific practice. ML4PS 2025 did not present a single vision for AI in the physical sciences. Instead, it made visible a community working through hard questions about responsibility, and what it would mean for AI to move from a useful tool toward a deeper role in scientific understanding.