AI rebuilds molecules from exploding fragments
SLAC researchers and collaborators trained a neural network that can use ion momentum to work backward and predict the pre-blast geometry of a molecule.
By Ula Chrobak
Key takeaways:
- A technique called Coulomb explosion imaging records the momentum of a molecule’s ions after they blast apart.
- Physicists can use the information to recreate the initial structure of the molecule, but the calculations are computing-intensive and slow.
- SLAC researchers and their collaborators trained an AI that predicts molecular geometries from post-explosion fragments, opening the possibility of applying the technique to more complex molecules.
Researchers at the Department of Energy’s SLAC National Accelerator Laboratory and collaborating institutions recently built a generative AI model that can recreate molecular structures from the movement of the molecule’s ions after they are blasted apart by X-rays, a technique called Coulomb explosion imaging.
The research, published in Nature Communications, is a step toward taking snapshots of molecules during chemical reactions – an advance that could have significant impacts in medicine and industry. The machine learning model closely predicted the geometries of a range of different molecules made of fewer than ten atoms, paving the way for applying the technique to larger molecules. “We were pretty excited about this,” said Xiang Li, an associate scientist at SLAC’s Linac Coherent Light Source (LCLS) and lead author of the study. “It is the first AI model built for molecular structure reconstruction from Coulomb explosion imaging.”
A new way to see molecules
Currently, there are limited options for imaging isolated gas-phase molecules. With electron microscopy, for example, subjects must be fixed in place, making it impossible to image free-floating molecules. And for diffraction-based techniques to work, the sample of molecules needs to be dense enough to generate a strong signal in the detector. The resulting image is an average over many molecules, which hides details that are visible only when imaging isolated molecules.
In the paper, the researchers instead focused on Coulomb explosion imaging. In this technique, an X-ray pulse hits a single molecule in a vacuum chamber, ripping off the molecule’s electrons. This leaves behind positive ions that repel one another explosively and smash into a detector. The detector captures their momentum, which can be used to reconstruct the structure of the molecule. “This technique has the ability to isolate minor details that are chemically relevant,” said James Cryan, LCLS interim deputy director for science, research and development, associate professor of photon science at SLAC and coauthor of the paper.
But this reconstruction process has so far been largely infeasible due to computing constraints. After the X-ray pulse strips away electrons, the remaining ions do not explode apart instantly. During this brief delay, the atoms can shift slightly, making it difficult to reconstruct the original structure using Coulomb's law for electrostatic forces. “It will not be accurate because a simple use of that law only works if the charge-up process is instantaneous,” explained Li.
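To make the "simple use" of Coulomb's law concrete, here is a minimal sketch of the idealized forward problem: point charges that start at rest, are charged instantaneously, and fly apart under mutual repulsion. It is an illustration only, not the simulation used in the study. The function names, the atomic units (Coulomb constant set to 1), the step size and the two-proton example are all assumptions made for this sketch.

```python
import math

def coulomb_forces(positions, charges):
    """Pairwise Coulomb forces between point charges (atomic units, k = 1)."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = charges[i] * charges[j] / r ** 3
            for k in range(3):
                forces[i][k] += scale * d[k]   # push i away from j
                forces[j][k] -= scale * d[k]   # equal and opposite on j
    return forces

def explode(positions, charges, masses, dt=1.0, steps=20000):
    """Velocity-Verlet integration from rest; returns the final ion momenta.

    Assumes every ion reaches its final charge at t = 0, which is exactly
    the idealization the article says breaks down in real experiments.
    """
    pos = [list(p) for p in positions]
    vel = [[0.0, 0.0, 0.0] for _ in pos]
    forces = coulomb_forces(pos, charges)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / masses[i]  # half kick
                pos[i][k] += dt * vel[i][k]                       # drift
        forces = coulomb_forces(pos, charges)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / masses[i]  # half kick
    return [[masses[i] * vel[i][k] for k in range(3)] for i in range(len(pos))]

# Hypothetical example: two protons, 2 bohr apart (mass ~1836 electron masses).
masses = [1836.0, 1836.0]
momenta = explode([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], masses)
kinetic = sum(sum(c * c for c in p) / (2 * m) for p, m in zip(momenta, masses))
```

In this idealized case the two momenta are equal and opposite, and the final kinetic energy approaches the initial electrostatic energy of 0.5 hartree. Reconstruction means running this logic in reverse, from momenta back to positions, and the idealization no longer holds once the charge-up takes a finite time.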
Making things even messier, every additional atom in the molecule increases the complexity of the problem exponentially. “It’s very challenging to work backwards to get the original structure,” said co-author Phay Ho, a physicist with DOE’s Argonne National Laboratory. “It’s kind of like breaking a glass and trying to put it back together from how the pieces flew apart. Many problems in modern physics and chemistry involve reconstructing hidden structures from indirect measurements. This work demonstrates how AI can help tackle such inverse problems.”
Machine learning for molecular structures
The research team set out to build a machine learning model that could overcome this computing constraint. They developed and trained the model at SLAC’s Shared Science Data Facility (S3DF). Generative AI models are well-suited for the task because they “think” differently than a standard computer simulation. Instead of working through a series of equations, they learn by finding patterns in training data. Then, they use those patterns to make statistical predictions.
To gather training data, the team turned to a simulation built by Ho. The simulation analyzes molecular structures and calculates the momentum of their ions following a Coulomb explosion. After running for over a month, the computing-intensive simulation, using both quantum mechanics and classical physics equations, produced a dataset of 76,000 molecular samples.
Initially, the researchers trained the AI on this dataset alone, which is small by AI-training standards, and found that the model predicted inaccurate structures from explosion data. So they redid the training, adding another dataset derived using only classical physics. The second set was less precise but about 100 times larger than the first one.
This two-step training was the trick for predicting precise structures.
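The mechanics of a two-step scheme can be shown with a toy example. This sketch assumes the order is "train first on the large, less precise set, then continue on the small, precise set"; the article does not spell out the order, the model architecture or the optimizer. The one-parameter model, the invented datasets and every name below are hypothetical stand-ins, not the team's method.

```python
def fit(w, data, lr, epochs):
    """Plain gradient descent on mean squared error for the model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Invented stand-ins for the two training sets (not real simulation output).
# Large set: plentiful but systematically off (slope 1.8 instead of 2.0).
low_fidelity = [(i / 100, 1.8 * (i / 100)) for i in range(1, 1001)]
# Small set: 100 times fewer samples, but exact (slope 2.0).
high_fidelity = [(i / 2, 2.0 * (i / 2)) for i in range(1, 11)]

# Step 1: train on the large, less precise set.
w_pretrained = fit(0.0, low_fidelity, lr=0.01, epochs=200)
# Step 2: continue training on the small, precise set.
w_two_step = fit(w_pretrained, high_fidelity, lr=0.01, epochs=5)
# Baseline: the same short training on the small set, starting from scratch.
w_scratch = fit(0.0, high_fidelity, lr=0.01, epochs=5)
```

Here the second step pulls the parameter from the biased value learned in step one toward the precise value, and it ends up closer than the from-scratch baseline given the same short training. The toy limits the baseline by training time rather than by data scarcity, so it shows only the workflow and says nothing about the factor-of-two improvement reported for the real model.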
The researchers tested the AI model by prompting it to predict molecular structures in a portion of the simulation data it had not seen in training. The model, which the team named MOLEXA (short for “molecular structure reconstruction from Coulomb explosion imaging”), took the ion momenta and calculated the most likely structures. “We found that this two-step training process suppressed the prediction error by a factor of two,” said Li.
The team then tested MOLEXA with experimental datasets recorded at the Small Quantum Systems (SQS) instrument of the European X-ray Free-Electron Laser facility (European XFEL) in Germany. The molecules they tested included water, tetrafluoromethane and ethanol. They entered the experimental ion momenta into the model, reconstructed the molecular structures, and then compared the reconstructions to known structures listed by the National Institute of Standards and Technology.
They found the predictions largely overlapped with the established structures. Overall, the bonds were in the right spots, with only slight variations in their angles. The errors in position were generally less than half the length of a typical chemical bond. “The model is actually, most of the time, doing better than that,” added Li. “It is only a starting point for future research, which will not only improve model accuracy but also extend its applicability to larger molecular systems.”
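One simple way to compare a reconstructed geometry with a reference one is through bond lengths and bond angles, which do not depend on how the molecule is oriented. The sketch below does this for water. It is an assumption-laden illustration: the reference values are approximate textbook numbers, the "predicted" geometry is invented, and the study itself reports errors in atomic positions rather than in these internal coordinates.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))

def angle_deg(a, center, b):
    """Angle a-center-b in degrees."""
    u = [a[k] - center[k] for k in range(3)]
    v = [b[k] - center[k] for k in range(3)]
    cos_t = sum(u[k] * v[k] for k in range(3)) / (
        distance(a, center) * distance(b, center))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def water_geometry(bond, angle):
    """Place O at the origin and two H atoms at the given bond length and angle."""
    t = math.radians(angle)
    return {"O": (0.0, 0.0, 0.0),
            "H1": (bond, 0.0, 0.0),
            "H2": (bond * math.cos(t), bond * math.sin(t), 0.0)}

def summarize(geom):
    """Return (O-H1 length, O-H2 length, H-O-H angle)."""
    return (distance(geom["O"], geom["H1"]),
            distance(geom["O"], geom["H2"]),
            angle_deg(geom["H1"], geom["O"], geom["H2"]))

# Approximate textbook values for water, in angstroms and degrees (assumption).
reference = summarize(water_geometry(0.9572, 104.52))
# Invented "reconstruction" used only to exercise the comparison.
predicted = summarize(water_geometry(0.99, 101.0))

bond_error = abs(predicted[0] - reference[0])    # angstroms
angle_error = abs(predicted[2] - reference[2])   # degrees
```

A comparison of absolute atomic positions would additionally require aligning the two structures first, which this sketch sidesteps by using internal coordinates.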
Expanding to larger molecules and chemical reactions
The paper is a major step in advancing Coulomb explosion imaging, which has long been limited by the challenge of reconstructing molecular structures from experimental measurements. In future work, the researchers plan to scale up the number of atoms the machine learning model can piece back together and apply the model to time-resolved experiments at the LCLS and European XFEL. That will help researchers to reconstruct snapshots of molecules in motion, creating flip-book-like molecular movies with insights into how chemical reactions unfold. It will also help with the interpretation of data collected at the high X-ray pulse rates delivered by SLAC’s superconducting X-ray laser, Cryan said.
The team is also now testing the model’s ability to reconstruct molecules from incomplete data. Much of the time, the detector misses an ion produced in the Coulomb explosion. Li wants to know, for example: Can the AI still reconstruct an ethanol molecule if one or more of its hydrogen ions are not registered in the detector?
If these challenges are resolved, the technique could become more applicable in biology and chemistry research. Proteins, for instance, can consist of thousands of atoms. “That’s really the goal,” said Li. “We will be able to study systems that are more biologically or industrially relevant.”
The team also included researchers from the Stanford PULSE Institute; Stanford University; Kansas State University; European XFEL, Germany; the Max Planck Institute for Nuclear Physics, Germany; Fritz Haber Institute, Germany; and Sorbonne University, France. Large parts of this work were funded by the Department of Energy’s Office of Science. LCLS is an Office of Science user facility.
Citation: X. Li et al., Nature Communications, 03 March 2026 (10.1038/s41467-026-70160-5)
For media inquiries, please contact media@slac.stanford.edu. For other questions or comments, contact SLAC Strategic Communications & External Affairs at communications@slac.stanford.edu.
About SLAC
SLAC National Accelerator Laboratory explores how the universe works at the biggest, smallest and fastest scales and invents powerful tools used by researchers around the globe. As world leaders in ultrafast science and bold explorers of the physics of the universe, we forge new ground in understanding our origins and building a healthier and more sustainable future. Our discovery and innovation help develop new materials and chemical processes and open unprecedented views of the cosmos and life’s most delicate machinery. Building on more than 60 years of visionary research, we help shape the future by advancing areas such as quantum technology, scientific computing and the development of next-generation accelerators.
SLAC is operated by Stanford University for the U.S. Department of Energy’s Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time.