Researchers Freely Share LCLS Experiment Data on Public Database
In 2009, when biophysicist Ilme Schlichting and her colleagues applied to use the X-ray laser at SLAC’s Linac Coherent Light Source, they added a radical idea to their proposal: They would make all the data they collected on two viruses and a nanoparticle available to the public one year after the experiment ended.
Their proposal is part of a nascent movement to re-distribute the wealth of data provided by ultra-bright lasers such as LCLS. Most scientists guard their data closely, fearing that other researchers taking a fresh look at the information will pre-empt them by publishing results first. But in the field of X-ray imaging, some say the risk is worth taking because data sharing speeds scientific progress.
LCLS is currently the only X-ray laser worldwide that allows researchers to capture high-resolution, single-shot images of atoms and molecules. Only one-fifth of the groups who apply to use LCLS are granted experimental sessions. Those few lucky researchers have massive amounts of data, while many others struggle to collect any.
A single LCLS experiment lasting 12 hours can produce millions of images, only a small fraction of which may contain the information scientists seek. Devising methods for filtering and interpreting the data deluge is its own research problem. Scientists are working on new computational methods to keep pace with laser technologies that churn out ever-increasing amounts of information.
“We hope our open source approach to the diffraction data sets will expedite new solutions to this problem,” the researchers wrote in their proposal.
Two years after the researchers wrote the proposal, data from their June 2010 experiment has been deposited in the Coherent X-ray Imaging Data Bank, created by Schlichting’s collaborator Filipe Maia, a postdoctoral fellow at Lawrence Berkeley National Laboratory.
The database, called CXIDB, opened in February with two images of Mimivirus made by an international collaboration, including Schlichting and Maia, headed by Janos Hajdu of Sweden’s Uppsala University. Today, the database contains 15 sets of data, including five from Schlichting’s group. The data consists of diffraction images formed when an X-ray beam hits atoms and molecules in the experimenter’s sample and scatters. The pattern of scattering depends on the shape, size, and orientation of the sample, giving researchers a visual read-out of the molecules.
So far, groups from LCLS, Berkeley, Germany, and Sweden have contributed data on samples such as viruses, yeast, and nanoparticles.
No one outside Schlichting’s original experimental group has reported publishing papers based on the data the group posted. That day may come soon, though, as Maia and Schlichting have each fielded numerous questions from researchers downloading CXIDB data.
CXIDB follows the model of the Protein Data Bank and GenBank, which have helped structural biologists and geneticists share data for years. LCLS’s contribution to CXIDB reflects the continued spread of the open-access ethos to physics and other research communities—in some cases sparking vigorous debate.
Many researchers balk at the idea of sharing their data before they have published their own findings. But Schlichting, who is affiliated with the Max Planck Institute for Medical Research in Heidelberg and the Max-Planck Advanced Study Group at the Center for Free Electron Laser Science in Hamburg, has already shared some data that she is still analyzing.
“Maybe someone else will get to it first,” said Schlichting, “but that’s just a risk we take.” She said she believes those risks are outweighed by the benefits to collective scientific advancement: “There’s not enough beam time, and the field has to move forward.”
Helen Shen is a science-communications intern at SLAC National Accelerator Laboratory.