BaBar Data Preserved in 'Computational Cocoon' for Future Analysis
More than eight years' worth of pristine particle physics data will remain available for analysis or re-analysis at least until 2018, now that BaBar's Long Term Data Access project is complete. The project preserves a complete set of BaBar data – all 530-plus inverse femtobarns of it – by, in a sense, stopping time for it, embedding it in a computational cocoon safe from upgrades, bug fixes and patches. Anything that could disrupt the computing environment where the data is housed is carefully kept out by a clever arrangement of servers, software and networked virtual machines.
The BaBar LTDA project is part of a growing movement to preserve hard-won data for future generations, rather than allowing it to fade into obscurity, trapped on storage media that have become obsolete. BaBar took data from electron-positron collisions in SLAC's PEP-II collider from 1999 to 2008 while looking for hints as to why matter, and not antimatter, fills the universe.
Not only are BaBar researchers busy exploring the subtleties of their original questions, but they’re also using BaBar data to shed light on other puzzles that can and do come up. Now those data are ready.
"We are the first data-preservation effort to reach such a mature state," said Tina Cartaro, BaBar's computing coordinator, despite the numerous criteria the LTDA project had to meet. The data had to be easily managed and maintainable with a minimum of effort, she added.
At the same time, BaBar researchers wanted to capture the data’s full potential. "We wanted to be able to do everything from scratch – from the bottom up," she said, including re-running all the analyses that have contributed to BaBar's almost 500 (and counting) published papers.
The data must also be available to check the results of other experiments or help address new questions that arise, such as whether some dark matter particles are much lighter than previously thought and can interact with normal photons.
The solution? Build for the data a frozen world on virtual machines – software facsimiles of actual computers that use their own operating systems to run their own programs, but exist within the computing environment of a physical machine. The LTDA computer architecture is designed to keep the BaBar virtual world safe and unchanging, isolated not only from the SLAC network, but from the rest of the World Wide Web.
BaBar collaboration members can access the data in the virtual machines to run new analyses and then retrieve their results to save to a different location, but can leave nothing new behind on the virtual machines. By the same token, "No access to file systems or accounts outside the LTDA system can be made from the virtual machines," said BaBar Emeritus Computing Coordinator Homer Neal.
As a result, the rest of the SLAC computing infrastructure is protected from any problems that might arise on the virtual machines, he said: "LTDA can't hurt anything."