If you would like to lend your spare compute power to the fight against COVID-19, visit the Folding@home website to find out more.
Ever since Folding@home—a massively distributed computing project set up to investigate the origins of diseases including cancer, Ebola, and Alzheimer's—turned its attention to COVID-19 in February of last year, more than 200,000 people worldwide have been donating their spare compute capacity to an urgent cause.
By downloading the Folding@home software and letting it work in the background on their computer, anyone, anywhere, can help researchers run simulations of the behavior of proteins implicated in COVID-19.
The result of this global collaboration is the world's first "exascale" distributed computing resource. A supercomputer capable of making a quintillion calculations a second, it is generating datasets of unprecedented size, which Folding@home is making available to researchers everywhere through the Registry of Open Data on Amazon Web Services.
"Our simulations are essentially lots of images of what a protein looks like," said Greg Bowman, director of Folding@home. "If we only had 10 images, we could look at them manually, compare them, and make inferences. However, we have hundreds of millions of images. The sheer magnitude of the data and our finite bandwidth as human beings means anything we do needs to be automated."
"It's why we need such a huge amount of compute power, and it's why we're making the data available in the AWS cloud. We need to get as many brains as possible working on it in parallel in order to reap the rewards."
"We are actively using this data for drug discovery," Bowman continued. "We have compounds in testing and hope to be in phase-one clinical trials in the next six months. We started this project to answer some very basic research questions, and the response so far has completely exceeded our expectations."
[GIF: how massive datasets are being used to combat COVID-19]
Folding@home's data is just one of a number of datasets made available in the cloud via the AWS Open Data Sponsorship Program, which covers the cost of storage for datasets of high value to the scientific community. The program aims to democratize access to data by making it available for analysis on AWS; to develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and to encourage the development of communities that benefit from access to shared datasets.
----------------------------------------------------------------
What is a massively distributed computing project?
In a massively distributed computing project, individuals lend their spare compute power to a specific initiative, usually to help perform the enormous number of calculations or simulations required to solve a complex problem. Folding@home, for example, uses computer simulations to understand the movements of proteins implicated in a variety of diseases, including COVID-19.
What is an exascale computer?
Today's fastest computers are capable of making a quadrillion (a million billion) calculations a second. To put that into perspective, there are quadrillions of insects on earth, and they are estimated to outnumber human beings (all seven or so billion of us) by 200 million to one. But even this kind of compute power pales in comparison with "exascale" supercomputers, which can make a quintillion, or 1,000,000,000,000,000,000, calculations a second. That's a lot of bugs.
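The scale comparison above comes down to a few lines of arithmetic. The sketch below works through it, assuming round figures (exactly seven billion people and exactly 200 million insects per person) and treating "calculations a second" as a simple count. These are illustrative assumptions, not measured values.

```python
# Back-of-the-envelope scale comparison using the round figures quoted above.
# All values are illustrative assumptions, not measurements.

QUADRILLION = 10**15  # a million billion
QUINTILLION = 10**18  # "exascale": a billion billion

humans = 7_000_000_000           # "seven or so billion"
insects_per_human = 200_000_000  # "200 million to one"

# Implied insect population under these round figures.
insects = humans * insects_per_human

# How many times more calculations per second is a quintillion than a quadrillion?
speedup = QUINTILLION // QUADRILLION

# Seconds a quadrillion-per-second machine needs to match one second of exascale work.
catch_up_seconds = QUINTILLION / QUADRILLION

print(f"Estimated insects: {insects:.1e}")
print(f"Exascale speedup: {speedup}x")
print(f"Catch-up time: {catch_up_seconds / 60:.1f} minutes")
```

Under these assumptions the insect count comes to roughly 1.4 quintillion (about 1,400 quadrillion), which is the same order of magnitude as the number of calculations an exascale system makes every second.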
How much data is Folding@home generating?
To date, Folding@home has produced over 100,000 times more data on COVID-19 than is typically created for other simulation studies. Before Folding@home made this data available through the Registry of Open Data on AWS, sharing it was time-consuming and cumbersome because the files are so large; sometimes it meant mailing physical hard drives. Now that the data is in the cloud, any interested party can download it quickly, on demand.
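For readers curious what "on demand" access might look like in practice, here is a minimal sketch that lists a few objects in a public Amazon S3 bucket using only the Python standard library. The bucket name is an assumption and should be checked against the dataset's entry in the Registry of Open Data on AWS; the helper names (`build_list_url`, `parse_listing`, `list_objects`) are hypothetical, and the sketch assumes the bucket permits anonymous listing.

```python
# Sketch: list objects in a public S3 bucket without credentials.
# The bucket name is an ASSUMPTION; confirm it in the Registry of Open Data
# on AWS before relying on it.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ASSUMED_BUCKET = "fah-public-data-covid19-cryptic-pockets"  # assumption
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket, prefix="", max_keys=10):
    """Build an anonymous ListObjectsV2 request URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_objects(bucket=ASSUMED_BUCKET, prefix="", max_keys=10):
    """Fetch and parse one page of the bucket listing (needs network access)."""
    url = build_list_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url) as response:
        return parse_listing(response.read())
```

If the assumptions hold, calling `list_objects()` would return a short list of object names and sizes, each of which could then be downloaded individually rather than shipped on a hard drive.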