How scientists are using supercomputers to combat COVID-19
Much work remains, but some of the Consortium’s most prominent members — among them Microsoft, Intel, and Nvidia — claim that progress is being made.
Petaflops of compute
Powerful computers allow researchers to undertake high volumes of calculations in epidemiology, bioinformatics, and molecular modeling, many of which would take months on traditional computing platforms (or years if done by hand). Moreover, because the computers are available in the cloud, they enable teams to collaborate from anywhere in the world.
Insights generated by the experiments can help advance our understanding of key aspects of COVID-19, such as viral-human interaction, viral structure and function, small molecule design, drug repurposing, and patient trajectory and outcomes. “Technology is a critical part of the COVID-19 research going on right now all over the world,” Dell Technologies VP Thierry Pellegrino told VentureBeat. (Dell Technologies is a member of the Consortium.) “It’s crucial to the population of our planet that researchers have the tools to understand, treat, and fight this virus. Researchers around the world are true heroes doing important work under extreme and unfamiliar circumstances, and we couldn’t be prouder to support their efforts.”
Companies and institutions matched 62 projects in the U.S., Germany, India, South Africa, Saudi Arabia, Croatia, Spain, the U.K., and other countries with supercomputers from Google Cloud, Amazon Web Services (AWS), Microsoft Azure, IBM, and dozens of academic and nonprofit research institutions for free. These are running on over 136,000 nodes containing 5 million processor cores and more than 50,000 graphics cards, which together deliver over 483 petaflops (483 quadrillion floating-point operations per second) of compute across hardware maintained by the Consortium’s 40 partners.
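To put those totals in perspective, the short Python sketch below converts the quoted petaflops figure into raw operations per second and derives rough per-node averages. The averages assume every node is identical, which is certainly not true of a pool this mixed, so treat them as order-of-magnitude illustrations only. The function name is made up for this example.

```python
# Back-of-the-envelope check of the Consortium totals quoted above.
PETA = 10**15  # one petaflop = 10^15 floating-point operations per second
TERA = 10**12

total_petaflops = 483
nodes = 136_000
cpu_cores = 5_000_000

total_flops = total_petaflops * PETA


def average_teraflops_per_node(flops: int, node_count: int) -> float:
    """Mean throughput per node in teraflops, assuming a uniform split
    (an illustrative simplification, not how the hardware is built)."""
    return flops / node_count / TERA


if __name__ == "__main__":
    print(f"Total: {total_flops:.3e} floating-point operations per second")
    print(f"Average per node: {average_teraflops_per_node(total_flops, nodes):.2f} teraflops")
    print(f"Average cores per node: {cpu_cores / nodes:.1f}")
```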
Microsoft
In addition to supercomputing infrastructure built atop its Azure cloud computing platform, Microsoft is providing researchers with networking and storage resources integrated with workload orchestration through Azure HPC. Concurrent with this is the company’s AI for Health program, which in April allocated $20 million to developments in five key areas — data and insights, treatment and diagnostics, allocation of resources, dissemination of accurate information, and scientific research — with the goal of bolstering work related to COVID-19.
As a part of its work with the Consortium, Microsoft says it’s providing teams access to its scientists spanning AI, HPC, quantum computing, and other areas of computer science at Microsoft Research and elsewhere. Much of these researchers’ work to date has entailed basic scientific discovery about COVID-19 itself and how it interacts with the human host, including the design of therapeutics, through:
- Research simulations.
- Molecular dynamics modeling.
- 3D mapping of virus protein structures.
- Compound screening to see if existing drug molecules are able to inhibit cellular entry of the virus.
Microsoft says each organization it collaborates with receives a full Azure HPC environment, including Azure CycleCloud with the Slurm workload manager, best-fit Azure Virtual Machines, and storage. These are configured to scale on demand as compute requirements change, and they’re tailored to the specific research needs of the grantee.
Nepali modeling and ventilator splitting
Through the Consortium, Microsoft’s AI for Health is supporting the nonprofit research institute Nepal Applied Mathematics and Informatics Institute for Research (NAAMII), which is employing simulation to model how COVID-19 would spread among the Nepali population, given different scenarios. These models, Microsoft says, can reveal patterns that could help save lives and livelihoods.
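The article doesn’t describe NAAMII’s model, so the following is only a generic illustration of scenario-based epidemic simulation: a textbook SIR (susceptible, infected, recovered) model stepped one day at a time. The population size, the transmission and recovery rates, and the scenario names are all invented for the example and are not estimates for Nepal.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model with one-day steps.

    beta  -- transmission rate per day (hypothetical value)
    gamma -- recovery rate per day (hypothetical value)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Largest number of simultaneously infected people in a run."""
    return max(i for _, i, _ in history)


# Illustrative scenarios: a lower beta stands in for stronger interventions.
SCENARIOS = {
    "no intervention": 0.30,
    "moderate distancing": 0.20,
    "strict distancing": 0.12,
}

if __name__ == "__main__":
    for name, beta in SCENARIOS.items():
        run = simulate_sir(1_000_000, 10, beta, gamma=0.10, days=365)
        print(f"{name}: peak infected = {peak_infected(run):,.0f}")
```

Comparing the peak across scenarios is the kind of pattern such a model can surface. Real studies typically add age structure, geography, and uncertainty estimates on top of this skeleton.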
Duke University, another grantee, is leveraging Azure to investigate ventilator splitting, a technique that enables multiple patients to use the same ventilator. MathWorks, the maker of MATLAB, teamed up with Microsoft to optimize the researchers’ analysis for distributed computing environments.
Google
Google continues to provide compute, storage, and workload management services to Consortium grantees through Google Cloud Platform, and it recently set aside $20 million in computing credits for academic institutions and researchers studying COVID-19 treatments, therapies, and vaccines. As a part of its work with the Consortium, the company is collaborating on epidemiological modeling with Northeastern researchers and applying AI to medical imaging with the Complutense University of Madrid.
Google also partnered with the Harvard Global Health Institute to fund companies, government agencies, nonprofit organizations, and institutions working on COVID-19 research. The tech giant — along with Microsoft — also kicked off a program with Microsoft-backed cloud company Rescale to offer HPC resources at no cost to teams working to develop COVID-19 testing and vaccines. Rescale provides the platform that researchers launch experiments and record results on, while Google and Microsoft supply the backend computing resources.
Amazon
Amazon, like Google, is supplying compute and tools to researchers matched through the Consortium. More than 11 teams are currently using its infrastructure, and dedicated Amazon Web Services solution architects confer with the scientists every week.
As a part of its AWS Diagnostic Development Initiative, Amazon is also providing $20 million in computing credits to over 35 institutions and private companies that are leveraging AWS to further the development of COVID-19 point-of-care diagnostics — i.e., testing that can be done at home or at a clinic with same-day results. “This is a global health emergency that will only be resolved by governments, businesses, academia, and individuals working together to better understand this virus and ultimately find a cure,” said Teresa Carlson, VP of worldwide public sector at AWS, in a statement.
Developing protein decoys
At the MIT Media Lab, inspired by a researcher at Johns Hopkins University, a team is identifying “decoy” proteins of ACE2 receptors (the receptors coronaviruses bind to inside the human body) that might render COVID-19 inert. Using a machine learning model trained on data about the ACE2 receptor and running on AWS, the researchers are attempting to predict which variants of the decoy won’t interact with other proteins in the body and cause harmful side effects. If all goes well, tests in mice will commence soon, with clinical trials beginning toward the end of summer.
In separate efforts, AWS is empowering researchers at the Children’s National Hospital to combine hundreds of data sets to identify genes that might be targeted to treat COVID-19. A team at Iowa State University is tapping evolutionary models with public genomic data sets to suss out the relationships between strains of COVID-19 to understand how they mutate and spread. And scientists at Emory University are developing a web-based tool — tmCOVID — to extract and summarize key concepts in scientific studies on COVID-19.
Nvidia
Nvidia says that 14 of the Consortium’s projects have consumed over 3 million GPU hours on the Nvidia-powered Summit supercomputer at Oak Ridge National Laboratory. Summit is the world’s fastest supercomputer, as ranked by the Top 500 list of supercomputers. Nvidia also offers its own 20,000-GPU infrastructure, SaturnV, which the company’s researchers are primarily using to optimize COVID-19 research applications.
Nvidia has been using excess cycles on SaturnV to run Folding@home, a distributed computing project that simulates protein dynamics in an effort to help develop therapeutics for various diseases, including COVID-19. The company has also assisted in matching researchers to supercomputers based on each researcher’s specific requirements.
Quantum chemistry and virtual screening
In partnership with Microsoft, Nvidia is working with the University of California, Riverside on quantum chemistry solutions that benefit from GPU optimization. The number of possible COVID-19 inhibitors is immense, and carrying out experimental studies on all the candidates is both infeasible and cost-inefficient. The hope is that the project’s predictive, GPU-enabled simulations, which use up to 800,000 GPU hours on Azure, will provide guidance for efforts that narrow in on the most promising candidates.
According to Nvidia, in less than a week its experts helped project lead Bryan Wong package research code using HPC Container Maker, the company’s open source tool that ships with 30 containerized HPC applications. They also tapped Nvidia’s Nsight debugging tool to develop a fix for an onerous bug, making it possible to accomplish work scheduled to take 800,000 GPU-hours in 300,000 GPU-hours for a savings of $500,000.
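The arithmetic behind that claim is easy to check. The sketch below uses only the three figures quoted above. The per-GPU-hour rate it derives is implied by those numbers and is not a price stated by Nvidia or Microsoft.

```python
# Figures quoted in the article.
planned_gpu_hours = 800_000
actual_gpu_hours = 300_000
reported_savings_usd = 500_000

# Derived quantities (inferences from the figures above, not stated facts).
saved_gpu_hours = planned_gpu_hours - actual_gpu_hours
reduction_fraction = saved_gpu_hours / planned_gpu_hours
implied_usd_per_gpu_hour = reported_savings_usd / saved_gpu_hours

if __name__ == "__main__":
    print(f"GPU-hours saved: {saved_gpu_hours:,}")
    print(f"Reduction in compute: {reduction_fraction:.1%}")
    print(f"Implied cost per GPU-hour: ${implied_usd_per_gpu_hour:.2f}")
```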
At Carnegie Mellon University, a team led by Olexandr Isayev worked with Nvidia to apply AI approaches to the task of high-throughput virtual screening, which uses algorithms to identify bioactive molecules. Unlike traditional scientific simulations, which take a brute force approach to problems by attempting to simulate every possible combination of molecular interaction, AI makes educated guesses that reduce the number of combinations to be simulated. This leads to theoretically faster candidate drug discovery (and quicker field trials). Isayev estimates that it might be as much as a million times faster than usual mechanical calculations.
The first step in the process is using AI to analyze a library of molecules that can be purchased from chemical companies, preparing them for screening in simulation. The best candidates from the screening will then be simulated using AI-enhanced molecular dynamics, and top hits from the final screening will be tested in partner laboratories.
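The staged workflow described above can be pictured as a funnel: a fast model scores the whole library, and only the best candidates go on to a slower, more accurate stage. The sketch below is a hypothetical illustration of that pattern, not the Carnegie Mellon team’s code. Both scoring functions are placeholders that return deterministic pseudo-random numbers, and the library is just a range of integer IDs.

```python
import random


def cheap_score(molecule_id: int) -> float:
    """Placeholder for a fast AI screening score (hypothetical)."""
    return random.Random(molecule_id).random()


def expensive_score(molecule_id: int) -> float:
    """Placeholder for a slower, simulation-based score (hypothetical)."""
    return random.Random(molecule_id * 7919 + 1).random()


def screening_funnel(library, keep_after_screen, keep_after_simulation):
    """Two-stage funnel: score everything cheaply, then rescore the shortlist."""
    # Stage 1: rank the full library with the cheap model and keep the best.
    shortlist = sorted(library, key=cheap_score, reverse=True)[:keep_after_screen]
    # Stage 2: spend the expensive computation only on the shortlist.
    top_hits = sorted(shortlist, key=expensive_score, reverse=True)[:keep_after_simulation]
    return shortlist, top_hits


if __name__ == "__main__":
    library = range(10_000)  # stand-in for a purchasable-compound library
    shortlist, top_hits = screening_funnel(library, 100, 5)
    print(f"Screened {len(library):,} molecules, simulated {len(shortlist)}, "
          f"sent {len(top_hits)} for lab testing: {top_hits}")
```

The point of the design is cost control. The expensive stage runs on a small fraction of the library, so total compute is dominated by the cheap first pass.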
At the conclusion of their work, Isayev and colleagues plan to deposit their data sets in the open source COVID-19 data lake, a centralized repository of curated data sets maintained by Amazon’s AWS division in hopes that other researchers will benefit from them.
IBM
IBM VP of technical computing Dave Turek says COVID-19 research continues with partners across the spectrum — on machines powered by its hardware and within laboratories and institutions it has relationships with. “Without any large contracts or anything of the kind, [the Consortium] came together in a way to both share resources and manage a process of expediting the scientific proposals that came into the consortia and match it to the best resources,” he said in a statement. “The teams are making rapid progress, and these supercomputing-powered projects are using novel approaches to understanding the virus.”
For example, IBM researchers at the Hartree Centre in Daresbury, England partnered with University of Oxford scientists to combine molecular simulations with AI in discovering compounds that could be repurposed as anti-COVID-19 drugs. Using Summit and the Texas Advanced Computing Center’s (TACC) Frontera, the fifth-fastest system per the Top 500, the team says they’re accomplishing months of research in a matter of hours.
Generating molecular compounds
With the help of IBM, researchers at the University of Utah tapped the National Center for Supercomputing Applications’ Blue Waters and TACC’s Longhorn and Frontera to generate more than 2,000 molecular models of compounds relevant for COVID-19. They ranked the models based on the molecules’ force field energy estimates, which they theorized could help scientists design better peptide inhibitors of an enzyme to stop COVID-19.
The team investigated the structure of the virus’ main protease, an enzyme that breaks down proteins and peptides, in complex with a peptide inhibitor called N3. They then applied an approach developed to identify Ebola-stopping molecules that involves molecular dynamic simulations and optimization of specific structures. This enabled the COVID-19 protease to break down a series of similar, easy-to-detect probes that had already been designed, serving as the basis for assessments that test the inhibitors’ effectiveness.
The work built on a body of knowledge about how the potential energy generated by atoms can give a molecule a positively or negatively charged “force field” that attracts or repels other molecules. Using AMBER, a molecular dynamics code, the researchers observed experimental results within one hundred-millionth of a centimeter, a measure imperceptible to all but the strongest microscopes.
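To make the “force field” idea concrete, the sketch below evaluates a textbook pairwise energy: a Lennard-Jones term for short-range attraction and repulsion plus a Coulomb term for charges. This is the general form of the non-bonded terms used in molecular dynamics, not the Utah team’s setup. The parameter values in the usage lines are invented, and the Coulomb constant is an approximate value in the kcal/mol, angstrom, and elementary-charge units common in such codes. One angstrom is the hundred-millionth of a centimeter mentioned above.

```python
# Approximate Coulomb constant in kcal*angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.06


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones energy at separation r (angstroms).

    epsilon -- well depth (kcal/mol); sigma -- separation where energy is zero.
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def coulomb(r, q1, q2):
    """Electrostatic energy between point charges q1 and q2 (units of e)."""
    return COULOMB_CONSTANT * q1 * q2 / r


def pair_energy(r, epsilon, sigma, q1, q2):
    """Total non-bonded energy for one atom pair; negative means attraction."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the sign change.
    for r in (3.0, 3.5, 4.0, 6.0):
        print(f"r = {r:.1f} angstroms: {pair_energy(r, 0.15, 3.4, 0.0, 0.0):+.4f} kcal/mol")
```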
The University of Utah’s Schmidt lab will later transform the peptide leads into biopharmaceutical scaffolds called circular modified peptides. “Our hope is that we find a new peptide inhibitor that can be experimentally verified in the next couple of weeks. And then we will engage in further design to make the peptide cyclic to make it more stable as a potential drug,” University of Utah professor and research lead Thomas Cheatham said in a statement.
Mapping how COVID-19 spreads
It’s well understood that COVID-19 spreads via virus-laden droplets, which are transported around environments by air conditioning units, wind, and other forms of turbulence. But airborne transmission rates remain a subject of contention, and some experts say gathering useful evidence of airborne transmission could take years and cost many lives.
In a safer pursuit of clarity, scientists at Utah State University, Lawrence Livermore National Lab, and the University of Illinois intend to use the Consortium’s supercomputing resources to study person-to-person transmission of airborne respiratory infectious diseases like COVID-19. They are working from the hypothesis that aerosolized droplets from human airways contaminate rooms more quickly than initially assumed. They’ll leverage high-fidelity multiphase large-eddy simulations (LES) — mathematical models for turbulence used in computational fluid dynamics — running on IBM hardware to determine cloud paths in typical hospital settings.
The short-term aim will be to understand how long a cloud persists and where the particles settle, which could inform non-pharmacological techniques to reduce the spread. “The [goal] of this study is to fundamentally improve our understanding of person-to-person transmission of airborne respiratory infectious diseases,” wrote the researchers in a statement. “Our findings will [make] it safer for health care professionals.”
Studying genetic susceptibility
Beyond isolating COVID-19-killing compounds and mapping the spread of the virus, researchers are attempting to define risk groups by performing genome analysis and IBM supercomputer-enhanced DNA sequencing.
A team of scientists affiliated with NASA has observed that COVID-19 appears to cause pneumonia, triggering an inflammatory response in the lungs called acute respiratory distress syndrome (ARDS). To investigate, they plan to use the supercomputer at NASA’s Ames Research Center to sequence the genomes of patients who develop ARDS and those who don’t.
If all goes well, the team believes their study will result in practical tools for predicting which COVID-19 patients are likely to develop ARDS, and therefore which patients are likely to need intensive support prior to the emergence of severe symptoms. Such tools could help guide intensive care resource usage for the sickest patients, and enable health care workers to better manage ongoing treatment.
Intel
Intel is actively involved in the design, development, and deployment of several Consortium-affiliated supercomputers, as well as the upcoming Aurora at Argonne National Laboratory near Chicago. The company says it has a staff of engineers working on code optimizations for HPC applications that include LAMMPS (a molecular dynamics code), GROMACS (a package for protein, lipids, and nucleic acids simulation), NAMD (another molecular dynamics code), AMBER, and others. Intel is also sharing tools, architecture knowledge, and software with partners to enhance COVID-19 applications and scale their performance on Intel-based hardware.
One specific area of focus for Intel is a collaboration with NAMD to release a version of the code that provides faster simulations on Xeon processors that support AVX-512. The company says the significant performance boost will allow researchers to achieve longer timescales in the simulation of relevant molecules associated with COVID-19, by extension enabling them to better understand aspects of viral infection with “atom-level” detail. The update is expected to be made public for early use in June.
Hewlett Packard Enterprise
Some of Hewlett Packard Enterprise’s (HPE) work is done through the Consortium, while the rest is focused on a number of customers and partners. As a result of its acquisition of Cray in September 2019 for approximately $1.3 billion, HPE claims it now has more supercomputers and HPC systems in use by leading research centers.
“High-performance computing is more powerful today than it’s ever been, and its massive computing power — along with other advanced capabilities — has significantly transformed drug discovery,” said Peter Ungaro, former Cray CEO and head of HPE’s HPC and mission-critical systems group, in a statement. “Supercomputing and HPC systems unlock greater potential for AI and machine learning applications, and when applied to 3D modeling and simulations, dramatically [accelerate] time-to-insight and [increase] scientific outcomes. Our work within the consortium provides the researchers with HPC capabilities they wouldn’t normally have access to independently to help fast-track the discovery of a cure for the pandemic.”
Drug design research
In partnership with Microsoft, HPE is working with a team at the University of Alabama in Huntsville (UAH) to supply its Sentinel supercomputer through the Azure cloud. With the supercomputer, along with a team of dedicated HPE experts, it’s supporting various stages of the drug design process at UAH.
The researchers are employing a molecular docking approach, a kind of bioinformatic modeling that involves the interaction of two or more molecules to yield a stable combination. Drawing on a large, open set of natural products found in plants, animals, fungi, and the sea, Sentinel is performing calculations to determine how natural compounds interact with COVID-19’s protein. With Sentinel, 20,000 molecular dockings against a protein target can be completed in seven or eight minutes, versus the full 24 hours it used to take. The research team can now perform as many as 1.2 million molecular dockings per day.
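For a sense of scale, the batch figures above imply a speedup of roughly two hundred times. The sketch below works that out from the quoted numbers only. The function name is made up for this example.

```python
BATCH_SIZE = 20_000          # dockings per batch, as quoted
OLD_MINUTES = 24 * 60        # the full 24 hours a batch used to take


def speedup(new_minutes: float) -> float:
    """How many times faster a batch finishes at the new duration."""
    return OLD_MINUTES / new_minutes


if __name__ == "__main__":
    for minutes in (7, 8):
        rate = BATCH_SIZE / minutes
        print(f"{minutes} min per batch: {speedup(minutes):.0f}x faster, "
              f"about {rate:,.0f} dockings per minute")
```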
Elsewhere, HPE is supporting work at the Lawrence Livermore National Laboratory. The researchers’ goal is to apply AI to accelerate the process of simulating billions of molecules from a database of drug candidates. They’ve narrowed down the number of potential candidates from 10^40 to a set of 20, and they’ve tapped Catalyst, an HPE-powered HPC cluster that generates predictions like experimental and structural biology data, to improve outcomes and speed up discovery.
HPE is also collaborating with France’s National Center for Scientific Research and GENCI to arm scientists at the Sorbonne University in Paris with GENCI’s Jean Zay supercomputer, which HPE designed. The team is using Jean Zay to optimize the Tinker-HP software, an approach to parallel computing enabled by multiple graphics cards and designed to simulate at the level of atoms for large biological molecules. Tinker-HP simultaneously performs a range of data-intensive calculations to create 3D simulations of molecular interactions faster and at higher resolutions than would otherwise be possible.
Contributions from the private sector
The nature of the Consortium’s work isn’t strictly academic. Startups hope to use the group’s vast computational resources to develop treatments, molecular designs, and drugs targeting COVID-19.
Kolkata-based Novel Techsciences is identifying phytochemicals from the more than 3,000 medicinal plants and anti-viral plant extracts in India that might act as natural drugs against COVID-19. The team also plans to isolate plant-derived compounds that could help tackle multi-drug resistance that arises as the coronavirus mutates, with the goal of developing a comprehensive prophylactic treatment regime.
In London, Y Combinator-backed PostEra is overseeing the Moonshot Project, which aims to produce inhibitors based on over 60 fragment hits (i.e., molecules validated to bind to a target protein, making them a chemical starting point for drug discovery) that have been isolated in experiments to determine the molecular structure of COVID-19. By running machine learning algorithms in the background to triage suggestions and generate synthesis plans, PostEra has identified around 21 highly effective volunteer-submitted molecular designs, which will be synthesized by chemical company Enamine. The results of this project will be tested on animals in months.
If successful, PostEra’s would be one of the first drugs developed in an open source fashion. “[Machine learning] can reduce the time to determine optimal ways to make these compounds from weeks to days,” the company said in a statement. “[We believe] the worldwide scientific community [can suggest] drug candidates that might bind to, and neutralize, [COVID-19].”
Another private sector project is led by London-based AI startup Kuano. This team’s intention is to gain insights from diseases that are similar to COVID-19 — mainly other coronaviruses — to design an effective COVID-19 drug. This effort relies on a genetic algorithm that searches the chemical space surrounding existing antiviral drugs and a deep learning-based classification model built on existing binding data. The company is combining those tools with docking and molecular dynamics simulations to enhance the results and yield machine learning models that can be used to score molecular designs for synthesis as antiviral compounds.
As for AI and drug development startup Innoplexus, it’s also working with the Consortium’s supercomputers to accelerate the discovery of molecules that could lead to a drug to combat COVID-19. It expects to run permutations on five promising candidates — specifically, candidates that are potent, non-toxic, and can be manufactured.
Early days
Despite the fact that much of the work remains in the early stages, momentum around the Consortium appears to be accelerating.
Last month, IBM announced that UK Research and Innovation (UKRI) and the Swiss National Supercomputing Centre (CSCS) will join the Consortium, making available machines that include the University of Edinburgh’s ARCHER; the Science and Technology Facilities Council’s DiRAC; the Biotechnology and Biological Sciences Research Council’s Earlham Institute; and Piz Daint, the sixth-ranked supercomputer in the world, according to the Top 500. The new additions have brought the total available petaflops up to 483 from 437 in May and 300 in mid-March.
“The COVID-19 HPC Consortium … is the largest public-private computing partnership ever created. What started as a series of phone calls […] five days later, more than two dozen partners came on board, many who are typically rivals,” said IBM’s Turek. “Without any large contracts or anything of the kind, this group [is coming] together in a way to both share resources and manage a process of expediting the scientific proposals that came into the consortia and match it to the best resources.”