Machine Learning to Understand and Prevent Disease

SAN FRANCISCO, Sept. 16, 2021 /PRNewswire/ — An unimaginable amount of data is continually being generated by scientific experiments, longitudinal studies, clinical trials, and hospital records—but what can be done with all this information?

Barbara Engelhardt (she/her), PhD, is building machine-learning models and statistical tools to make use of that data and find ways to better understand, and even prevent, disease. She is now joining Gladstone Institutes as a senior investigator.

«Barbara is an innovator in computational biology,» says Katie Pollard, PhD, director of the Gladstone Institute of Data Science and Biotechnology. «She brings vast expertise in statistical models and will help expand our machine-learning program. And she’s a renowned graduate student mentor. We’re thrilled she’s joining our team.»

Engelhardt is also a full professor at Princeton University, on leave this academic year. She completed her graduate studies in computer science at UC Berkeley, and was a postdoctoral researcher at the University of Chicago before starting her first lab at Duke University.

«Since I first learned about Gladstone during my postdoc, it’s always seemed like an oasis of amazing science,» says Engelhardt. «I can’t wait to start collaborating with all the scientists here.»

Finding Value in Throwaway Data

Engelhardt’s lab is not how you might picture a traditional science lab—one with cells, glass beakers, and microscopes. Instead, she runs what’s called a dry lab, where her team uses powerful computers to analyze data through mathematical and computational approaches.

One of the group’s focus areas is to understand how cells work together in the body. The researchers look at how cells pass information to one another, how they work as part of neighborhoods, and how those neighborhoods are structured. Ultimately, they are trying to understand exactly how changes within cells or their environment can lead to disease.

To do so, they work closely with biologists, geneticists, and bioengineers to obtain data from their scientific experiments, such as microscopy images and videos of cells interacting over time. Using these files, Engelhardt can examine, for instance, whether treating cells with drugs affects how cells communicate, or how to target cancer tumors as directly as possible with a therapy.

«Sometimes, we ask for throwaway data, or data that the scientists don’t need for their studies, but from which we can still glean lots of valuable insights,» explains Engelhardt. «Other times, we collaborate more closely with a team to help them build better techniques to improve their own experiments.»

In those cases, the process is iterative. Engelhardt’s team will propose new approaches, the collaborators will try them and report back, and they’ll continue to work together to find the method that can generate the best results.

Understanding How Your Cells Record Trauma

Engelhardt also studies how traumatic events that occur in your life are stored in your cells, how they may affect your genome, and how this can eventually lead to disease.

«You essentially store traumatic events in your cells, like a battery,» she says. «And then later in life, these traumas may lead to depression, type 2 diabetes, obesity, heart disease, or mental health problems.»

Her team has been working with the Fragile Families and Child Wellbeing Study, for which nearly 5,000 unmarried mothers were recruited between 1998 and 2000, a sample that includes a large number of Black, Hispanic, and low-income families. Over the past 22 years, data has been collected about the children of these mothers, the mothers themselves, and, when possible, the fathers.

«Unfortunately, though perhaps not surprisingly, these kids have been through a lot,» says Engelhardt. «A large number of them have incarcerated fathers, they’ve witnessed or been involved in crime, they’ve experienced bullying at school, they’ve gone to bed hungry, and they’ve been evicted from their homes.»

The instability in their lives has been recorded in their cells and shows up in chemical changes to their DNA, which was collected as part of the study. Engelhardt is using all the data available about these families to understand how traumatic events get stored in their cells, in order to find a way to erase the records and prevent disease outcomes.

«It’s challenging to work with data from a group of individuals from such diverse backgrounds, but it’s absolutely critical, and it’s pretty exciting that we get to do it,» she says.

Predicting the Best Course of Action for Patients

A third strand of research for Engelhardt’s lab is to build reinforcement learning methods. This is the approach often used to guide a robot wandering through a maze, or to inform «decisions» made by self-driving cars. But Engelhardt is applying this framework to electronic health care record data.

Reinforcement learning involves three categories of information. The first is a set of states. In the context of a hospital patient, the state may include the patient’s age, gender, heart rate, temperature, and diagnosed disease. The second category is a set of actions, which, in this case, would be the types of interventions that health care professionals might perform, such as putting a patient on a ventilator or giving them a particular drug. Finally, there’s a reward function, or the objectives of the patient’s care. This could be ensuring that vital signs are stable, reducing a patient’s temperature, taking them off the ventilator, or getting them discharged as soon as possible.

«Given those three things—the state, action, and reward—our goal is essentially to design a protocol that will lead to the best rewards,» Engelhardt says. «So by building a model that can analyze all that data, we want to predict a set of actions for a patient’s given state that will lead to the best outcome for their health.»
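As a rough illustration of the three ingredients described above, the following Python sketch sets up a toy decision problem and computes the protocol that maximizes expected reward. Everything in it is invented for illustration: the three states, the two actions, the transition probabilities, the reward values, and the discount factor. It uses value iteration, a textbook technique that assumes the effect of each action is known in advance. It is not the method used by Engelhardt’s team.

```python
# Toy example only: every state, action, probability, and reward is hypothetical.
GAMMA = 0.9  # discount factor (assumed)

STATES = ["fever", "stable", "discharged"]  # "discharged" ends the episode
ACTIONS = ["wait", "give_drug"]

# TRANSITIONS[(state, action)] maps each next state to its probability.
TRANSITIONS = {
    ("fever", "wait"): {"fever": 0.8, "stable": 0.2},
    ("fever", "give_drug"): {"fever": 0.3, "stable": 0.7},
    ("stable", "wait"): {"fever": 0.1, "stable": 0.5, "discharged": 0.4},
    ("stable", "give_drug"): {"fever": 0.05, "stable": 0.75, "discharged": 0.2},
}

# Reward function: a payoff for the state reached, minus a cost for the action.
STATE_REWARD = {"fever": -1.0, "stable": 1.0, "discharged": 10.0}
ACTION_COST = {"wait": 0.0, "give_drug": 0.5}


def q_value(state, action, values):
    """Expected discounted reward of taking `action` in `state`."""
    expected = sum(
        prob * (STATE_REWARD[nxt] + GAMMA * values[nxt])
        for nxt, prob in TRANSITIONS[(state, action)].items()
    )
    return expected - ACTION_COST[action]


def value_iteration(tol=1e-9):
    """Return (state values, best action per non-terminal state)."""
    values = {s: 0.0 for s in STATES}  # the terminal state keeps value 0
    live_states = [s for s in STATES if s != "discharged"]
    while True:
        delta = 0.0
        for s in live_states:
            best = max(q_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in live_states
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(policy)
```

With these made-up numbers, the computed protocol is to give the drug when the patient has a fever and to wait once the patient is stable. Changing the rewards or probabilities changes the protocol, which is why the choice of reward function matters so much.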

Applying reinforcement learning to patient data is much more complicated than using it for robotics or self-driving cars.

«With self-driving cars, we understand the state dynamics,» she explains. «Basically, we know exactly what will happen if we turn the wheel in a certain direction. But with patients, if we give them a certain drug, we don’t know precisely how their state will change as a result. So, my team is finding ways we can still predict the best intervention, despite this uncertainty.»
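To make the contrast concrete, here is a minimal Python sketch of one way to choose actions when the state dynamics are not known: estimate the value of each action directly from logged (state, action, reward, next state) records, without ever writing down transition probabilities. The log below is fabricated purely for illustration, as are the state names, action names, rewards, and discount factor. The technique shown is a simple tabular batch Q-iteration over empirical averages. It is an assumption chosen for illustration, not a description of the methods Engelhardt’s team is developing.

```python
from collections import defaultdict

# Toy example only: the log, states, actions, and rewards are all hypothetical.
GAMMA = 0.9  # discount factor (assumed)
ACTIONS = ["wait", "give_drug"]
TERMINAL = "discharged"

# Each record: (state, action, observed reward, observed next state).
LOG = [
    ("fever", "give_drug", 0.5, "stable"),
    ("fever", "give_drug", -1.5, "fever"),
    ("fever", "give_drug", 0.5, "stable"),
    ("fever", "wait", -1.0, "fever"),
    ("fever", "wait", -1.0, "fever"),
    ("fever", "wait", 1.0, "stable"),
    ("stable", "wait", 10.0, "discharged"),
    ("stable", "wait", 1.0, "stable"),
    ("stable", "give_drug", 0.5, "stable"),
    ("stable", "give_drug", 9.5, "discharged"),
]


def batch_q_iteration(log, sweeps=200):
    """Estimate Q(state, action) by averaging targets over logged records."""
    grouped = defaultdict(list)
    for state, action, reward, nxt in log:
        grouped[(state, action)].append((reward, nxt))

    q = defaultdict(float)  # unseen pairs default to 0.0
    for _ in range(sweeps):
        new_q = defaultdict(float)
        for (state, action), samples in grouped.items():
            targets = []
            for reward, nxt in samples:
                # No future value once the patient is discharged.
                future = 0.0 if nxt == TERMINAL else max(q[(nxt, b)] for b in ACTIONS)
                targets.append(reward + GAMMA * future)
            new_q[(state, action)] = sum(targets) / len(targets)
        q = new_q
    return q


def greedy_policy(q, log):
    """Pick the highest-valued action for every state seen in the log."""
    states = sorted({state for state, _, _, _ in log})
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


if __name__ == "__main__":
    q = batch_q_iteration(LOG)
    print(greedy_policy(q, LOG))
```

A sketch like this only works because every state and action pair appears in the toy log. Real patient records are far sparser and noisier, which is part of the uncertainty described in the quote above.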

Engelhardt’s group is currently collaborating with two large hospitals that have provided anonymized electronic health care record data from nearly 400,000 patients. These data include records from 7,000 patients who tested positive for COVID-19 in the past year.

«Half of these patients are Black, so we’re specifically building models to understand the differences in how doctors treat Black and White patients and how this may lead to different outcomes,» says Engelhardt.

To do so, her team is looking, for instance, at the resources spent on patients and whether those resources correlate with patients dying or being discharged from the hospital.

«I think we can learn overall lessons from this data too, particularly about how the hospital system can best tackle an emerging disease like COVID-19,» she says. «We’re hoping to build tools that will help doctors respond as quickly as possible to new diseases like COVID-19 in the future.»

From Staring at Data to Helping Patients

Although Engelhardt grew up in New York City, moving to San Francisco to run her Gladstone lab feels almost like returning home, she says. In fact, she spent 14 years in California between pursuing her studies at Stanford University and UC Berkeley, and working at Jet Propulsion Laboratory and 23andMe.

As a child, Engelhardt found that math came easily to her. But in school, it all seemed too theoretical, so she didn’t really enjoy statistics.

«I only started getting excited about statistics when I started my PhD—that’s when the awakening happened,» says Engelhardt. «I started collaborating with biology labs and everything just clicked. That’s when I finally understood what it meant to apply statistical models to data to come up with impactful results. It was like everything I had done up to that point was preparing me for this career path.»

Now, Engelhardt says she loves staring at data.

«Data is a big mystery and it’s so much fun to go in and play detective using computational models and methods,» she says. «And I love the collaborative aspect of science, too. I have an amazing group of students and postdocs, and it’s just incredible to play off their ideas.»

Engelhardt is also looking forward to building new collaborations at Gladstone, not only with investigators, but with nearby universities and hospitals as well.

«There are amazing wet labs here doing the most cutting-edge development of methods and technologies, a direct connection with hospitals and access to patient data, and really exciting applications to disease—all of that is the dream,» she says.

Ultimately, her goal is to leverage the freedom she has in academic research to increase the biomedical understanding of disease and improve patient care.

«I see Gladstone as a place that can catalyze the impact of the more theoretical work from my group, so that my research can actually help real patients,» says Engelhardt.

About Barbara Engelhardt

Barbara Engelhardt, PhD, is a senior investigator at Gladstone Institutes. She is a professor of computer science at Princeton University, on leave in 2021–2022, a position she took in 2014 after having been an assistant professor in biostatistics and bioinformatics and statistical sciences at Duke University for 3 years.

She graduated from Stanford University, received her PhD in electrical engineering and computer science from UC Berkeley, supported by an NSF Graduate Research Fellowship, and trained as a postdoctoral researcher at the University of Chicago. Engelhardt also spent 2 years working at Jet Propulsion Laboratory, a summer at Google Research, and a year at 23andMe.

Engelhardt received the 2021 Overton Prize from the International Society for Computational Biology, one of the top awards in this field. Her research interests involve developing statistical models and methods for the analysis of high-dimensional biomedical data, with a goal of understanding the underlying biological mechanisms of complex phenotypes and human disease.

About Gladstone Institutes

To ensure our work does the greatest good, Gladstone Institutes focuses on conditions with profound medical, economic, and social impact—unsolved diseases. Gladstone is an independent, nonprofit life science research organization that uses visionary science and technology to overcome disease. It has an academic affiliation with the University of California, San Francisco.

Media Contact: Julie Langelier | Associate Director, Communications | julie.langelier@gladstone.org | 415.734.5000

View original content to download multimedia: https://www.prnewswire.com/news-releases/machine-learning-to-understand-and-prevent-disease-301378931.html

SOURCE Gladstone Institutes