2019 Fellow @ Cures Within Reach For Cancer
Studying biomedical informatics wasn’t a part of the plan. In 2015, I was in the middle of exploring ideas for a start-up. Friends and I were brainstorming, and I even considered moving to Silicon Valley. Then, in October 2015, my wife was diagnosed with breast cancer. We were too young and healthy to be handed such news. The diagnosis stopped us in our tracks.
I quickly learned that nearly 70,000 young adults are diagnosed with cancer in the US each year, and that breast cancer is the most common cancer among women in this group. To arm myself with information to support my wife through her treatment, I dug into the research literature. When, almost a year later, my wife completed treatment and was given a clean bill of health, I decided that tackling cancer was where I wanted to head professionally.
In particular, the experience made me want to understand the complexity of the disease and get to the heart of the promise and hype surrounding precision medicine. I reasoned that I had the analytical foundations to probe the issue using data science; I had trained as an electrical engineer and spent several years solving problems as a management consultant. With that conviction, I enrolled in a graduate program in Biomedical Informatics at Harvard Medical School in Fall 2018.
The backstory is important because it explains the three considerations that informed my decision to apply to the Harvard Data Science Initiative (HDSI) Public Service Data Science Graduate Fellowship. First, I wanted to apply data science to improve the lives of cancer patients; second, I wanted to help an organization that was constrained by resources; third, I wanted to bridge my past experiences and recently acquired skills. I met Laura Kleiman, the founder of Cures Within Reach for Cancer (CWR4C), at the 2019 Zelen Symposium: Data Science in Biomedical Research at Dana-Farber. I was presenting a poster on a machine learning approach to discovering drug repurposing candidates for breast cancer when Laura came by. She spoke excitedly about how CWR4C was using AI to identify the most promising non-cancer generic drugs to repurpose for cancer treatment.
The overlap in our interests and our shared passion for advancing cancer research were immediately evident. I was intrigued and considered interning with CWR4C during the summer. However, as an early-stage non-profit, CWR4C did not have the resources to support my work. That is why I was grateful to receive the HDSI fellowship, which gave me the flexibility to spend the summer applying data science to social challenges at any non-profit. I jumped on board.
CWR4C had recently bootstrapped its operations. In addition to the founder and a roster of seasoned advisors and volunteers, we were a team of ten interns from MIT, Northeastern, Tufts, Boston University, and Yale with backgrounds in biology, computer science, data science, statistics, health policy, and business. Huddled around a large table at WeWork’s co-working space in Central Square, we organized ourselves around three broad program areas: computational synthesis of anti-cancer evidence from scientific literature and electronic health records, exploration of innovative policies to implement clinical trials and change the standard of care, and development of business models to sustain the organization.
One of my primary projects was to develop an evidence synthesis framework for drug repurposing. Each year, 17 million people are diagnosed with cancer and nearly $1 trillion is spent on cancer care. Hundreds of FDA-approved non-cancer generic drugs have shown promise for treating cancer in preclinical or small-scale clinical studies. Because of their long history of safe patient use, low cost, and widespread availability, repurposing these drugs represents an opportunity to rapidly improve patient outcomes and reduce healthcare costs. But how can we demonstrate evidence of anti-cancer activity without going back to the lab? Can we systematize the laborious review process to infer evidence at scale? To answer these questions, we set out to develop an automated computational pipeline for evidence synthesis and drug prioritization.
Working with our collaborators at IBM Research’s Science for Social Good Initiative and Northeastern University, we developed machine learning algorithms and applied natural language processing (NLP) methods trained on biomedical literature to identify scientific studies describing anti-cancer activity for any drug of interest. This is a challenging problem that involves searching for contextually specific information among large volumes of text, applying pattern-recognition techniques to identify relevant phrases that signal evidence, and inferring levels of therapeutic evidence contained in the identified signal. For example, a paper might peripherally mention a drug, such as aspirin, without the drug actually being used to treat cancer in the study. This is just one example of noise that needed to be computationally identified and isolated.
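To make the aspirin example concrete, here is a minimal sketch of the kind of noise filter described above. It is not the pipeline we built, which relied on trained machine learning and NLP models; it is a simple keyword heuristic written only for illustration. The cue lists, the function names, and the two sample texts are all hypothetical.

```python
import re

# Hypothetical cue lists, chosen for illustration only.
CANCER_TERMS = ("cancer", "tumor", "tumour", "carcinoma")
TREATMENT_CUES = ("treated with", "inhibited", "reduced", "suppressed")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def evidence_sentences(text, drug):
    """Return sentences that mention the drug together with a cancer term
    and a treatment cue, all within the same sentence."""
    drug = drug.lower()
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if (
            drug in lowered
            and any(term in lowered for term in CANCER_TERMS)
            and any(cue in lowered for cue in TREATMENT_CUES)
        ):
            hits.append(sentence)
    return hits


def is_candidate(text, drug):
    """True if at least one sentence looks like anti-cancer evidence."""
    return bool(evidence_sentences(text, drug))


# Made-up sample texts: one peripheral mention, one relevant mention.
peripheral = (
    "Patients taking aspirin for pain were excluded. "
    "The trial evaluated a new chemotherapy regimen for breast cancer."
)
relevant = (
    "Aspirin reduced tumor growth in a mouse model of colorectal cancer. "
    "Further trials are needed."
)

print(is_candidate(peripheral, "aspirin"))
print(is_candidate(relevant, "aspirin"))
```

The first text mentions aspirin and cancer, but never in the same sentence and never with a treatment cue, so the heuristic sets it aside as noise. A rule this crude will miss paraphrases and negations, which is part of why trained models are a better fit for the real problem.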
The filtering criteria and models continue to evolve, improving the precision and accuracy of the inferred evidence. Nonetheless, the framework for an automated computational pipeline has established a foundation that CWR4C can continue to build on. In the meantime, we’ve submitted papers on our work to two AI conferences, with the aim of engaging the AI community to expand on the promise of machine learning for synthesizing drug repurposing evidence. Thanks to the HDSI fellowship and CWR4C, I got to spend my summer advancing AI while advancing social good!