
Helping scientists run complex data analyses without writing code

Co-founded by an MIT alumnus, Watershed Bio offers researchers who aren’t software engineers a way to run large-scale analyses to accelerate biology.

As costs for diagnostic and sequencing technologies have plummeted in recent years, researchers have collected an unprecedented amount of data around disease and biology. Unfortunately, scientists hoping to go from data to new cures often require help from someone with experience in software engineering.

Now, Watershed Bio is helping scientists and bioinformaticians run experiments and get insights with a platform that lets users analyze complex datasets regardless of their computational skills. The cloud-based platform provides workflow templates and a customizable interface to help users explore and share data of all types, including whole-genome sequencing, transcriptomics, proteomics, metabolomics, high-content imaging, protein folding, and more.

“Scientists want to learn about the software and data science parts of the field, but they don’t want to become software engineers writing code just to understand their data,” co-founder and CEO Jonathan Wang ’13, SM ’15 says. “With Watershed, they don’t have to.”

Watershed is being used by large and small research teams across industry and academia to drive discovery and decision-making. When new advanced analytic techniques are described in scientific journals, they can be added to Watershed’s platform immediately as templates, making cutting-edge tools more accessible and collaborative for researchers of all backgrounds.

“The data in biology is growing exponentially, and the sequencing technologies generating this data are only getting better and cheaper,” Wang says. “Coming from MIT, this issue was right in my wheelhouse: It’s a tough technical problem. It’s also a meaningful problem because these people are working to treat diseases. They know all this data has value, but they struggle to use it. We want to help them unlock more insights faster.”

No-code discovery

Wang expected to major in biology at MIT, but he quickly got excited by the possibility of using computer science to build solutions that scaled to millions of people. He ended up earning both his bachelor’s and master’s degrees from the Department of Electrical Engineering and Computer Science (EECS). Wang also interned at a biology lab at MIT, where he was surprised by how slow and labor-intensive experiments were.

“I saw the difference between biology and computer science, where you had these dynamic environments [in computer science] that let you get feedback immediately,” Wang says. “Even as a single person writing code, you have so much at your fingertips to play with.”

While working on machine learning and high-performance computing at MIT, Wang also co-founded a high-frequency trading firm with some classmates. His team hired researchers with PhD backgrounds in areas like math and physics to develop new trading strategies, but they quickly saw a bottleneck in their process.

“Things were moving slowly because the researchers were used to building prototypes,” Wang says. “These were small approximations of models they could run locally on their machines. To put those approaches into production, they needed engineers to make them work in a high-throughput way on a computing cluster. But the engineers didn’t understand the nature of the research, so there was a lot of back and forth. It meant ideas you thought could have been implemented in a day took weeks.”

To solve the problem, Wang’s team developed a software layer that made building production-ready models as easy as building prototypes on a laptop. Then, a few years after graduating from MIT, Wang noticed that technologies like DNA sequencing had become cheap and ubiquitous.

“The bottleneck wasn’t sequencing anymore, so people said, ‘Let’s sequence everything,’” Wang recalls. “The limiting factor became computation. People didn’t know what to do with all the data being generated. Biologists were waiting for data scientists and bioinformaticians to help them, but those people didn’t always understand the biology at a deep enough level.”

The situation looked familiar to Wang.

“It was exactly like what we saw in finance, where researchers were trying to work with engineers, but the engineers never fully understood, and you had all this inefficiency with people waiting on the engineers,” Wang says. “Meanwhile, I learned the biologists are hungry to run these experiments, but there is such a big gap they felt they had to become a software engineer or just focus on the science.”

Wang officially founded Watershed in 2019 with physician Mark Kalinich ’13, a former classmate at MIT who is no longer involved in day-to-day operations of the company.

Wang has since heard from biotech and pharmaceutical executives about the growing complexity of biology research. Unlocking new insights increasingly involves analyzing data from entire genomes, population studies, RNA sequencing, mass spectrometry, and more. Developing personalized treatments or selecting patient populations for a clinical study can also require huge datasets, and there are new ways to analyze data being published in scientific journals all the time.

Today, companies can run large-scale analyses on Watershed without having to set up their own servers or cloud computing accounts. Researchers can use ready-made templates that work with all the most common data types to accelerate their work. Popular AI-based tools like AlphaFold and Geneformer are also available, and Watershed’s platform makes sharing workflows and digging deeper into results easy.

“The platform hits a sweet spot of usability and customizability for people of all backgrounds,” Wang says. “No science is ever truly the same. I avoid the word product because that implies you deploy something and then you just run it at scale forever. Research isn’t like that. Research is about coming up with an idea, testing it, and using the outcome to come up with another idea. The faster you can design, implement, and execute experiments, the faster you can move on to the next one.”

Accelerating biology

Wang believes Watershed is helping biologists keep up with the latest advances in biology and accelerating scientific discovery in the process.

“If you can help scientists unlock insights not a little bit faster, but 10 or 20 times faster, it can really make a difference,” Wang says.

Watershed is being used by researchers in academia and in companies of all sizes. Executives at biotech and pharmaceutical companies also use Watershed to make decisions about new experiments and drug candidates.

“We’ve seen success in all those areas, and the common thread is people understanding research but not being an expert in computer science or software engineering,” Wang says. “It’s exciting to see this industry develop. For me, it’s great being from MIT and now to be back in Kendall Square where Watershed is based. This is where so much of the cutting-edge progress is happening. We’re trying to do our part to enable the future of biology.”