One of the questions we get most from VCs and young companies is how to leverage the existing public data space to augment internal capabilities. For example, sometimes we are asked to gain confidence in recent proof-of-concept work by "proving" that similar evidence exists in public spaces, or to mine for completely novel targets from relevant disease data.
In this instance, we were looking for so-called "poison exons": stretches of sequence that, if included in the mRNA transcript, trigger degradation of the mRNA through the nonsense-mediated decay (NMD) pathway and thereby reduce the amount of protein produced. If we could find poison exons in gene targets we'd like to knock down, then we could build screening assays to find chemical matter that enhances their inclusion and thereby, we hope, provides therapeutic benefit.
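To make the definition concrete, here is a minimal sketch of the simplest check for a candidate: does including the exon put a stop codon in frame ahead of the normal one? This is a toy illustration, not the pipeline described in this post. The function names and the three-segment input (`upstream`, `exon`, `downstream`) are assumptions made for the example, and it ignores real-world details such as where the stop sits relative to the last exon-exon junction.

```python
# Toy check: does exon inclusion introduce a premature in-frame stop codon?
# Assumes `upstream` begins at the start codon, so reading frame 0 is the coding frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds):
    """Return the 0-based position of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def introduces_premature_stop(upstream, exon, downstream):
    """Compare the exon-included transcript against the exon-skipped one."""
    normal_stop = first_stop_index(upstream + downstream)
    new_stop = first_stop_index(upstream + exon + downstream)
    if new_stop is None:
        return False
    if normal_stop is None:
        # No stop in the skipped isoform, so any stop is new.
        return True
    # Inclusion shifts the normal stop downstream by the exon length.
    return new_stop < normal_stop + len(exon)


# Hypothetical sequences, chosen only to exercise both outcomes.
print(introduces_premature_stop("ATGGCC", "CTGTAGCCA", "GGATAA"))
print(introduces_premature_stop("ATGGCC", "CTGGCA", "GGATAA"))
```

The first call includes an exon carrying an in-frame TAG; the second includes an exon that keeps the frame and adds no stop.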
Identifying these poison exons is not easy! Because the mRNA products themselves are degraded, it can be tricky to "catch" them in RNA-seq experiments. They can be enriched by blocking degradation (inhibiting NMD factors or inhibiting translation itself are two ways). Without enrichment, poison exons are typically seen at very low inclusion rates, sometimes supported by just a few reads.
This might seem daunting if you're relying only on your own experiments to find them. However, it is a great example of a problem where public data can be leveraged to your advantage. More than 400,000 bulk RNA sequencing datasets from human tissues and cell lines exist in the Sequence Read Archive (SRA) alone, and we mined these and other public datasets to compile a rich catalog (~1.6M) of poison exons, most of which had never been reported before.
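For readers who want a feel for what a "very low inclusion rate" looks like, inclusion is often summarized as percent spliced in (PSI), computed from junction-spanning reads. The sketch below uses one common convention (average the two inclusion junctions, then divide by inclusion plus skipping reads). The function name, arguments, and read counts are hypothetical, and this is not necessarily how inclusion was quantified in the work described here.

```python
def percent_spliced_in(upstream_junction_reads, downstream_junction_reads, skipping_reads):
    """Estimate PSI for a cassette exon from junction read counts.

    Inclusion is supported by two junctions (into and out of the exon), so
    their counts are averaged to stay comparable with the single skipping
    junction. Returns None when no reads cover the event.
    """
    inclusion = (upstream_junction_reads + downstream_junction_reads) / 2.0
    total = inclusion + skipping_reads
    if total == 0:
        return None
    return inclusion / total


# Hypothetical counts: a handful of inclusion reads against many skipping reads.
psi = percent_spliced_in(2, 4, 97)
print(f"PSI = {psi:.2%}")
```

With only a few supporting reads the estimate is noisy, which is one reason pooling evidence across many datasets helps.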
That posed a problem for assay development at Remix Therapeutics. We cannot screen 1.6M targets in a cost-effective way (yet), so how do we pick the most promising ones?
For this, we built a deep learning framework to act as an unbiased predictor of poison exon inclusion under "perturbation" conditions, e.g., treatment with a small molecule, or an experiment designed to mimic small molecule perturbation by gluing the RNA processing machinery to the poison exon using a specially designed transfected oligonucleotide.
These experiments provided us with a way to fine-tune our models: given the results of a small molecule or oligonucleotide experiment, we update the model (or its training set) to include these results. The result was a model whose top predictions were highly enriched for strong poison exon candidates, enabling fewer, and better, screening assays overall.
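The predict, test, update cycle can be sketched in a few lines. To keep the example self-contained, a tiny logistic regression stands in for the deep learning model, and the two-number feature vectors, exon names, and assay outcomes are all made up. Treat it as an illustration of the loop's shape under those assumptions, not as the framework described above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(features, labels, epochs=500, lr=0.5):
    """Train a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def select_for_assay(model, candidates, k):
    """Rank untested candidates by predicted response and keep the top k."""
    ranked = sorted(candidates, key=lambda name: predict(model, candidates[name]), reverse=True)
    return ranked[:k]


def update_round(train_x, train_y, candidates, assay_results):
    """Move assayed exons into the training set, then refit the model."""
    for name, responded in assay_results.items():
        train_x.append(candidates.pop(name))
        train_y.append(1 if responded else 0)
    return fit(train_x, train_y)


# Hypothetical starting data: two features per exon, label 1 = inclusion increased.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = [1, 1, 0, 0]
candidates = {"exon_a": [0.85, 0.15], "exon_b": [0.15, 0.85], "exon_c": [0.6, 0.4]}

model = fit(train_x, train_y)
picked = select_for_assay(model, candidates, k=2)
# Pretend the wet-lab assay came back: one hit, one miss (made-up outcomes).
model = update_round(train_x, train_y, candidates, {picked[0]: True, picked[1]: False})
print(picked, sorted(candidates))
```

Each round spends assays only on the top-ranked candidates, and every result, hit or miss, becomes training data for the next round.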
Citation
Dominic Reynolds, RNA-Targeted Drug Discovery Dec 2024