New Answers in Rare Disease with Long-Read Sequencing

By Luke Hickey, Pacific Biosciences

Clinical OMICs, July/August 2020, p. 38, www.clinicalomics.com

It's an exciting time for researchers focused on solving rare diseases. Thanks to new tools and methods, scientists are better equipped than ever to identify these diseases, determine the biological mechanisms underlying them, and pursue the development of new treatments.

The biggest contributor to this research progress has come from improvements in DNA sequencing technology. As these platforms have become more affordable and more accurate, scientists have found it feasible to apply them for studies of rare disease. Exome or even whole-genome sequencing, once out of reach for all but the best-funded labs, is now inexpensive enough to be used routinely.

With genome-wide data, researchers have made tremendous strides in discovering disease-causing variants. More than 7,000 rare and Mendelian diseases have been identified, and many of them still have unknown biological causes. New diseases are identified every year. While genome-wide exploration has already shown promise, there is much more work to be done to provide answers for these diseases, and the countless others yet to be identified.

As clinical research teams have deployed sequencing tools to understand rare diseases, they have dramatically increased the diagnostic yield. These solve rates now run between 25% and 50%, a significant improvement compared to pre-sequencing rates. But clearly, there are still too many people experiencing the diagnostic odyssey so common to rare disease patients. New innovations in DNA sequencing may offer a way to help.

Hunting pathogenic variants

Next-generation sequencing (NGS) tools are characterized by two common traits: the massively parallel reactions that allow them to produce a huge amount of data, and the short-read data type they generate. The length of any individual read typically ranges from 50 base pairs to 350 base pairs; these reads are then mapped to a reference or stitched together in silico to yield a useful assembly. Short-read sequencers are very good at detecting single nucleotide variants (SNVs) or indels that are less than 10 bases long. Since a substantial number of rare diseases are caused by SNVs (e.g., non-synonymous heterozygous variants occurring in the exon coding regions of genes), NGS platforms have been quite useful for solving these cases.

Unfortunately, short-read sequencing technologies struggle to detect larger variants, which explains at least some of the gap between known rare diseases and those that have been solved with NGS data. Short reads containing non-unique sequence will map to many places in the genome, leading to assembly errors. For rare diseases associated with repetitive regions, such as repeat expansion disorders like ALS and Fragile X, mapping ambiguity and expansion length prevent researchers from getting a clear view of the region of interest. In addition, some variants are so large they cannot be fully spanned by short reads alone. These structural variants, which have