Total length of all episodes: 3 days 1 hour 46 minutes
Hi-C is a sequencing-based assay that provides information about the 3-dimensional organization of the genome. In this episode, Simeon Carstens explains how he applied the Inferential Structure Determination (ISD) framework to build a 3D model of chromatin and fit that model to Hi-C data using Hamiltonian Monte Carlo and Gibbs sampling...
Long-read sequencing technologies, such as Oxford Nanopore and PacBio, produce reads from thousands to a million base pairs in length, at the cost of an increased error rate. Trevor Pesout describes how he and his colleagues leverage long reads for simultaneous variant calling/genotyping and phasing. This is possible thanks to a clever use of a hidden Markov model, and two different algorithms based on this model are now implemented in the MarginPhase and WhatsHap tools...
This time you’ll hear from Fabio Cunial on the topic of Markov models and space-efficient data structures. First we recall what a Markov model is and why variable-order Markov models are an improvement over the standard, fixed-order models. Next we discuss the various data structures and indexes that allowed Fabio and his collaborators to represent these models in a very small space while still keeping the queries efficient...
In this episode, HoJoon Lee and Seung Woo Cho explain how to perform a CRISPR experiment and how to analyze its results. HoJoon and Seung Woo developed an algorithm that analyzes sequenced amplicons containing the CRISPR-induced double-strand break site and figures out what exactly happened there (e.g. a deletion, insertion, substitution etc...
Relief is a statistical method for feature selection. It can be used, for instance, to find genomic loci that correlate with a trait or genes whose expression correlates with a condition. Relief can also be made sensitive to interaction effects (known in genetics as epistasis).
In this episode, Trang Lê joins me to talk about Relief and her version of Relief called STIR (STatistical Inference Relief)...
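For listeners who have not met Relief before, here is a minimal sketch of the classic two-class version: each feature gains weight when it differs between a sample and its nearest neighbour from the other class (the "miss") and loses weight when it differs from its nearest neighbour of the same class (the "hit"). This is not STIR, which is not reproduced here; the function name `relief_weights`, the Manhattan distance, and the toy data are assumptions made for illustration.

```python
def relief_weights(X, y):
    """Basic Relief for two classes (illustrative sketch, not STIR).

    X: list of samples, each a list of numeric features.
    y: list of class labels; assumes at least two samples per class.
    Returns one weight per feature; larger means more relevant.
    """
    n, p = len(X), len(X[0])

    # Rescale every feature to [0, 1] so differences are comparable.
    lo = [min(row[j] for row in X) for j in range(p)]
    hi = [max(row[j] for row in X) for j in range(p)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(p)]
    Xs = [[(row[j] - lo[j]) / span[j] for j in range(p)] for row in X]

    def dist(a, b):
        # Manhattan distance between two rescaled samples.
        return sum(abs(u - v) for u, v in zip(a, b))

    w = [0.0] * p
    for i in range(n):
        others = [k for k in range(n) if k != i]
        # Nearest neighbour with the same label (hit) and a different label (miss).
        hit = min((k for k in others if y[k] == y[i]),
                  key=lambda k: dist(Xs[i], Xs[k]))
        miss = min((k for k in others if y[k] != y[i]),
                   key=lambda k: dist(Xs[i], Xs[k]))
        for j in range(p):
            w[j] += (abs(Xs[i][j] - Xs[miss][j])
                     - abs(Xs[i][j] - Xs[hit][j])) / n
    return w


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the informative feature ends up with a positive weight and the noise feature with a lower one. Because neighbours are found using all features jointly, a feature that matters only in combination with another can still influence which hits and misses are chosen, which is the intuition behind Relief's sensitivity to interactions.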
Kaushik Panda and Keith Slotkin come on the podcast to educate us about repetitive DNA and transposable elements. We talk LINEs, SINEs, LTRs, and even Sleeping Beauty transposons! Kaushik and Keith explain why repeats matter for your whole-genome analysis and answer listeners’ questions...
Antoine Limasset joins me to talk about NGS read correction. Antoine and his colleagues built the read correction tool Bcool based on the de Bruijn graph, and it corrects reads far better than current methods such as Bloocoo, Musket, and Lighter.
We discuss why and when read correction is needed, how Bcool works, and why it is more accurate but slower than k-mer spectrum methods...
In this episode, I talk to Fernando Portela, a software engineer and amateur scientist who works on RNA design — the problem of composing an RNA sequence that has a specific secondary structure.
We talk about how Fernando and others compete and collaborate in designing RNA molecules in the online game EteRNA and about Fernando’s new RNA design algorithm, NEMO, which outperforms all prior published methods by a wide margin...
In this episode I’m joined by Chang Xu. Chang is a senior biostatistician at QIAGEN and an author of smCounter2, a low-frequency somatic variant caller. To distinguish rare somatic mutations from sequencing errors, smCounter2 relies on unique molecular identifiers, or UMIs, which help identify multiple reads resulting from the same physical DNA fragment...
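To make the UMI idea concrete, here is a small sketch of the general principle: reads that share a UMI and a position are assumed to come from the same original DNA fragment, so a base seen in only a minority of those reads is more likely a sequencing error than a real variant. This is not smCounter2's actual method; the majority vote, the function name `umi_consensus`, and the toy reads are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads into one consensus base per (UMI, position).

    reads: iterable of (umi, position, base) tuples (hypothetical format).
    Returns a dict mapping (umi, position) to the most common base.
    """
    groups = defaultdict(Counter)
    for umi, pos, base in reads:
        groups[(umi, pos)][base] += 1
    # Simple majority vote within each UMI family.
    return {key: counts.most_common(1)[0][0]
            for key, counts in groups.items()}


reads = [
    ("AACG", 100, "T"),
    ("AACG", 100, "T"),
    ("AACG", 100, "C"),  # disagrees with its UMI family: likely an error
    ("GGTA", 100, "C"),
    ("GGTA", 100, "C"),  # consistent within its family: candidate variant
]
consensus = umi_consensus(reads)
```

Here the lone `C` in the `AACG` family is voted out, while the `C` supported by every read in the `GGTA` family survives as a candidate low-frequency variant.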
Linear mixed models are used to analyze GWAS data and detect QTLs. Andrey Ziyatdinov recently released an R package, lme4qtl, that can be used to formulate and fit these models. In this episode, Andrey and I discuss linear mixed models, genome-wide association studies, and the strengths and weaknesses of lme4qtl...