the bioinformatics chat

A podcast about computational biology, bioinformatics, and next generation sequencing.

The average episode of this podcast runs 1h5m. So far, 35 episodes have been published. A new episode appears every 4 weeks.


#35 The role of the DNA shape in transcription factor binding with Hassan Samee

Even though double-stranded DNA has the famous regular helical shape, there are small variations in the geometry of the helix depending on exactly which nucleotides it is made of at a given position. In this episode of the bioinformatics chat, Hassan Samee talks about the role that DNA shape plays in the recognition of DNA by DNA-binding proteins, such as transcription factors. Hassan also explains how his algorithm, ShapeMF, can deduce DNA shape motifs from ChIP-seq data...



#34 Power laws and T cell receptors with Kristina Grigaityte

An αβ T-cell receptor is composed of two highly variable protein chains, the α chain and the β chain. However, based only on bulk DNA or RNA sequencing it is impossible to determine which of the α chain and β chain sequences were paired in the same receptor. In this episode Kristina Grigaityte talks about her analysis of 200,000 paired αβ sequences, which have been obtained by targeted single-cell RNA sequencing...


 2019-06-29  1h26m

#33 Genome assembly from long reads and Flye with Mikhail Kolmogorov

Modern genome assembly projects are often based on long reads in an attempt to bridge longer repeats. However, due to the higher error rate of current long-read sequencers, assemblers based on de Bruijn graphs do not work well in this setting, and the approaches that do work are slower...


 2019-05-31  1h12m

#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber

In this episode we hear from Jacob Schreiber about his algorithm, Avocado. Avocado uses a neural network to factorize a three-dimensional tensor of epigenomic data into three independent factors corresponding to cell types, assay types, and genomic loci. Avocado can extract a low-dimensional, information-rich summary from the wealth of experimental data from projects like the Roadmap Epigenomics Consortium and ENCODE...


 2019-04-29  1h15m

#31 Bioinformatics Contest 2019 with Alexey Sergushichev and Gennady Korotkevich

The third Bioinformatics Contest took place in February 2019. Alexey Sergushichev, one of the organizers of the contest, and Gennady Korotkevich, the 1st prize winner, join me to discuss this year’s problems.


 2019-03-24  1h46m

#30 Bayesian inference of chromatin structure from Hi-C data with Simeon Carstens

Hi-C is a sequencing-based assay that provides information about the 3-dimensional organization of the genome. In this episode Simeon Carstens explains how he applied the Inferential Structure Determination (ISD) framework to build a 3D model of chromatin and fit that model to Hi-C data using Hamiltonian Monte Carlo and Gibbs sampling.


 2019-02-27  1h5m

#29 Haplotype-aware genotyping from long reads with Trevor Pesout

Long-read sequencing technologies, such as Oxford Nanopore and PacBio, produce reads from thousands to a million base pairs in length, at the cost of an increased error rate. Trevor Pesout describes how he and his colleagues leverage long reads for simultaneous variant calling/genotyping and phasing. This is possible thanks to a clever use of a hidden Markov model, and two different algorithms based on this model are now implemented in the MarginPhase and WhatsHap tools.


 2019-01-27  1h12m

#28 Space-efficient variable-order Markov models with Fabio Cunial

This time you’ll hear from Fabio Cunial on the topic of Markov models and space-efficient data structures. First we recall what a Markov model is and why variable-order Markov models are an improvement over the standard, fixed-order models. Next we discuss the various data structures and indexes that allowed Fabio and his collaborators to represent these models in a very small space while still keeping the queries efficient...


 2018-12-28  1h9m
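
As a rough illustration of the Markov models mentioned in episode #28, the sketch below trains fixed-order models by counting which symbol follows each length-k context, then picks the longest context that has been seen often enough. This is only a toy backoff rule standing in for a variable-order model; it is not the space-efficient index discussed in the episode, and the names `train_markov`, `next_prob`, `backoff_prob`, and the `min_count` threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_markov(seq, k):
    """Count, for every length-k context, how often each symbol follows it."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def next_prob(counts, context, symbol):
    """Maximum-likelihood estimate of P(symbol | context); 0.0 if unseen."""
    if context not in counts:
        return 0.0
    total = sum(counts[context].values())
    return counts[context][symbol] / total


def backoff_prob(models, history, symbol, min_count=2):
    """Use the longest suffix of `history` observed at least `min_count` times.

    `models` maps each order k to the counts returned by train_markov.
    """
    for k in sorted(models, reverse=True):
        if len(history) < k:
            continue
        context = history[len(history) - k:]
        seen = sum(models[k][context].values()) if context in models[k] else 0
        if seen >= min_count:
            return next_prob(models[k], context, symbol)
    return 0.0


# Hypothetical toy sequence; orders 0, 1 and 2 are trained on it.
sequence = "ACGTACGTACGA"
models = {k: train_markov(sequence, k) for k in range(3)}

p_fixed = next_prob(models[2], "CG", "T")      # order-2 estimate
p_backoff = backoff_prob(models, "GA", "C")    # "GA" has no successor, falls back to "A"
```

A fixed-order model always conditions on exactly k symbols, so rare contexts get unreliable estimates; letting the context length vary with the data is the improvement the episode description refers to.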

#27 Classification of CRISPR-induced mutations and CRISPRpic with HoJoon Lee and Seung Woo Cho

In this episode HoJoon Lee and Seung Woo Cho explain how to perform a CRISPR experiment and how to analyze its results. HoJoon and Seung Woo developed an algorithm that analyzes sequenced amplicons containing the CRISPR-induced double-strand break site and figures out what exactly happened there (e.g. a deletion, insertion, or substitution).


 2018-11-29  56m

#26 Feature selection, Relief and STIR with Trang Lê

Relief is a statistical method for feature selection. It could be used, for instance, to find genomic loci that correlate with a trait or genes whose expression correlates with a condition. Relief can also be made sensitive to interaction effects (known in genetics as epistasis). In this episode Trang Lê joins me to talk about Relief and her version of Relief called STIR (STatistical Inference Relief)...


 2018-10-27  1h8m
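
To make the Relief idea from episode #26 concrete, here is a minimal sketch of the basic two-class Relief weighting: for each sample, find its nearest neighbor of the same class (the hit) and of the other class (the miss), then reward features that differ on the miss and penalize features that differ on the hit. This is the textbook form of Relief, not STIR and not any particular library's implementation; the function name `relief` and the toy data are assumptions made for this example.

```python
def relief(X, y):
    """Basic Relief feature weights for numeric features and two classes."""
    n, p = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    span = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0 for j in range(p)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(p))

    weights = [0.0] * p
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(p):
            # Differing from the nearest miss is good; differing from the hit is bad.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y)
```

On this toy data the informative feature receives a positive weight and the noisy one a negative weight. Because neighbors are found using all features jointly, the same scheme can pick up interacting features that look uninformative on their own.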