the bioinformatics chat

A podcast about computational biology, bioinformatics, and next generation sequencing.

https://bioinformatics.chat

An average episode of this podcast lasts 1h3m. So far, 63 episodes have been released.

Total length of all episodes: 2 days 20 hours 19 minutes

episode 43: Generalized PCA for single-cell data with William Townes


Will Townes proposes a new, simpler way to analyze scRNA-seq data with unique molecular identifiers (UMIs). Observing that such data is not zero-inflated, Will has designed a PCA-like procedure inspired by generalized linear models (GLMs) that, unlike standard PCA, takes into account the statistical properties of the data and avoids spurious correlations (such as one or more of the top principal components being correlated with the number of non-zero gene counts)...

 2020-03-27  59m
 
 

episode 42: Spectrum-preserving string sets and simplitigs with Amatur Rahman and Karel Břinda


In this episode, we hear from Amatur Rahman and Karel Břinda, who independently of one another released preprints on the same concept, called simplitigs or spectrum-preserving string sets. Simplitigs offer a way to efficiently store and query large sets of k-mers—or, equivalently, large de Bruijn graphs...

 2020-02-28  53m
 
 

episode 41: Epidemic models with Kris Parag


Kris Parag is here to teach us about the mathematical modeling of infectious disease epidemics. We discuss the SIR model, renewal models, and how insights from information theory can help us predict where an epidemic is going...

 2020-01-27  1h8m
 
 

episode 40: Plasmid classification and binning with Sergio Arredondo-Alonso and Anita Schürch


Does a given bacterial gene live on a plasmid or the chromosome? What other genes live on the same plasmid?

In this episode, we hear from Sergio Arredondo-Alonso and Anita Schürch, whose projects mlplasmids and gplas answer these types of questions.

Links:

  • mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species (Sergio Arredondo-Alonso, Malbert R. C. Rogers, Johanna C...

 2019-12-30  45m
 
 

episode 39: Amplicon sequence variants and bias with Benjamin Callahan


In this episode, Benjamin Callahan talks about some of the issues faced by microbiologists when conducting amplicon sequencing and metagenomic studies. The two main themes are:

  • Why one should probably avoid using OTUs (operational taxonomic units) and use exact sequence variants (also called amplicon sequence variants, or ASVs), and how DADA2 manages to deduce the exact sequences present in the sample...

 2019-11-29  1h1m
 
 

episode 38: Issues in legacy genomes with Luke Anderson-Trocmé


In this episode, Luke Anderson-Trocmé talks about his findings from the 1000 Genomes Project. Specifically, some of the early sequenced genomes contain distinctive mutational signatures that have not been replicated in other sources and can be detected through their association with lower base quality scores. Listen to Luke tell the story of how he stumbled upon and investigated these fake variants and what their impact is...

 2019-10-22  1h1m
 
 

episode 37: Causality and potential outcomes with Irineo Cabreros


In this episode, I talk with Irineo Cabreros about causality. We discuss why causality matters, what does and does not imply causality, and two different mathematical formalizations of causality: potential outcomes and directed acyclic graphs (DAGs). Causal models are usually considered external to and separate from statistical models, whereas Irineo’s new paper shows how causality can be viewed as a relationship between particularly chosen random variables (potential outcomes)...

 2019-09-27  40m
 
 

episode 36: scVI with Romain Lopez and Gabriel Misrachi


In this episode, we hear from Romain Lopez and Gabriel Misrachi about scVI—Single-cell Variational Inference. scVI is a probabilistic model for single-cell gene expression data that combines a hierarchical Bayesian model with deep neural networks encoding the conditional distributions. scVI scales to over one million cells and can be used for scRNA-seq normalization and batch effect removal, dimensionality reduction, visualization, and differential expression...

 2019-08-30  1h20m
 
 

episode 35: The role of the DNA shape in transcription factor binding with Hassan Samee


Even though double-stranded DNA has the famous regular helical shape, the geometry of the helix varies slightly depending on which nucleotides it is made of at each position.

In this episode of the bioinformatics chat, Hassan Samee talks about the role DNA shape plays in the recognition of DNA by DNA-binding proteins, such as transcription factors...

 2019-07-26  1h1m
 
 

episode 34: Power laws and T-cell receptors with Kristina Grigaityte


An αβ T-cell receptor is composed of two highly variable protein chains, the α chain and the β chain. However, bulk DNA or RNA sequencing alone cannot determine which α chain and β chain sequences were paired in the same receptor.

In this episode, Kristina Grigaityte talks about her analysis of 200,000 paired αβ sequences, which were obtained by targeted single-cell RNA sequencing...

 2019-06-29  1h26m