the bioinformatics chat

A podcast about computational biology, bioinformatics, and next generation sequencing.

https://bioinformatics.chat

subscribe
share






episode 8: Perfect k-mer hashing in Sailfish


The original version of Sailfish, an RNA-Seq quantification tool, used minimal perfect hash functions to replace k-mers with unique integers. (The current version appears to be using a Cuckoo hashmap instead.)

This is my attempt to explain how a minimal perfect hash function could be built. The algorithm described here is not exactly the same as the one Sailfish used, but it follows the same idea.

Sections:

  • Sailfish and perfect hashing (1:15)
  • Perfect hashing based on binary search or hash tables (4:34)
  • Random hash functions (7:34)
  • Perfect hash function based on an acyclic graph (12:16)

Links:

  • The Sailfish paper
  • The paper describing the perfect hashing algorithm
  • The birthday paradox
  • Integrative Biology & Medicine

If you enjoyed this episode, please consider supporting the podcast on Patreon.


fyyd: Podcast Search Engine
share








 August 5, 2017  22m