Python Test

Practical automated testing for software engineers using Python. Mostly. But also so much more.

https://podcast.pythontest.com

Episode 33: Katharine Jarmul - Testing in Data Science


A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

  • experimentation vs testing
  • testing pipelines and pipeline changes
  • automating data validation
  • property based testing
  • schema validation and detecting schema changes
  • using unit test techniques to test data pipeline stages
  • testing nodes and transitions in DAGs
  • testing expected and unexpected data
  • missing data and non-signals
  • corrupting a dataset with noise
  • fuzz testing for both data pipelines and web APIs
  • datafuzz
  • hypothesis
  • testing internal interfaces
  • documenting and sharing domain expertise to build good reasonableness checks
  • intermediary data and stages
  • neural networks
  • speaking at conferences
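
Several of the topics above (unit-testing a pipeline stage, missing data, corrupting a dataset with noise, fuzz testing) can be illustrated with a small sketch. It uses only the Python standard library and does not reproduce the datafuzz or Hypothesis APIs. The stage `clean_temperatures`, the helper `corrupt`, the value ranges, and the choice of bad values are all hypothetical, picked for illustration rather than taken from the episode.

```python
import math
import random


def clean_temperatures(readings, low=-90.0, high=60.0):
    """Hypothetical pipeline stage: keep only finite readings in a plausible range."""
    cleaned = []
    for value in readings:
        if value is None:
            continue  # missing data
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # wrong type, e.g. a stray string
        if math.isnan(value) or math.isinf(value):
            continue  # non-signals
        if low <= value <= high:
            cleaned.append(float(value))
    return cleaned


def corrupt(readings, rate=0.3, seed=0):
    """Hypothetical noise helper: swap a fraction of readings for bad values."""
    rng = random.Random(seed)
    bad_values = [None, float("nan"), float("inf"), -1e9, "n/a"]
    return [
        rng.choice(bad_values) if rng.random() < rate else value
        for value in readings
    ]


def check_stage_survives_noise(trials=200):
    """Fuzz-style check: the stage drops every bad value and keeps every good one."""
    rng = random.Random(42)
    for trial in range(trials):
        clean = [rng.uniform(-40.0, 45.0) for _ in range(rng.randint(0, 50))]
        noisy = corrupt(clean, rate=0.3, seed=trial)
        # Readings that the noise helper left untouched, in original order.
        expected = [
            c for c, n in zip(clean, noisy)
            if isinstance(n, float) and n == c
        ]
        assert clean_temperatures(noisy) == expected
    return True


if __name__ == "__main__":
    check_stage_survives_noise()
    print("stage handled all corrupted inputs")
```

The check states a property (bad values are removed, good values pass through unchanged) instead of comparing against one hand-written fixture, which is the same idea that property-based testing tools automate with generated inputs.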

Special Guest: Katharine Jarmul.

Sponsored By:

  • Python Testing with pytest, 2nd edition: The fastest way to learn pytest and practical testing practices.
  • Patreon Supporters: Help support the show with as little as $1 per month and be the first to know when new episodes come out.

Links:

  • @kjam on Twitter — Data Magic and Computer Sorcery
  • Kjamistan: Data Science
  • datafuzz’s Python library — The goal of datafuzz is to give you the ability to test your data science code and models with BAD data.
  • Hypothesis Python library — Hypothesis is a Python library for finding edge cases in your code you wouldn’t have thought to look for.


 November 30, 2017  37m