33: Katharine Jarmul - Testing in Data Science

Episode 33 · November 30th, 2017 · 37 mins 14 secs

About this Episode

A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

  • experimentation vs testing
  • testing pipelines and pipeline changes
  • automating data validation
  • property based testing
  • schema validation and detecting schema changes
  • using unit test techniques to test data pipeline stages
  • testing nodes and transitions in DAGs
  • testing expected and unexpected data
  • missing data and non-signals
  • corrupting a dataset with noise
  • fuzz testing for both data pipelines and web APIs
  • datafuzz
  • hypothesis
  • testing internal interfaces
  • documenting and sharing domain expertise to build good reasonableness
  • intermediary data and stages
  • neural networks
  • speaking at conferences

Episode Links