Linear Digressions

In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.

http://lineardigressions.com

Data Contamination


Supervised machine learning assumes that the features and labels used to build a classifier are isolated from each other: basically, that you can't cheat by peeking. It turns out this is easier said than done. In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way.

Relevant links:
https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf


May 2, 2016 (20 min)