Total length of all episodes: 11 days 5 hours 14 minutes
Deep learning models are prone to overfitting. This is especially frustrating given how much time and how many computational resources are often required for training to converge. One technique for fighting overfitting is dropout. Dropout is the method of...
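As a rough illustration of the idea, here is a minimal sketch of the commonly used "inverted dropout" formulation on a plain list of activations. The function name, the `rate` parameter, and the rescaling convention are assumptions made for this example and are not taken from the episode.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Randomly zero out activations during training (inverted dropout sketch).

    `rate` is the probability of dropping each unit. Surviving units are
    scaled by 1 / (1 - rate) so the expected activation is unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time nothing is dropped.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

if __name__ == "__main__":
    hidden = [0.2, 1.5, 0.7, 0.9, 1.1, 0.4]  # hypothetical layer outputs
    print(dropout(hidden, rate=0.5, rng=random.Random(42)))
    print(dropout(hidden, training=False))
```

Because a different random subset of units is silenced on every training pass, the network cannot lean too heavily on any single unit, which is the intuition usually given for why dropout reduces overfitting.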
In this episode I speak with Clarence Wardell and Kelly Jin about their mutual service as part of the White House's Police Data Initiative and Data Driven Justice Initiative respectively. The was organized to use open data to increase transparency...
We close out 2016 with a discussion of a basic interview question which might get asked when applying for a data science job. Specifically, how a library might build a model to predict if a book will be returned late or not.
Today's episode is a reading of Isaac Asimov's . As mentioned on the show, this is just a work of fiction to be enjoyed and not in any way some obfuscated political statement. Enjoy, and happy holidays!
Classically, entropy is a measure of disorder in a system. From a statistical perspective, it is more useful to say it's a measure of the unpredictability of the system. In this episode we discuss how information reduces the entropy in deciding...
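To make "unpredictability" concrete, here is a small sketch that computes Shannon entropy in bits from a list of observed outcomes. The function name and the guessing-game example are assumptions chosen for illustration, not details from the episode.

```python
import math
from collections import Counter

def shannon_entropy(outcomes):
    """Return the Shannon entropy, in bits, of the empirical distribution."""
    counts = Counter(outcomes)
    total = len(outcomes)
    # Sum p * log2(1 / p) over every distinct outcome.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

if __name__ == "__main__":
    # Eight equally likely candidates: 3 bits of uncertainty.
    candidates = list("ABCDEFGH")
    print(shannon_entropy(candidates))  # 3.0

    # A yes/no answer that rules out half of them leaves 2 bits.
    remaining = list("ABCD")
    print(shannon_entropy(remaining))  # 2.0
```

The one-bit drop between the two calls is the information gained from the answer: learning something about the system reduces its entropy.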
Cloud services are now ubiquitous in data science and more broadly in technology as well. This week, I speak to , , and about various aspects of data at scale. We discuss the embedding of R into SQL Server, SQL Server on Linux, open source, and a few...
Today's episode is all about Causal Impact, a technique for estimating the impact of a particular event on a time series. We talk to about his research into the impact releases have on app and we also chat with about a project she helped us build to...
The Bootstrap is a method of resampling a dataset to possibly refine its accuracy and produce useful metrics on the result. The bootstrap is a useful statistical technique and is leveraged in Bagging (bootstrap aggregation) algorithms such as Random...
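The resampling idea can be sketched in a few lines using only the standard library. The function names, the sample data, and the choice of a percentile interval for the mean are assumptions made for this example.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample `data` with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(estimates, confidence=0.95):
    """Return a simple percentile interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - confidence) / 2.0
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1.0 - alpha) * len(ordered)) - 1]
    return lower, upper

if __name__ == "__main__":
    sample = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical measurements
    means = bootstrap_means(sample)
    print("sample mean:", statistics.fmean(sample))
    print("95% interval for the mean:", percentile_interval(means))
```

Bagging applies the same resampling step to training data: each model in the ensemble is fit to a different bootstrap resample, and their predictions are then combined.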
The Gini Coefficient (as it relates to decision trees) is one approach to choosing the best split to introduce at each node of a decision tree. To pick the right feature to split on, it considers the frequency of the...
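In the decision tree setting this measure is usually called Gini impurity. Here is a minimal sketch of how it can score a candidate split. The function names and the toy labels are assumptions for illustration only.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 minus the sum of squared class frequencies (0.0 means pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Score a binary split as the size-weighted impurity of its two sides."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

if __name__ == "__main__":
    parent = ["late", "late", "on_time", "on_time"]
    print(gini_impurity(parent))  # 0.5

    # A split that separates the classes perfectly has impurity 0.0.
    print(weighted_gini(["late", "late"], ["on_time", "on_time"]))  # 0.0

    # A split that leaves both sides mixed is no better than the parent.
    print(weighted_gini(["late", "on_time"], ["late", "on_time"]))  # 0.5
```

A tree builder would evaluate every candidate split this way and keep the one with the lowest weighted impurity.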
Financial analysis techniques for studying numeric, well-structured data are very mature. While using unstructured data in finance is not necessarily a new idea, the area is still largely greenfield. On this episode, shares her thoughts on the potential...