Gesamtlänge aller Episoden: 11 days 3 hours 4 minutes
is a public collaborative, grassroots effort around the United States in which scientists, researchers, computer scientists, librarians and other volunteers are working to download, save, and re-upload government data. The DataRefuge Project, which is...
If a CEO wants to know the state of their business, they ask their highest ranking executives. These executives, in turn, should know the state of the business through reports from their subordinates. This structure is roughly analogous to a process...
In this episode, I speak with Raghu Ramakrishnan, CTO for Data at Microsoft. We discuss services, tools, and developments in the big data sphere as well as the underlying needs that drove these innovations.
In this episode, we talk about a high-level description of deep learning. Kyle presents a simple game (pictured below), which is more of a puzzle really, to try and give Linh Da the basic concept. Thanks to our sponsor for...
Versioning isn't just for source code. Being able to track changes to data is critical for answering questions about data provenance, quality, and reproducibility. Daniel Whitenack joins me this week to talk about these concepts and share his work on...
Logistic Regression is a popular classification algorithm. In this episode, we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes an output variable (isLinhda) is a linear combination of available...
Prior work has shown that people's response to competition is in part predicted by their gender. Understanding why and when this occurs is important in areas such as labor market outcomes. A well structured study is challenging due to numerous...
Deep learning can be prone to overfit a given problem. This is especially frustrating given how much time and computational resources are often required to converge. One technique for fighting overfitting is to use dropout. Dropout is the method of...
In this episode I speak with Clarence Wardell and Kelly Jin about their mutual service as part of the White House's Police Data Initiative and Data Driven Justice Initiative respectively. The was organized to use open data to increase transparency...
We close out 2016 with a discussion of a basic interview question which might get asked when applying for a data science job. Specifically, how a library might build a model to predict if a book will be returned late or not.