BuzzFeed needs to understand how its users interact with the myriad articles, videos, and other content that it posts. This lets it produce new content that will continue to be well received. To surface the insights it needs to grow its business, it needs a robust data infrastructure to reliably capture all of those interactions...

November 14, 2017 43m

Astronomer with Ry Walker - Episode 6

Summary

Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform that lets you skip straight to processing your valuable business data. Ry Walker, the CEO of Astronomer, explains how the company got started, how the platform works, and their commitment to open source...

August 6, 2017 42m

Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5

Summary

Yelp needs to consume and process all of the user interactions that happen on their platform in as close to real time as possible...

June 18, 2017 42m

ScyllaDB with Eyal Gutkind - Episode 4

Summary

If you like the features of Cassandra but wish it ran faster with fewer resources, then ScyllaDB is the answer you have been looking for. In this episode, Eyal Gutkind explains how Scylla was created and how it differentiates itself in the crowded database market.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
Go to dataengineeringpodcast...

March 18, 2017 35m

Defining Data Engineering with Maxime Beauchemin - Episode 3

Summary

What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more.

Transcript provided by CastSource

March 5, 2017 45m

Dask with Matthew Rocklin - Episode 2

Summary

There is a vast constellation of tools and platforms for processing and analyzing your data. In this episode, Matthew Rocklin talks about how Dask fills the gap between a task-oriented workflow tool and an in-memory processing framework, and how it brings the power of Python to bear on the problem of big data...

January 22, 2017 46m

Pachyderm with Daniel Whitenack - Episode 1

Summary

Do you wish that you could track the changes in your data the same way that you track the changes in your code? Pachyderm is a platform for building a data lake with a versioned file system. It also lets you run your analysis in whatever languages you want with its container-based task graph...

January 14, 2017 44m

Introducing The Show

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
You can help support the show by checking out the Patreon page, which is linked from the site...

January 8, 2017 4m