Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

An average episode of this podcast runs 53m. So far, 322 episodes have been published. This podcast is released weekly.

Total length of all episodes: 11 days 9 hours 38 minutes


episode 316: Bringing Automation To Data Labeling For Machine Learning With Watchful

Data engineers have typically left the process of data labeling to data scientists or other roles because of its nature as a manual and process-heavy undertaking, focusing instead on building automation and repeatable systems. Watchful is a platform to make labeling a repeatable and scalable process that relies on codifying domain expertise...



episode 315: Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

An interview with Shinji Kim about the challenges of collecting contextual metadata for your information assets and how to organize it to power effective data discovery for everyone in the business.



episode 314: Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab

An interview with Paolo Platter about the experience that he and his team at AgileLab have had implementing Data Mesh strategies at multiple organizations and the repeatable patterns that they have built into their Data Mesh Boost product.



episode 313: Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

An interview with Frank Liu about the open source vector database Milvus and how its native storage of vector embeddings reduces the friction involved in building and deploying machine learning models.



episode 312: What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

An interview with Ernie Ostic about the Manta platform and how it approaches the collection and processing of metadata to build a comprehensive view of data lineage across your various data systems.



episode 311: Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda

An interview with David Bader about the Arkouda framework for exploratory data analysis at interactive speeds across massive data sets, and how it supports operating on anything from a single laptop to multiple servers in the cloud or thousands of cores on a supercomputer.



episode 310: Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering

An interview with Joe Reis and Matt Housley about their experience and insights gained while writing the book "Fundamentals of Data Engineering" and the inherent challenges of offering a single reference that covers the variety of skills necessary to work as a data engineer.



episode 309: Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster

An interview with Nick Schrock about the role of the data orchestration engine in making sense of the modern data stack and how Dagster's support for software defined assets simplifies the work of building and understanding the flow of data in your platform.



episode 308: Making The Total Cost Of Ownership For External Data Manageable With Crux

An interview with Mark Etherington, CTO of Crux, about the cost and complexity involved in external data integration and how their platform is engineered to make it manageable for organizations of all sizes.


 2022-07-18  1h7m

episode 307: Joe Reis Turns The Tables And Interviews Tobias Macey About The Data Engineering Podcast

Joe Reis takes over the show and interviews Tobias Macey, host of the Data Engineering Podcast, about his own show and the other projects that keep him busy.


 2022-07-18  56m