Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

https://www.dataengineeringpodcast.com

Eine durchschnittliche Folge dieses Podcasts dauert 53m. Bisher sind 430 Folge(n) erschienen. Dieser Podcast erscheint wöchentlich.

Gesamtlänge aller Episoden: 16 days 42 minutes

subscribe
share






Datapreneurs - How Todays Business Leaders Are Using Data To Define The Future


Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.


share








 July 17, 2023  54m
 
 

Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling


For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin of Airflow and Superset fame shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.


share








 July 10, 2023  1h12m
 
 

How Data Engineering Teams Power Machine Learning With Feature Platforms


Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features.


share








 July 3, 2023  1h3m
 
 

How Data Engineering Teams Power Machine Learning With Feature Platforms


Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features.


share








 July 3, 2023  46m
 
 

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh


Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects...


share








 June 26, 2023  50m
 
 

How Column-Aware Development Tooling Yields Better Data Models


Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. By incorporating column-level lineage in the data modeling process it encourages a more robust and well-informed design...


share








 June 18, 2023  46m
 
 

Build Better Tests For Your dbt Projects With Datafold And data-diff


Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be stable and wrong, but then it isn't reliable. Confidence in your data is achieved through constant validation and testing. Datafold has invested a lot of time into integrating with the workflow of dbt projects to add early verification that the changes you are making are correct...


share








 June 12, 2023  48m
 
 

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service


A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow...


share








 June 5, 2023  54m
 
 

A Roadmap To Bootstrapping The Data Team At Your Startup


Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth.


share








 May 29, 2023  42m
 
 

Keep Your Data Lake Fresh With Real Time Streams Using Estuary


Batch vs. streaming is a long running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batched workloads, but the reverse isn't true. The batch world has been the default for years because of the complexities of running a reliable streaming system at scale...


share








 May 22, 2023  55m