Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

https://www.dataengineeringpodcast.com

Eine durchschnittliche Folge dieses Podcasts dauert 53m. Bisher sind 428 Folge(n) erschienen. Dieser Podcast erscheint wöchentlich.

Gesamtlänge aller Episoden: 15 days 22 hours 58 minutes

subscribe
share






Designing A Non-Relational Database Engine


Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.


share








   1h16m
 
 

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer


Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem...


share








   56m
 
 

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary


Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow...


share








   50m
 
 

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+


A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features...


share








   55m
 
 

Reconciling The Data In Your Databases With Datafold


A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.


share








 March 17, 2024  58m
 
 

Version Your Data Lakehouse Like Your Software With Nessie


Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility...


share








 March 10, 2024  40m
 
 

When And How To Conduct An AI Program


Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization.


share








 March 3, 2024  46m
 
 

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development


Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process...


share








 February 25, 2024  56m
 
 

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse


A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality...


share








 February 18, 2024  58m
 
 

Data Sharing Across Business And Platform Boundaries


Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms...


share








 February 12, 2024  59m