Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

https://www.dataengineeringpodcast.com

An average episode of this podcast runs 53 minutes. 429 episodes have been published so far. This podcast is published weekly.

Total length of all episodes: 15 days 23 hours 52 minutes

Data Sharing Across Business And Platform Boundaries


Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms...


February 12, 2024 · 59m

Tackling Real Time Streaming Data With SQL Using RisingWave


Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.


February 4, 2024 · 56m

Build A Data Lake For Your Security Logs With Scanner


Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. Most available products either require too much effort to structure the logs or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high-scale log data for security auditing. In this episode he shares the story of how it got started, how it works, and how you can start using it.


January 29, 2024 · 1h2m

Modern Customer Data Platform Principles


Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization...


January 22, 2024 · 1h1m

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel


Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University...


January 7, 2024 · 50m

Designing Data Platforms For Fintech Companies


Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.


January 1, 2024 · 47m

Troubleshooting Kafka In Production


Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: Troubleshooting in Production"...


December 24, 2023 · 1h14m

Adding An Easy Mode For The Modern Data Stack With 5X


The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. In reality, it left data teams spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understood the pain and the barriers to productivity involved, and set out to solve them by pre-integrating the best tools from each layer of the stack...


December 18, 2023 · 56m

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack


If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project...


December 11, 2023 · 49m

Designing Data Transfer Systems That Scale


The first step of a data pipeline is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. Andrei Tserakhau has dedicated his career to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud.


December 4, 2023 · 1h3m