Software Engineering Daily

Technical interviews about software topics.

https://softwareengineeringdaily.com/

subscribe
share






Data Lakehouse with Michael Armbrust


A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical workflow for a data engineer is to pull data sets from this slow data lake storage into the data warehouse for faster querying.

Apache Spark is a system for fast processing of data across distributed datasets. Spark is not thought of as a data warehouse technology, but it can be used to fulfill some of the responsibilities. Delta is an open source system for a storage layer on top of a data lake. Delta integrates closely with Spark, creating a system that Databricks refers to as a “data lakehouse.”

Michael Armbrust is an engineer with Databricks. He joins the show to talk about his experience building the company, and his perspective on data engineering, as well as his work on Delta, the storage system built for the Spark ecosystem.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Data Lakehouse with Michael Armbrust appeared first on Software Engineering Daily.


fyyd: Podcast Search Engine
share








 May 1, 2020  59m