Software Engineering Daily

Technical interviews about software topics.


Data Warehouse ETL with Matthew Scullion

A data warehouse provides low-latency access to large volumes of data.

A data warehouse is a crucial piece of infrastructure for a large company, because it can be used to answer complex questions involving a large number of data points. But a data warehouse usually cannot hold all of a company’s data at any given time. Users need to move a subset of the data into the data warehouse by reading large files from a data lake on disk and putting that data into the data warehouse.

The process of moving data from one place to another is broken down into three sequential steps, often called “ETL” (extract, transform, load) or “ELT” (extract, load, transform). In ETL, the data is extracted from a source such as a data lake, transformed into a schema that is customized for the data warehouse application, and then loaded into the data warehouse. In ELT, the last two steps are swapped: the raw data is loaded first, because modern systems can often defer the necessary schema transformation until after the data is in the data warehouse.
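
To make the difference in ordering concrete, here is a minimal sketch in Python. It is not taken from the episode or from Matillion's tooling: an in-memory SQLite database stands in for the data warehouse, and the sample rows, table names, and the `transform`, `run_etl`, and `run_elt` helpers are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical raw records standing in for files read from a data lake.
# Every field arrives as text, as it might in a CSV export.
RAW_ROWS = [
    ("2020-02-14", "widget", "3", "4.50"),
    ("2020-02-14", "gadget", "1", "10.00"),
]


def transform(row):
    """Reshape one raw record into the warehouse schema (day, product, revenue)."""
    day, product, qty, price = row
    return (day, product, int(qty) * float(price))


def run_etl(conn):
    """ETL: transform in application code first, then load the finished rows."""
    conn.execute("CREATE TABLE sales_etl (day TEXT, product TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales_etl VALUES (?, ?, ?)",
        [transform(r) for r in RAW_ROWS],
    )


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, product TEXT, qty TEXT, price TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT day, product, "
        "CAST(qty AS INTEGER) * CAST(price AS REAL) AS revenue "
        "FROM raw_sales"
    )


# SQLite in memory is only a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)

etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY product").fetchall()
elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY product").fetchall()
```

Both paths end with the same revenue table. What differs is where the transformation runs: in the pipeline's own code before loading (ETL), or as a query inside the warehouse after the raw data has landed (ELT).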

Matthew Scullion is the CEO of Matillion, a company that specializes in building tools for data transformations. Matthew joins the show to talk about the problem of data transformation, and how that problem has evolved over the nine years since he started Matillion.

If you enjoy the show, you can find all of our past episodes about data infrastructure by searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

The post Data Warehouse ETL with Matthew Scullion appeared first on Software Engineering Daily.

 February 14, 2020  57m