Software Engineering Daily

Technical interviews about software topics.

https://softwareengineeringdaily.com/

Dask: Scalable Python with Matthew Rocklin


Python is the most widely used language for data science, and Python data scientists commonly rely on libraries such as NumPy, pandas, and scikit-learn. These libraries give data scientists high-level APIs for numerical arrays, tabular data, and machine learning.

Data science is often performed over huge datasets, and the data structures built from those datasets may need to be spread across multiple machines. To manage large distributed datasets, a library such as scikit-learn can use a system called Dask. Dask provides distributed data structures such as the Dask dataframe and the Dask array.
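The idea of spreading a data structure across workers can be sketched in plain Python. This is a minimal illustration only, not Dask's API: the helper names (`split_into_chunks`, `partial_stats`, `chunked_mean`) are hypothetical, and a local thread pool stands in for the multiple machines described above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Per-chunk work: return (sum, count) so partial results can be combined."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, workers=2):
    """Compute a mean by processing chunks independently, then combining.

    Assumes a non-empty input; threads stand in for remote workers.
    """
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(1, 11))
result = chunked_mean(data)
print(result)
```

Each chunk is reduced to a small summary before the summaries are combined, so no single worker needs to hold the whole dataset. That is the general pattern a chunked array or partitioned dataframe relies on.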

Matthew Rocklin is the creator of Dask. He joins the show to talk about distributed computing with Dask, its use cases, and the Python ecosystem. He also provides a detailed comparison between Dask and Spark, another system used for distributed data science.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com



April 27, 2020 (1h 1m)