Summary
Everyone expects data to be transmitted, processed, and updated instantly as more and more products integrate streaming data. The technology to make that possible has been around for a number of years, but the barriers to adoption have remained high because of the technical understanding and operational capacity required to run it at scale. DataStax has recently introduced a new managed offering for Pulsar workloads in the form of Astra Streaming that lowers those barriers and makes streaming workloads accessible to a wider audience. In this episode Prabhat Jha and Jonathan Ellis share the work that they have been doing to integrate streaming data into their managed Cassandra service. They explain how Pulsar is being used by their customers, the work that they have done to scale the administrative workload for multi-tenant environments, and the challenges of operating such a data-intensive service at large scale. This is a fascinating conversation with a lot of useful lessons for anyone who wants to understand the operational aspects of Pulsar and the benefits that it can provide to data workloads.
Announcements
Introduction
How did you get involved in the area of data management?
Can you describe what the Astra platform is and the story behind it?
How does streaming fit into your overall product vision and the needs of your customers?
What was your selection process/criteria for adopting a streaming engine to complement your existing technology investment?
What are the core use cases that you are aiming to support with Astra Streaming?
Can you describe the architecture and automation of your hosted platform for Pulsar?
What are some of the additional tools that you have added to your distribution of Pulsar to simplify operation and use?
What are some of the sharp edges that you have had to sand down as you have scaled up your usage of Pulsar?
What is the process for someone to adopt and integrate with your Astra Streaming service?
One of the capabilities that you highlight on the product page for Astra Streaming is the ability to execute machine learning workflows on data in flight. What are some of the supporting systems that are necessary to power that workflow?
What are the ways that you are engaging with and supporting the Pulsar community?
What are the most interesting, innovative, or unexpected ways that you have seen Astra used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Astra?
When is Astra the wrong choice?
What do you have planned for the future of Astra?
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA