Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

https://www.dataengineeringpodcast.com

Episode 116: Replatforming Production Dataflows


Summary

Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business, unexpected events can force a change in your platform architecture. In this episode the head of data for Mayvenn shares their experience migrating an existing set of streaming workflows onto the Ascend platform after their previous vendor was acquired and changed its offering. This is an interesting discussion about the ongoing maintenance and decision making required to keep your business data up to date and accurate.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host is Tobias Macey and today I’m interviewing Sheel Choksi and Sean Knapp about Mayvenn’s experience migrating their dataflows onto the Ascend platform
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start off by describing what Mayvenn is and give a sense of how you are using data?
  • What are the sources of data that you are working with?
  • What are the biggest challenges you are facing in collecting, processing, and analyzing your data?
  • Before adopting Ascend, what did your overall platform for data management look like?
  • What were the pain points that you were facing which led you to seek a new solution?
    • What were the selection criteria that you set forth for addressing your needs at the time?
    • What were the aspects of Ascend which were most appealing?
  • What are some of the edge cases that you have dealt with in the Ascend platform?
  • Now that you have been using Ascend for a while, what components of your previous architecture have you been able to retire?
  • Can you talk through the migration process of incorporating Ascend into your platform and any validation that you used to ensure that your data operations remained accurate and consistent?
  • How has the migration to Ascend impacted your overall capacity for processing data or integrating new sources into your analytics?
  • What are your future plans for how to use data across your organization?
Contact Info
  • Sheel
    • LinkedIn
    • sheelc on GitHub
  • Sean
    • LinkedIn
    • @seanknapp on Twitter
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
  • Mayvenn
  • Ascend
    • Podcast Episode
  • Google Sawzall
  • Clickstream
  • Apache Kafka
  • Alooma
    • Podcast Episode
  • Amazon Redshift
  • ELT == Extract, Load, Transform
  • DBT
    • Podcast Episode
  • Amazon Data Pipeline
  • Upsolver
  • Pentaho
  • Stitch Data
  • Fivetran
    • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


 2020-01-20  39m