Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

https://www.dataengineeringpodcast.com

Episode 115: Planet Scale SQL For The New Generation Of Applications


Summary

The modern era of software development is defined by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. Those applications require a new class of data storage that can accommodate the demand without forcing you to rearchitect your system at each level of growth. YugabyteDB is an open source database designed to support planet scale workloads with high data density and full ACID compliance. In this episode Karthik Ranganathan explains how Yugabyte is architected, the company's motivations for being fully open source, and how it simplifies the process of scaling your application from greenfield to global.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host is Tobias Macey and today I’m interviewing Karthik Ranganathan about YugabyteDB, the open source, high-performance distributed SQL database for global, internet-scale apps.
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what YugabyteDB is and its origin story?
  • A growing trend in database engines (e.g. FaunaDB, CockroachDB) has been an out-of-the-box focus on global distribution. Why is that important and how does it work in Yugabyte?
    • What are the caveats?
  • What are the most notable features of YugabyteDB that would lead someone to choose it over any of the myriad other options?
    • What are the use cases that it is uniquely suited to?
  • What are some of the systems or architecture patterns that can be replaced with Yugabyte?
  • How does the design of Yugabyte or the different ways it is being used influence the way that users should think about modeling their data?
  • Yugabyte is an impressive piece of engineering. Can you talk through the major design elements and how they are implemented?
  • Easy scaling and failover are features that many database engines would like to be able to claim. What are the difficult elements that prevent them from implementing those capabilities as a standard practice?
    • What do you have to sacrifice in order to support the level of scale and fault tolerance that you provide?
  • Speaking of scaling, there are many ways to define that term, from vertical scaling of storage or compute, to horizontal scaling of compute, to scaling of reads and writes. What are the primary scaling factors that you focus on in Yugabyte?
  • How do you approach testing and validation of the code given the complexity of the system that you are building?
  • In terms of the query API you have support for a Postgres compatible SQL dialect as well as a Cassandra based syntax. What are the benefits of targeting compatibility with those platforms?
    • What are the challenges and benefits of maintaining compatibility with those other platforms?
  • Can you describe how the storage layer is implemented and the division between the different query formats?
  • What are the operational characteristics of YugabyteDB?
    • What are the complexities or edge cases that users should be aware of when planning a deployment?
  • One of the challenges of working with large volumes of data is creating and maintaining backups. How does Yugabyte handle that problem?
  • Most open source infrastructure projects that are backed by a business withhold various "enterprise" features such as backups and change data capture as a means of driving revenue. Can you talk through your motivation for releasing those capabilities as open source?
  • What is the business model that you are using for YugabyteDB and how does it differ from the tribal knowledge of how open source companies generally work?
  • What are some of the most interesting, innovative, or unexpected ways that you have seen Yugabyte used?
  • When is Yugabyte the wrong choice?
  • What do you have planned for the future of the technical and business aspects of Yugabyte?
Contact Info
  • @karthikr on Twitter
  • LinkedIn
  • rkarthik007 on GitHub
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
  • YugabyteDB
    • GitHub
  • Nutanix
  • Facebook Engineering
  • Apache Cassandra
  • Apache HBase
  • Delphi
  • FaunaDB
    • Podcast Episode
  • CockroachDB
    • Podcast Episode
  • HA == High Availability
  • Oracle
  • Microsoft SQL Server
  • PostgreSQL
    • Podcast Episode
  • MongoDB
  • Amazon Aurora
  • pgcrypto
  • PostGIS
  • PL/pgSQL
  • Foreign Data Wrappers
  • PipelineDB
    • Podcast Episode
  • Citus
    • Podcast Episode
  • Jepsen Testing
  • Yugabyte Jepsen Test Results
  • OLTP == Online Transaction Processing
  • OLAP == Online Analytical Processing
  • DocDB
  • Google Spanner
  • Google Bigtable
  • Spot Instances
  • Kubernetes
  • CloudFormation
  • Terraform
  • Prometheus
  • Debezium
    • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


 2020-01-13  1h1m