search for: The Green Knight
10 results (5.598 seconds)
group by podcast
How often was "The Green Knight" found in episode transcripts?

podcasts

If you would like to be notified about new episodes matching the search term The Green Knight in the future, simply create an alert for it.

search results

     
     
    Recommended Podcast: Dog Cancer Answers about Tripawds
    2020-08-22 (duration 1m)
    [transcript]
    00:03 the
    00:16 the
    00:26 the
     
    ELIAS Osteosarcoma Immunotherapy Vaccine
    2019-03-17 (duration 16m)
    [transcript]
    00:17 the
    00:25 the
    01:07 The
     
    Exploring The TileDB Universal Data Engine
    2020-08-17 (duration 1h5m)
    [transcript]
    54:07 in terms of your own experience of building and growing the project and the business around TileDB, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
    28:40 And for somebody who is going to start using TileDB, both for the embedded and for the cloud use case, what does the overall workflow look like? And what are some of the benefits that you're seeing of unbundling the storage layer from the computation, for being able to interface with that storage engine from multiple different libraries and runtimes?
    51:03 And then the other interesting element of this is the fact that the TileDB Embedded project is open source and publicly available for free. And then you're also building a company around that and the cloud service on top of it. So I'm curious how you're managing governance and ongoing sustainability of the open source aspects of the project, and the tensions of trying to be able to build a profitable business on top of that
     
    Build More Reliable Distributed Systems By Breaking Them With Jepsen
    2020-07-28 (duration 49m)
    [transcript]
    37:06 a link to the paper in the show notes.
    16:12 And because of the fact that the primary focus of Jepsen is on the distributed systems guarantees, another level of complexity in getting the system set up properly is dealing with things like encryption and authentication and authorization. I'm assuming that you just completely leave that to the side so that you can focus on the durability and serializability guarantees of the system.
    17:24 And then for the actual process of building the deployment and running the overall workflow of setting up the test suite, executing it and then evaluating the output, can you just talk through that whole process and some of the challenges that you face particularly at the end in terms of interpreting the results and being able to provide useful feedback to the people who are building these systems and engaging with you to find errors that they need to correct?
     
    Open Source Production Grade Data Integration With Meltano
    2020-07-13
    [transcript]
    02:38 given that you're so new to this area, what are some of the aspects of the learning curve that you've been running into as you get ramped up on the project and the use case that it fills and some of the challenges within the overall ecosystem that you're trying to tackle?
    41:01 And digging into the specifics of how Meltano is implemented in its current incarnation, I'm wondering if you can just describe the overall architecture and some of the ways that it has evolved from the original direction where it was trying to be this all encompassing tool that included the entirety of the lifecycle.
    1:01:40 of your work on Meltano, or the overall space of data integration, or some of the challenges in an end to end tool for managing the data lifecycle that we didn't discuss that you'd like to cover before we close out the show?
     
    DataOps For Streaming Systems With Lenses.io
    2020-07-06 (duration 45m)
    [transcript]
    41:19 And as you look to the future of the lenses platform, and the streaming ecosystem in general, and the data ops principles that you're trying to operate around, what are some of the things that you have planned for the future of both the technical and business aspects of what you're working on?
    29:14 definitely an interesting trend that I've seen in a few different places of having different technology implementations adopting the API of the dominant player in the space. One of the notable ones being S3, where all the different object stores are adding a compatibility layer. In the Python ecosystem, a lot of different projects are adopting the NumPy API while swapping out some of the specifics of the processing layer. And then in the streaming space, they're working on coalescing around Kafka, but they're also working on the open streaming specification to try and consolidate the specifics of how you work with those systems so that they can innovate on the specifics of how that system actually functions under the hood.
    07:15 You mentioned the fact that you have your own custom SQL engine. I'm wondering what the motivation was for building that out specifically for this product versus using some of the off the shelf engines that are available for some of these streaming platforms, or leveraging some of the components that exist out there, such as the Calcite project in the Apache ecosystem.
     
    Data Collection And Management For Teaching Machines To Hear At Audio Analytic
    2020-06-30 (duration 57m)
    [transcript]
    21:09 right through the, you know, data collection, processing, labeling, augmentation, training, evaluation, you know, even sometimes data compression and deployment levels, so that you know that it's doing a good job. In terms of frameworks for doing that, no, there are no off the shelf frameworks, that's a completely new area itself. And yeah, with audio data, there's definitely a huge degree of variability along a number of different axes where, as you said, you've got your anechoic chambers for being able to isolate the sound to the specific piece that you're trying to collect. But then out in the real world that's going to often be overlaid with whatever the other background noise is, whether it's the, you know, hum of your washing machine in the next room or the sounds of engines going by outside, and then being able to isolate that sound. And then for your volunteers who are contributing the audio that you're using for this collection process, I imagine that there's variability in terms of the quality of the microphones that they're using, the sample rates that they're collecting the audio at, and the specifics of the audio format it's being collected in, the length of the segments. I'm wondering how you approach being able to try and encapsulate all of that variability and be able to standardize it in some way for being able to feed it through your model training process. In general, right, we think about it in terms of subject matter variability and channel variability, with channel variability split into two parts, which is sort of an acoustic coupling variability, which would be the environment you're in, the acoustics of it, is it a reverberant environment, is it in the bathroom or is it, you know, sort of in the hallway, and then you've actually got the actual device channel variability, which includes the microphone, includes all of the parts of the audio subsystem before the input audio is received by ai3, which is the inference engine that we run to do the high quality sound recognition we do on the device. Tom, in terms of the challenges and all that, if you want to pick that piece up,
    08:11 We obviously looked at, when we started, a whole range of different applications of the technology, it being sort of a foundational technology. And in that respect, yes, we looked at what the area you described might be, for me it would be called predictive maintenance or something of that nature. The commercial activity of the company is largely focused towards consumer electronics, it's where we've had the most success commercially, wide scale, you know, adoption. So that's the bulk of the commercial effort, and that then obviously translates into the thrust of the sounds we're detecting. Most of this world can be described in terms of breaking it down by the type of sounds, so obviously the sounds you'd get in a production plant, which I think is the example you'd use, would be very different than the sorts of sounds you and I would care about in our house or if we're active out on the street. They do end up being very different sounds, and we capture that in terms of the taxonomy that we use to structure our data.
    56:42 listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used, and visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
     
     
    Bringing Business Analytics To End Users With GoodData
    2020-06-23 (duration 52m)
    [transcript]
    36:28 And then the other component of building the data product and the perspective of the customers in terms of working with their data is I'm curious how the overall lifecycle of the data flows through the good data product from when the customer first collects the data through to delivering it to their end users and ensuring that the overall experience is as performant and robust as possible.
    15:46 So thanks, Tobias, for mentioning the APIs. That's definitely something that our end users are leveraging, especially from the developer aspect of people that are integrating the GoodData platform into their own analytics. And that's something that we leverage through embedded analytics, and the ways we're able to embed the GoodData platform into the client's products or their own application is in three different ways. The very first way is just through, like, direct embedding of the iframe, where you're getting the GoodData reports directly into the client's app or platform, and that isn't utilizing the GoodData APIs. Another way is just to embed the link that is directly linked to the white label GoodData portal. And then the third way is the GoodData UI, or gd.ui, which is the React based development library, allowing developers to seamlessly integrate into GoodData with their product. So combined with something that we developed called the accelerator toolkit, this pretty much streamlines the front end development efforts, so that there's a lot of custom visualizations and integration into the customer's app.
    29:02 Absolutely. So what you're referring to is which pieces of the GoodData architecture the client wants to leverage. So we're talking about the data warehousing piece that Phil was talking about with ADS, where we could potentially get the aggregation of lots of different data sources all in one place, whether or not that's something that the client wants to leverage from the GoodData side or own on their side. There's also the loading mechanism, the ETL piece, where we're talking about how the client would be able to load that data, whether or not they want to keep it on their side or actually keep it on the GoodData side. So the way we're able to manage that is really the flexibility of the types of sources we're able to download from, whether or not we are doing the transformations or just loading directly into the platform. So with all the connectors that we have, with these pre packaged Ruby bricks that are leveraging the GoodData APIs as well as the source APIs, we're able to integrate their data and load it through those connectors. Or if the client wants to own a lot of the transformations themselves, match the exact metadata output for the semantic layer or the models that are on the workspaces, they're able to load that directly in with their data warehousing source through our automated data distribution, or ADD, especially if they're using things like Snowflake, Redshift, or BigQuery.
     
    Accelerate Your Machine Learning With The StreamSQL Feature Store
    2020-06-15 (duration 46m)
    [transcript]
    17:10 And in the discovery piece, what are some of the useful pieces of metadata that should be mapped to a given feature? And what are the options for people defining the features for being able to define that metadata in terms of the structure and content?
    20:38 And how is the underlying architecture of the feature store implemented as far as being able to pull in the data and integrate it and then being able to create and store the features for being able to be served up
    31:24 certain extra data to a stream. And then as far as the materialization of the features, how are those stored and how do you handle updates to those and being able to keep track of the different versions in the materialized locations?
     
    Data Management Trends From An Investor Perspective
    2020-06-08 (duration 54m)
    [transcript]
    26:30 Another piece that isn't specifically a data catalog, but that I was impressed by, and who I spoke with a while ago, were the folks behind the Marquez project out of WeWork, as a means of being able to have useful integration points for automatically populating the metadata information and being able to visualize the overall lineage of the processes that produce the end results.
    54:11 listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used, and visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    16:55 And so you also wrote an article recently that was highlighting the four main trends that you're keeping an eye on for 2020 in the data space, and those call out in particular the elements of data quality, data catalogs, observability of the influences for critical business indicators or KPIs, and streaming data. So taking those in turn, starting with the data quality aspect, what are some of the driving factors that influence that quality? And what elements of that problem space are being addressed by the companies that you're watching?
     
    Building A Data Lake For The Database Administrator At Upsolver
    2020-06-02 (duration 56m)
    [transcript]
    47:47 And as you plan for the future of the business and look to the current trends in the industry for data lake technologies and usages of data lakes, what do you have planned for your roadmap
    45:06 And in terms of your experience of building the Upsolver platform and democratizing it for the DBA to be able to handle the data lake, and just growing the overall business and technical elements of the company, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process of giving it to the DBA, and just overall of building Upsolver?
    11:15 And so in that same time span, roughly the past two years, in addition to the changes in database technologies and their overall adoption, what are the ways that the Upsolver platform has changed or evolved since we spoke, and how has the evolution of those underlying technologies impacted your strategy for implementation and the features that you decided to include?
     
    Mapping The Customer Journey For B2B Companies At Dreamdata
    2020-05-26 (duration 46m)
    [transcript]
    07:10 So the company was born out of the frustration of trying to gain visibility about all of the different interactions of customers and how that fed into the overall success of the business that you were in. But what is it that's keeping you motivated as you continue to build out and grow the capabilities of Dreamdata?
    43:57 And are there any other areas of the work that you're doing at Dreamdata, or the overall space of B2B sales and revenue tracking, or any of the other challenges that you're facing in the data landscape that we didn't discuss that you'd like to cover before we close out the show.
    12:58 Yeah, in that manner. And you mentioned too that there are often silos that occur between some of the different responsibilities of people who are interacting with the customers at different points in the interaction cycle. And I'm wondering what you have found to be some of the contributing factors that give rise to those different silos and the challenges that that poses in terms of being able to effectively map the journey of the customer through all those different interaction points to the point where they're actually paying you money.
     
    Power Up Your PostgreSQL Analytics With Swarm64
    2020-05-18 (duration 52m)
    [transcript]
    46:44 And then as you look to the near and medium term of the business and the technologies and the evolution of the Postgres ecosystem, what are some of the things that you have planned?
    19:59 And increasing the processing throughput of the database can be beneficial for things that are compute intensive, like being able to parallelize the queries. But how does that shift the overall bottlenecks and impact the disk IO in terms of the overall throughput of the database?
    17:19 And then because of the fact that they're able to get this improved performance out of their existing Postgres database, it removes the necessity to do a lot of the data copying and can simplify the overall system design. And I'm wondering what are some of the other benefits that can be realized by keeping all of your data in the one database engine, but what are some of the challenges that it poses as well, particularly in the areas of doing things like data modeling within the database or for the data warehousing use case? Being able to generate some of the history tables so that you can capture changes over time and things like that?
     
    StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar
    2020-05-11
    [transcript]
    15:30 Yeah, just sort of how some of the recent trends in the overall data industry have influenced the decisions around pulsar and the direction that it's taken in the past two years since I last had it on the podcast.
    49:18 seen in the community. And as you look to the future of the business for StreamNative and the Pulsar project and community, what do you have planned and what are your goals?
    13:02 are kind of driven by the use case, driven by the adoption of the community. And what are some of the other characteristics of the community that has grown up around Pulsar that you would see as being distinct from some of the other streaming systems that are being used by people.
     
    Enterprise Data Operations And Orchestration At Infoworks
    2020-05-04 (duration 45m)
    [transcript]
    15:40 And what are some of the key principles on which the Infoworks platform is built that have guided your development and improvements of the overall capabilities of the system?
    10:58 So one of the traps that we see is, and so the other thing that a number of the companies have already built is, you know, sort of the do-it-yourself platform for this, especially with the new technologies, like the, you know, the Hadoop and Spark based systems. One of the challenges that they face is, you know, the continuing investment of engineering in maintaining those, you know, do-it-yourself platforms. And, you know, some of these do-it-yourself platforms are essentially built on point tools with a lot of glue code. So when maintenance becomes a challenge, many of the enterprises, you know, we have worked with them, they have come to us where we have successfully replaced that in house do-it-yourself platform with an Infoworks platform that has, you know, given them the agility to run their organization. So that's one of the sort of successful journeys for an enterprise.
    11:51 And so for the organizations that are integrating Infoworks into their system, what are some of the tools or technologies that they might be replacing, and what is the process of actually integrating the Infoworks platform into their data technologies that are already running?
     
    Building Real Time Applications On Streaming Data With Eventador
    2020-04-20 (duration 50m)
    [transcript]
    01:46 Yeah, the early days of the web were definitely an interesting time for actually finding out what the limitations were of the systems that were available at the time.
    41:31 Yeah, the universal answer to any technical question: it depends.
    10:53 And in terms of the main use cases that you're seeing people using for the streaming SQL applications, you mentioned at the outset the sort of location based and very real time nature of the application that one of your first customers was looking to build. I'm wondering what are some of the other ways that you've been seeing the platform used and how it fits into the overall application architecture, and how people are reconsidering the ways that they build and deploy the types of applications they're building on these event streams?
     
    Making Data Collection In Your Code Easy With Rookout
    2020-04-14 (duration 26m)
    [transcript]
    13:10 for the full lifecycle of the data collection piece. Once you have defined something in the Rookout panel, is it then something that needs to be incorporated back into the code by a developer to ensure that it's included going forward? Or how do you manage the overall lifecycle of these collection points? And what's the interaction between the stakeholders and the engineers for defining what the useful context and what the useful lifespan is of that collection point? Or is it something that's generally a one off where somebody is just doing some quick sampling of the data? Or is it often something where they discover an additional data point that they need and then they want to collect that going forward for the full duration of the application? That's
    12:18 So Rookout is built on multiple services. The first resides in the customer's application, and is essentially an SDK that allows you to extract the data on the fly from the relevant portions of the app. The second service is an ETL component, written in Golang for efficiency, and this second service is in charge of the ETL process. It takes the raw data, applies reduction and redaction based on security policies, transforms it to the relevant target format, whether it's JSON or XML or just a string, and then sends it out to the final target in the most efficient and simple way possible. And the entire process is orchestrated from a single pane of glass, the Rookout service. And this allows you to keep everything in check and implement various organizational and operational policies on the process end to end
    21:59 And are there any other aspects of the process of collecting these metrics and information from the software that we're running, or the value that can be obtained from the information that's hiding in those systems, or the overall process of leveraging dark data in an organization, that we didn't discuss yet that you'd like to cover before we close out the show, or any other aspects of the work that you're doing at Rookout?
     
    Building A Knowledge Graph Of Commercial Real Estate At Cherre
    2020-04-07 (duration 45m)
    [transcript]
    34:18 Looking to the near and medium term, what are some of the improvements or enhancements that you have planned to the actual content of the Knowledge Graph itself, or the pipeline and tooling that you have to be able to build and power the graph.
    11:11 And another issue of being able to build this type of data store and do the entity extraction and resolution and being able to establish the edges between the different nodes within the graph. There are a lot of challenges because of the fact that with pulling from multiple different data sources, I'm sure they don't all have the same representations and a sort of common schema or format. And I'm wondering what the main sources of messiness are for this data set and some of the approaches that you're using to be able to clean and normalize the information.
    05:31 And then in terms of the end users of the Knowledge Graph and the analysis being performed. I'm wondering if you can give a bit of a flavor of the interactions and the types of questions that are being asked of that knowledge graph and some of the challenges that you face in terms of being able to expose that underlying graph in a way that's intuitive and easy to use.
     
    The Life Of A Non-Profit Data Professional
    2020-03-31 (duration 44m)
    [transcript]
    10:48 Another element of nonprofits that is less pronounced in for profit institutions is the need to very closely align the work that you're doing with the value that it's going to produce, to ensure that you're not wasting cycles on something that might be technically elegant and useful, but isn't necessarily going to give the immediate impact that's necessary to ensure that you're meeting the mission and the specific financial needs of the organization that you're working with. And I'm curious how that manifests in terms of the ways that you approach the technical and design decisions of your work, and any of the aspects of the build versus buy dichotomy in terms of how you're building things out,
    27:37 In addition to your current role at the NRDC, you have also taken it upon yourself to spin up a new nonprofit organization in light of the current global crisis that we're going through with the COVID-19 virus. I'm wondering if you can describe a bit of the nature of that mission and the organization that you're building up around it, and some of the goals that you have for that organization and how you're hoping to make an impact on the
    43:49 listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
     
    Behind The Scenes Of The Linode Object Storage Service
    2020-03-23 (duration 35m)
    [transcript]
    04:02 choice that we should use it. And in terms of the scale that you're building for, and the scope of usage that you had to design for, wondering how that impacted the overall design and testing and implementation of the rollout of the object storage capabilities.
    05:52 And in terms of maintaining compatibility with the S3 API, I know that Ceph out of the box has that capability, but what are some of the issues or challenges that you see as far as being able to keep up with some of the recent additions to the S3 API, for things like the S3 Select API or anything like that?
    14:11 And as you were determining the deployment of the object storage, given that you already had the Ceph deployments for block storage, were there any different considerations that needed to be made for the hardware that it was getting deployed to? Or is everything just homogeneous across the different block and object storage supporting infrastructure?
     
    Building A New Foundation For CouchDB
    2020-03-17 (duration 55m)
    [transcript]
    33:00 And then as far as the actual work of migrating the code base and integrating with FoundationDB and figuring out what the separation of concerns are, what have been some of the biggest challenges and some of the most interesting experiences that you and the other developers have had,
    32:48 Yeah, we would look at it not so much as joining to the existing cluster, but just standing up as a cluster next door and setting up a replication job to sync the data from the old cluster to the new cluster, and then having a load balancer over top to repoint the endpoints to the new cluster.
    20:58 Yeah, the space-time trade off. And another implementation detail of CouchDB that I'm intrigued by is the fact that it's written at least primarily in Erlang, which I also know the Riak engine was written in, and a few other layers such as RabbitMQ. And I'm curious what you have found to be both the benefits and the challenges of that being the runtime for CouchDB.
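    For context on the migration approach described in the 32:48 excerpt above, here is a minimal sketch of creating a continuous replication job by writing a document into CouchDB's _replicator database. The host names, database name, and credentials are placeholder assumptions, not details from the episode.

    # Sketch: replicate a database from an old CouchDB cluster to a new one by
    # creating a document in the _replicator database. Hostnames, database name,
    # and credentials are illustrative placeholders.
    import requests

    OLD = "http://admin:secret@old-cluster.example.com:5984"
    NEW = "http://admin:secret@new-cluster.example.com:5984"
    DB = "orders"

    replication_doc = {
        "_id": f"migrate-{DB}",
        "source": f"{OLD}/{DB}",
        "target": f"{NEW}/{DB}",
        "create_target": True,   # create the database on the new cluster if missing
        "continuous": True,      # keep syncing until the load balancer is repointed
    }

    resp = requests.put(f"{NEW}/_replicator/migrate-{DB}", json=replication_doc)
    resp.raise_for_status()
    print(resp.json())  # e.g. {"ok": true, "id": "migrate-orders", "rev": "..."}

    Once the new cluster has caught up, the load balancer can be repointed at it and the replication document removed.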
     
    Scaling Data Governance For Global Businesses With A Data Hub Architecture
    2020-03-09 (duration 54m)
    [transcript]
    49:57 And are there any other aspects of the data architecture, or some of the ways that it's being used, or the benefits that it provides, that we didn't discuss yet that you think we should cover before we close out the show? I think
    53:21 listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used, and visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    33:27 And in this architecture, is it primarily just a means of storing and communicating about and transmitting data? Or do the individual hubs also provide capacity for computation, where in the instance that we were discussing of, for instance, the name, where there is no way to cleanly convert between one representation and the other, you can push your analysis down to that hub to perform the computation that you want, and then return the results back, in sort of a scatter gather approach?
     
    Easier Stream Processing On Kafka With ksqlDB
    2020-03-02 (duration 43m)
    [transcript]
    22:44 Another aspect of the SQL interface is the definition of tables and schemas and the migration of those schemas as they evolve and add new columns or rename things or change types. And so I'm wondering how that is defined in ksqlDB and the different layers in the Kafka ecosystem that operate together to enforce those constraints on the data that's flowing through.
    39:14 In terms of the product roadmap for ksqlDB and the future of the overall Kafka ecosystem where it is residing, what do you have planned?
    41:19 Are there any other aspects of the ksqlDB project or the ways that it's being used or the overall efforts going into streaming applications that we didn't discuss yet that you'd like to cover before we close out the show?
     
    Shining A Light on Shadow IT In Data And Analytics
    2020-02-25 (duration 46m)
    [transcript]
    36:47 And are there any other just inherent complexities in the overall aspect of data management and the available technologies that you think we need to see resolved and addressed more effectively in order to reduce the tension that exists between the organizations and the different business units that leads to these bespoke solutions,
    26:32 And so one of the interesting things to explore as well is that there are these tensions that exist between the priorities of the different groups within the business and the different projects that get spawned as a result, but once you have identified or somebody has introduced some new tool and presented it to the rest of the organization, what are some of the useful strategies for removing the friction that exists in the organization that causes them to go out and build those new tools in the first place, or maybe try to hide them? And how do you incorporate those new platforms into the organization and make it easy to integrate or extend the services that are available to make it so that you maybe use a different compute framework, but you're not trying to reinvent the definition of a particular metric, and you're able to rely on some of the Master Data Management or compliance and governance strategies that exist without them being too rigid?
    12:57 Yeah, or I mean, one of the biggest is which metric is the right metric? You know, I mean, 10 people can run the same pipeline and call the outcome the same number, and you could have 10 different numbers. Right. And so, you know, at least early in the transition for a lot of companies, you know, he who owns the metric, you know, owns the story. And so every individual would want to come into a C staff meeting with their own set of metrics, for example. You know, so how do you, from the top down, start saying, look, how do we drive standardization without squelching innovation? And so the stuff that Sean's talking about around metadata, around being able to have visibility into the pipelines, being able to rank and canonize certain data sets and certain metrics, those are the key things that allow success in a data product or in a data pipeline within an organization.
     
    Data Infrastructure Automation For Private SaaS At Snowplow
    2020-02-18 (duration 49m)
    [transcript]
    42:57 and what's in store for the future of the Snowplow product and the way that you're approaching management of the service that you're providing for it. So
    48:10 It's definitely something that continues to be a problem, as is the paradox of choice, particularly as we add new platforms and new capabilities to the overall landscape of data management.
    40:04 And if you were to start over today with all of Snowplow and the infrastructure automation that you're using for it, what are some of the things that you would do differently, or ways that you would change some of the evolution of either the Snowplow pipeline itself or the way that you've approached the infrastructure management?
     
    Data Modeling That Evolves With Your Business Using Data Vault
    2020-02-09 (duration 1h6m)
    [transcript]
    50:34 So it's definitely easy, as we talk about this, to start thinking that data vault is the solution to all of my problems in terms of being able to handle modeling and accessing and storing all of my data in a very agile fashion to get quick time to value. But what are some of the cases where the data vault approach doesn't really fit the needs of an organization or a use case, or it's unnecessarily cumbersome because of the size and maturity of the data or the institution that's trying to implement it.
    1:05:35 listening, don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    55:30 Yeah, boiling the ocean is never a good strategy. Yeah,
     
    The Benefits And Challenges Of Building A Data Trust
    2020-02-03 (duration 56m)
    [transcript]
    13:30 And one of the things that you mentioned there that I'm interested in digging more into is this idea of the ownership of the derivative data sets or aggregate information about the different entities contained within the data owned by the different members of the trust and some of the complications that arise in terms of where the intellectual property would lie as far as any algorithms or derivative data products that come out of the information that's available in this trust.
    52:53 Well, like I said, I think that so many of the technical aspects of sharing data actually have a real robust ecosystem around them. I think the piece that's missing is being able to bring all of it together and make it sustainable and manageable by the data custodians themselves, as opposed to relying on everybody collectively going and signing up with a single vendor and having a uniform IT environment across, you know, the entire ecosystem, which I think is largely impractical, especially in something like the social sector where there's a whole bunch of different actors. So to me, the glue that connects data infrastructure provided by multiple vendors, and the governance structure that makes it sustainable, is the biggest missing piece right now. If you're a single enterprise with a single decision maker at the top who can say, use this vendor software, I think you can solve a lot of your data management problems that way. Or if you have a large enough IT staff where you can, you know, take the open source tools that are out there and glue them together, you can even solve the problem that way. But once you start introducing multiple stakeholders into the equation, I think the technical model and the governance model are not standardized yet, and haven't settled on anything that's really meeting the needs of the customers. And how about you, Greg,
    42:49 And in terms of the types of trusts that you've worked with and some of the outcomes of them, I'm curious what you have seen as being the most interesting or innovative or inspirational ways that you have seen the BrightHive platform used, as well as this broader concept of data trusts being leveraged,
     
    Pay Down Technical Debt In Your Data Pipeline With Great Expectations
    2020-01-27 (duration 46m)
    [transcript]
    38:44 And as you look forward to the next steps that you have on the roadmap for great expectations and some of the overall potential that exists for the project. I'm wondering if you can just talk about what you have in store for the future and some of the ways that people can get involved and help out on that mission? Absolutely.
    13:38 And going back to the beginning of the project, at the time that you were working with Abe to define what it is that you were trying to achieve with Great Expectations, I'm wondering what your experience had been as far as the available state of the art for being able to do profiling or validation and testing of the data that you were working with, and maybe any other tools or libraries that are operating in the space, either at the time or that have arrived since then? Great question.
    44:20 Well, despite all the work that we're doing, and that a lot of other people are doing in data quality, I still think the biggest gap is in bridging the world of the data collection and processing systems, so the computing, with the world of the people. I mean, at the end of the day, I think it's all about what we as humans understand, and as decision makers, how we see the world and what we're trying to do in this whole field. And so I still think having people understand how models work, you know, explainability, and having machines be able to understand intent, are the things that are, you know, just going to take a huge amount of work in a variety of different fields. And there's of course no silver bullet on that. But I think that's going to be the big project area that I'm excited to get to continue working in and contributing toward.
     
    Replatforming Production Dataflows
    2020-01-20 (duration 39m)
    [transcript]
    30:21 and to your point about the time from identifying a data source to getting it active in the platform, I'm curious what the difference in the collaboration style or the overall workflow looks like between where you were to where you are now.
    38:13 For listening, don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    15:26 So, Sean, from your perspective, I'm curious what your experience was, and at what stage you were at when Sheel first came to you and wanted to trial the Ascend platform with the workload that he had at Maven, and any aspects of the types of data that he was working with or the variety of data that posed any sort of challenge for the state of the Ascend platform at the time, and how that may have helped you in terms of determining your product direction?
     
    Planet Scale SQL For The New Generation Of Applications
    2020-01-13 (duration 1h1m)
    [transcript]
    32:28 And going back to the storage layer as well, one of the other interesting points is that while we focus most of this conversation on the Postgres compatibility, you also have another query interface that is at least based upon the Cassandra query language and supports a different way of modeling the data. So I'm wondering if you can talk about some of the ways that you've actually implemented the storage layer itself and the way that you're able to handle these two different methods of storing and representing and querying the data, and some of the challenges that arise in terms of having this split in the types of access.
    1:00:30 listening, don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    57:08 And are there any other aspects of YugabyteDB or your position in the overall landscape of data management, or any of the other aspects of your business or your work on the platform, that we didn't discuss yet that you'd like to cover before we close out the show?
     
    Change Data Capture For All Of Your Databases With Debezium
    2020-01-06 (duration 53m)
    [transcript]
    52:14 Listening, don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    02:20 Yes, I work as a software engineer at Red Hat. I am the current project lead of Debezium, so I took over from Randall a few years ago. And before that, I also used to work on other data related projects at Red Hat. So I used to be the spec lead for the Bean Validation 2.0 spec, and I also have worked as a member of the Hibernate team for a few years. And Randall, do you remember how you first got involved in the area of data management?
    17:22 Yeah, really, ever since the beginning of Debezium, that was an important aspect, to be able to sort of take an initial snapshot, and then consistently pick up where that snapshot left off and capture all of the changes from that point. And so, you know, when you're sort of starting a Debezium connector, very often you want to begin with that snapshot and then transition to capturing the changes. For a consumer of one of those change event streams, they often don't really necessarily care about the difference. At some point, you know, they see that a record exists with some key, and then a bit later, you know, after the changes are being captured, they would see that that record has changed, and they see a change event, and then maybe the record is deleted. And so most of the time the consumers don't often care. And one of the interesting things when we sort of looked at Kafka, Kafka was just introducing the ability to have compacted topics. With a compacted topic, basically, you sort of keep the last record with the same key. And so, you know, as time goes on, you can sort of throw away the earlier records that have the same key. And so if you're capturing change events, and you're using, let's say, the primary key as the key for the record, you can, depending on how you set up your topic, set that up as compacted and you can say, oh, I only want to keep, you know, the last month of events, and any consumer that begins reading that stream, let's say, you know, several months from now, they start from the beginning and they're essentially getting a snapshot, right? It just happens to be the snapshot where they start with the most recent events for every record in a table, let's say. And so that's a very interesting way of, again, sort of decoupling, you know, how to take an initial snapshot and start capturing events on the production side, and on the consumption side, still providing essentially the semantics of an initial snapshot, but that isn't necessarily rooted in the particular time that the producer actually created its first
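    The 17:22 excerpt describes how a log-compacted Kafka topic lets a late-joining consumer of change events effectively read a snapshot. Below is a minimal sketch of creating such a topic with the confluent-kafka admin client; the broker address, topic name, and retention window are illustrative assumptions, not values from the episode.

    # Sketch: create a log-compacted topic for change events so that a consumer
    # reading from the beginning sees at least the latest event per record key,
    # giving snapshot-like semantics. Broker, topic name, and retention are
    # illustrative placeholders.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})

    topic = NewTopic(
        "inventory.customers.changes",
        num_partitions=3,
        replication_factor=1,
        config={
            # Keep the most recent record per key (compaction), and also expire
            # segments older than ~30 days, mirroring the "keep the last month
            # of events" idea from the conversation.
            "cleanup.policy": "compact,delete",
            "retention.ms": str(30 * 24 * 60 * 60 * 1000),
        },
    )

    futures = admin.create_topics([topic])
    for name, future in futures.items():
        future.result()  # raises if topic creation failed
        print(f"created topic {name}")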
     
    Building The DataDog Platform For Processing Timeseries Data At Massive Scale
    2019-12-30 (duration 45m)
    [transcript]
    18:19 Yeah. So that's a good opportunity to talk a bit more about what the team structure looks like for the people who are working closely with the data at DataDog, and what your particular responsibilities are and how you work within the organization, particularly given the fact that DataDog has grown in size pretty significantly over the past few years. And so just sort of how you coordinate the products that you're working on across the team boundaries and across geographical boundaries.
    45:08 Listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used, and visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
    20:16 Yeah, no, that's definitely good information. It's always interesting seeing what the team dynamics happen to be and what the breakdown is of responsibilities across different organizations, because in broad strokes, we're all working with data. We're all doing what looks to be the same thing at a high level. But as you get closer in, there are a lot of different ways that people break down the responsibilities and what the main areas of focus are. And it's interesting how the specifics of the business influence or dictate what those boundaries happen to be
     
    Building The Materialize Engine For Interactive Streaming Analytics In SQL
    2019-12-23 (duration 48m)
    [transcript]
    30:49 And then for people who are using materialize, can you talk through what's involved in getting it set up and talk through some of the life cycle of the data as it flows from the source database or from the source data stream into materialize and how you manage the overall lifecycle of the data there to ensure that you don't just expand the storage to essentially duplicate what's already in the source and you're just making sure that you have what's necessary to perform the queries that you care about.
    47:09 Thanks so much. I appreciate the time to talk and all the questions have been very thoughtful. Appreciate it.
    08:06 And in terms of how it fits into the overall life cycle and workflow of data, wondering if you can just give an overview of maybe a typical architecture as to where the data is coming from how it gets loaded into materialize and sort of where it sits on the axis of the sort of transactional workload where it's going into the database to the analytical workload where it may be in a data lake or a data warehouse or any of the other sort of surrounding ecosystem that it might tie into or feed the materialized platform.
     
    Solving Data Lineage Tracking And Data Discovery At WeWork
    2019-12-16 (duration 1h1m)
    [transcript]
    26:59 Yes. Marquez itself is a modular system. So when we first designed the original source code and also the back end data store, we wanted to make sure that, first of all, the API and also the back end data model were platform agnostic. So, you know, when I think of Marquez, I always kind of talk about three system components. So first, we have our metadata repository, and the repository itself stores, you know, all dataset and job metadata, but it also tracks the complete history of dataset changes. So, you know, you can think of when a system or a team updates their schema, we want to track that, so we keep a complete history of that, as well as when a job runs, it also updates the dataset itself. So Marquez, on the back end, stores those relationships. The other component is the, you know, the REST API itself. And, you know, if I can talk a little bit about the stack itself, it's written in Java, we do use Dropwizard pretty extensively on the project to expose the REST API, but also to interact with the backend database itself. And really, the API drives the integration. So, you know, one example that we talked about is the Airflow integration that we've done. And then finally, we have the UI itself, which is used to explore data sets and discover data sets, as well as, you know, explore the dependencies between jobs themselves, and it allows our end users, you know, at WeWork, to navigate the different sources that we've collected, as well as the data sets and jobs that Marquez has catalogued.
    49:53 And what are some of the interesting or unexpected or challenging aspects of building and maintaining the Marquez project that you have learned in the process of going through it.
    12:56 And what are some of the other integrations that you're currently using on top of Marquez, and some of the ways that you're consuming the metadata, and maybe some of the downstream effects of having this available that has maybe simplified or improved your capabilities for being able to identify and utilize these data sets for your analytics.
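    To make the architecture described in the 26:59 excerpt above a bit more tangible, here is a small sketch of querying the Marquez REST API for the datasets catalogued in a namespace. The host, port, namespace, and endpoint path are assumptions about the v1 API for illustration, not details stated in the episode.

    # Sketch: list the datasets Marquez has catalogued in a namespace via its
    # REST API. Host, port, namespace, and the /api/v1/... path are assumed
    # placeholders.
    import requests

    MARQUEZ_URL = "http://localhost:5000"
    NAMESPACE = "my-namespace"

    resp = requests.get(f"{MARQUEZ_URL}/api/v1/namespaces/{NAMESPACE}/datasets")
    resp.raise_for_status()

    for dataset in resp.json().get("datasets", []):
        # Each entry carries metadata tracked by the metadata repository.
        print(dataset.get("name"), dataset.get("sourceName"), dataset.get("updatedAt"))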
     
    SnowflakeDB: The Data Warehouse Built For The Cloud
    2019-12-09 (duration 58m)
    [transcript]
    50:12 And what are some of the plans for the future of snowflake DB either from the technical or business side?
    54:28 Are there any other aspects of the snowflake platform or the ways that it's being used or the use cases that it enables that we didn't discuss yet that you'd like to cover before we close out the show?
    11:52 And one of the things that you mentioned in there is this VARIANT datatype for being able to load semi structured data directly into Snowflake. And I know that that's one of the attributes that has led to the current shift from ETL to ELT, where you're doing the transformations within the data warehouse itself rather than up front, because I know that traditionally there was a lot of planning and rigor that had to go into loading the data into the data warehouse to make it accessible for other people. And so I'm wondering how the performance characteristics and flexibility of Snowflake and the availability of this VARIANT datatype and some of the schema flexibility plays into the overall data modeling requirements for using Snowflake and some of the shifts towards this ELT
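    For context on the VARIANT datatype mentioned in the 11:52 excerpt, here is a minimal sketch of landing a JSON document in a VARIANT column and querying a nested field, which is the kind of in-warehouse (ELT-style) work being discussed. The connection parameters, table, and column names are placeholder assumptions, not details from the episode.

    # Sketch: land semi-structured JSON in a Snowflake VARIANT column and query a
    # nested field directly, deferring transformation to the warehouse (ELT).
    # Account, credentials, and object names are illustrative placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",
        warehouse="my_wh", database="my_db", schema="public",
    )
    cur = conn.cursor()

    cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
    cur.execute(
        "INSERT INTO raw_events SELECT PARSE_JSON(%s)",
        ('{"user": {"id": 42, "plan": "pro"}, "event": "signup"}',),
    )

    # Pull nested attributes out of the semi-structured payload with path syntax.
    cur.execute("SELECT payload:user.id::NUMBER, payload:event::STRING FROM raw_events")
    for user_id, event in cur.fetchall():
        print(user_id, event)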
     
    Organizing And Empowering Data Engineers At Citadel
    2019-12-03 (duration 45m)
    [transcript]
    01:39 Yeah, no problem. So, my name is Michael Watson, I'm the director of the data engineering organization here at Citadel and head of the enterprise data team. I've been here for about five years and I'm a longtime listener of the show, so really excited.
    10:27 And how has the overall nature of the responsibilities and the work that the different data engineering teams are doing evolved over the past few years at Citadel, as the tooling and capabilities have improved for being able to manage this data, and as more sophisticated analysis techniques have become more mainstream in terms of machine learning and deep learning, and some of the different requirements of data volumes and of data quality have evolved and increased as a result,
    34:54 And as you continue to evolve the capabilities and requirements of the data organization at Citadel, what are some of the challenges, whether technical or business oriented or team oriented, that you are facing and that you're interested in tackling in the coming weeks and months?
     
    Building A Real Time Event Data Warehouse For Sentry
    2019-11-26 (duration 1h1m)
    [transcript]
    08:33 And were you starting to run into any user facing problems or was most of the pain just in terms of the internal operations and trying to deal with managing the data and the cognitive load on the engineering side?
    46:55 it's the compression it's
    1:00:28 for listening, don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on iTunes and tell your friends and coworkers
     
    Escaping Analysis Paralysis For Your Data Platform With Data Virtualization
    2019-11-18 (duration 55m)
    [transcript]
    46:16 And when is the AtScale platform the wrong choice?
    48:48 Are there any other aspects of the AtScale platform and the work that you're doing there, or the ideas around data virtualization or data engineering automation, that we didn't discuss yet that you'd like to cover before we close out the show.
    30:27 I went off and I think I've lost track of what the core question was, Tobias. I'm really sorry.
     
    Designing For Data Protection
    2019-11-11 (duration 51m)
    [transcript]
    46:55 I think probably that is the main ones I would personally call it out of the moment.
    33:33 Yeah, yeah. Good question. Interestingly in the so the interview we got i can i think it's six or seven x data subject we call today tend to then you know consumers known as data subjects, the right to ratio the right forgotten the right subject the right to get access your data the right of corrections right to this right to the mall, but it's either six or seven. Another one more though the right we forgot to grab most of the head. lines, the one that's mainly exercised is the data subject access, right? The right to get a copy the data held out here, okay, now exists in the GDPR, access and the CPA and you get the same issue, right, multiple systems and so on. Now, talking about the subject access, right to begin with, it's not a, you know, you have to go to the end and ends of the earth to produce everything, but you've got to make a reasonable effort. Clearly, if you've got a coherent system, with everything in place, it's easy to do if you're, if you're struggling, you know, six or seven legacy systems and it gets much more complicated. A good reason to get rid of data. You don't need to be frank because you don't have to report on it. In terms of the right the right the right forgotten the right of deletion, it's not an unfettered, right. It's not on the qualified. So if I got a contract with you in the sort of customer, you can't just have your right. You can't just delete my data. Well, I can't do that with a contract with you. Even though the contracts over you know, there's reasons you can refuse for example, you're allowed to take out all the data if you think that you might need it for legal complication of you know, Legal Defense some point down the line, so it's not
    38:40 So yeah, these are the million dollar questions, really. Once we're starting to go downstream from the data collection, and the data scientists in the analytics teams and the analysts are starting to run searches across the data they can access, it still comes back to the organization's responsibility and requirement to provide the scientists and analysts with a decent, clean set of properly obtained, lawful data. So if the data scientists or analysts then want to access data within that data set that could be protected, there should be appropriate controls or tags, or logging or audits, that the scientists and the analysts can see and be aware of when they then come to do the projects that they've been assigned to do. And it's also possible, as a way of embedding the management of that at the beginning of a project, if a scientist or analyst or even a project manager is running a project that involves a data scientist or an analyst, that they do a privacy impact assessment on what the project is to achieve. So if the intended outcomes from the project are those that might result in the creation of a profiled data set upon which decisions will be made, then really at the beginning of the project they should do some sort of impact assessment of what data they're going to use, whether they are using it in a lawful way, and what safeguards they are putting in place for the results that are generated from that particular project. So yeah,
     
    Automating Your Production Dataflows On Spark
    2019-11-04 (duration 48m)
    [transcript]
    10:07 out of the muck, and my understanding is that the foundational layer for the platform that you've built is using Spark. So I'm wondering if you can talk a bit about your criteria for selecting the execution substrate for the platform, and some of the features of Spark that lend themselves well to the project that you're trying to build?
    15:34 and what are some of the ways that the implementation details and the overall system architecture have evolved since you first began working on it and some of the assumptions that you had going into the project that have been challenged or updated as you started to get deeper into the problem?
    18:14 company. And in terms of the interface that an end user of the Ascend platform would be interacting with, how does that compare to the Spark API? And what are some of the additional features and functionality that you've layered on, or the changes in terms of the programming model that's available?
     
    Build Maintainable And Testable Data Applications With Dagster
    2019-10-28 (duration 1h7m)
    [transcript]
    20:36 And that is the kind of the primary focus for our programming model.
    20:39 And since Dagster itself is focused more on the programming model and isn't vertically integrated, as I mentioned before, as opposed to tools such as Airflow that people might be familiar with, I'm curious how that changes the overall workflow and interaction of the end users of the system, and what your reasoning is for decoupling the programming layer from the actual execution context. Yeah. So,
    1:07:02 Thanks for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. Come visit the site at dataengineeringpodcast.com, subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show please leave a review on iTunes and tell your friends and coworkers
     
    Data Orchestration For Hybrid Cloud Analytics
    2019-10-22 (duration 42m)
    [transcript]
    24:01 Yeah. In terms of migration, we've seen a couple of different approaches. If the mindset is that the cloud initiatives in the enterprise are the, you know, paramount initiative, in the sense that there's an immediate urgency to move to the cloud, really the only approach that works is migrating the entire data set to the cloud and then running the workloads in the cloud, right. But it is a non trivial kind of approach. Not everyone has the ability to run both on prem and in the cloud at the same time, and it obviously doubles costs. But if that is an option, then there are companies that just go about it that way. Most of the companies, most of the enterprises, the way they think through it is workload by workload, right? And they start off with the low risk workloads, and these are workloads that might be ephemeral. So for example, you just run the workload and, you know, it's done and it's gone. They start off with the ephemeral workloads, then they move on to the more scheduled workloads that might be adding additional capacity, additional overhead, at a certain time of the month. For example, at the end of the month or the end of the quarter there might be a lot more reporting jobs that run, and so those can be methodically moved over to the cloud, and their data sets can be moved over to the cloud. Now, if there is a huge overlap in the data sets across all these workloads, then it becomes harder, and that's where the zero copy burst, or the bursting to the cloud, would make more sense. So these are the different approaches, and it depends on the mindset of the company, the urgency of moving to the cloud, the goal of it, as well as the appetite, right? In some ways, there have been some breaches and so on, which has caused people to think about, okay, is this the right thing to do, especially with their enterprise data. And so what we hear is some data will continue to live on prem and it will never be moved to the cloud; a lot of data will move to the cloud, but there will be some almost IP category of data which is so precious that it will continue to remain on prem. And then the other part of the question you asked is metrics, right? So metrics in terms of how do you gauge success or progress through this process. At the highest level there is, you know, the size of the total data set, but then you come down to the next level of granularity, which is the number of tables you're managing, and then you can look at that on a business unit basis, because invariably this data is split across, you know, finance and marketing and product. And, you know, all this data is in a single data lake or in a few different data lakes, right. And so you can go organization by organization, and within the organization it is the number of tables that have been migrated or not, and then which reports are associated with those tables, right?
So that you never are in a situation where, if a report comes in, if a query comes in, the data set is not available. There might be a period of time where you have to have both available, and then you're playing catch up; at one point you say no more, no more new data gets generated there, and then you start adding the new data in the cloud, and then that workload permanently gets run in the cloud. So those are some of the things that we hear as we talk to users, and we try to identify the workloads that they would like to burst that would be easiest to begin with: Spark ephemeral jobs, particularly for machine learning, modeling, regression analysis, Monte Carlo analysis, those kinds are relatively easy. And then you kind of move on from that point.
    42:04 listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show please leave a review on iTunes and tell your friends and co workers
    38:28 So for somebody who wants to learn more about the overall space of data orchestration and the benefits that it can provide, and some of the overall industry trends that are driving it, what are some of the resources that you would recommend?
     
    Keeping Your Data Warehouse In Order With DataForm
    2019-10-15 (duration 47m)
    [transcript]
    33:53 And then, as you've mentioned, there is the web UI and there's also the command line interface that has the core functionality, and I'm wondering if you can talk a bit about the differences in capabilities and when somebody might want to reach for the web UI versus the command line.
    36:05 And what are the cases where data form is the wrong choice for a given project?
    32:44 And in terms of the SQL dialect support, do you have any validation in terms of the compilation step to verify that a particular statement is going to comply with the dialect of the engine that it's going to be running against?
     
    Open Source Object Storage For All Of Your Data
    2019-09-23 (duration 1h8m)
    [transcript]
    43:01 for somebody who is deploying Min IO, what are some of the operational characteristics and server capabilities that they should be considering? And in the clustered context? How is the load balancing handled? Is it something that you would put a service in front of the cluster? Or is that something that the nodes themselves handle as far as routing the requests to the different nodes within that cluster for being able to ensure a proper distribution of the data.
    1:05:06 Yeah, I think, to summarize, it's the tooling: the search part, and the access management in terms of policies. In the past, because you had many different storage systems, trying to do unified data governance on top of a variety of SAN and NAS vendors and database vendors was hard. This is where the data lake got all the bad rap, right. In the new world, the good part is all of the data is getting consolidated onto object storage, and there is only one storage system at the heart of the data infrastructure. And everything else is simply stateless containers and VMs around object storage, and they are all accessing it through the S3 API. If this is the case, finally data management is something practical, right. But then the key here is the old school data management model does not work here anymore. It has to be fundamentally rethought in the form of a search and access platform. I haven't seen a good product yet in the market, but certainly I keep hearing new startups wanting to go after this
    30:53 In terms of the actual implementation of MinIO, I'm wondering if you can talk through the overall system architecture and some of the ways that it has evolved, and then also some of the additional services that you've built alongside it, because I know that you have replicated some of the functionality around the key management store and IAM, or identity and access management, and just some of the overall sort of tertiary projects that have come to be necessary as you evolve the overall capabilities and reach of MinIO.
     
    Navigating Boundless Data Streams With The Swim Kernel
    2019-09-18 (duration 57m)
    [transcript]
    24:19 the,
    38:39 let the app application data or let the data build the app, or most of the app can bonus in response
    42:27 require that object, which are the digital twins have the ability to inspect
     
    Building A Reliable And Performant Router For Observability Data
    2019-09-10 (duration 55m)
    [transcript]
    39:19 that they need. Another aspect of the operational characteristics of the system is being able to have visibility, particularly at the aggregator level, into what the current status is of the buffering or any errors that are cropping up, and just the overall system capacity. And I'm curious if there's any current capability for that, or what the future plans are along those lines.
    17:09 Yeah, the ability to map together different components of the overall flow is definitely useful. And I've been using Fluentd for a while, which has some measure of that capability. But it's also somewhat constrained in that the logic of the actual pipeline flow is dependent on the order of specification in the configuration document, which makes it sometimes a bit difficult to understand exactly how to structure the document to make sure that everything is functioning properly. And there are some mechanisms for being able to route things slightly out of band with particular syntax, but just managing it has gotten to be somewhat complex. So when I was looking through the documentation for Vector, I appreciated the option of being able to simply say that the input to one of the steps is linked to the ID of one of the previous steps, so that you're not necessarily constrained by order of definition, and you can instead just use the ID references to ensure that the flows are wired up the way you intend.
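    To make the contrast concrete, here is a minimal Python sketch, not Vector's implementation, of the idea being described: each step names its inputs by ID, and the execution order is derived from those references rather than from the order of definition. The step names are hypothetical.

        from graphlib import TopologicalSorter  # Python 3.9+

        # Hypothetical pipeline definition: each step names its inputs by ID,
        # so the order of declaration does not matter.
        steps = {
            "parse_json":   {"inputs": ["raw_logs"]},
            "raw_logs":     {"inputs": []},
            "to_s3":        {"inputs": ["add_host_tag"]},
            "add_host_tag": {"inputs": ["parse_json"]},
        }

        # Build the dependency graph and compute an execution order from the
        # ID references alone.
        graph = {name: set(cfg["inputs"]) for name, cfg in steps.items()}
        order = list(TopologicalSorter(graph).static_order())
        print(order)  # e.g. ['raw_logs', 'parse_json', 'add_host_tag', 'to_s3']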
    13:36 In terms of the actual implementation of the project, you've already mentioned in passing that it was written in Rust. And I'm wondering if you can dig into the overall system architecture and implementation of the project and some of the ways that it has evolved since you first began working on it, like you said, Rust is,
     
    Building A Community For Data Professionals at Data Council
    2019-09-02 (duration 52m)
    [transcript]
    47:06 Looking forward, what are some of the goals and plans that you have for the future of the data council business and the overall community and events?
    26:34 So one of the other components to managing communities is the idea of sustainability and the overall approach to governance and management. And I'm wondering both for the larger community aspect, as well as for the conferences and meetup events, how you approach sustainability to ensure that there is some longevity and continuity in the overall growth and interaction with the community?
    44:39 in terms of the overall industry and some of the trends that you're observing and your interaction with engineers and with vetting the different conference presentations and meetup topics, what are some of the trends that you're most excited by? And what are some of the ones that you are either concerned by or some potential issues that you see coming down the road in terms of the overall direction that we are heading as far as challenges that might be lurking around the corner?
     
    Building Tools And Platforms For Data Analytics
    2019-08-26 (duration 48m)
    [transcript]
    14:00 I think that the thing that is the responsibility of the tool builders is more just having the empathy of
    18:37 that's the job. And
    06:01 the data scientists can get it to work the way I like to work.
     
    A High Performance Platform For The Full Big Data Lifecycle
    2019-08-19 (duration 1h13m)
    [transcript]
    28:10 while the data has been sprayed. And depending on the length of the data,
    30:17 seem okay. The
    1:04:26 But in the great
     
    Digging Into Data Replication At Fivetran
    2019-08-12 (duration 44m)
    [transcript]
    28:12 On the other side of the equation, where you're loading data into the different target data warehouses, I'm wondering what your strategy is as far as being able to make the most effective use of the feature sets that are present, or do you just target the lowest common denominator of SQL representation for being able to load data in, and then leave the complicated aspects of it to the end user for doing the transformations and analyses?
    20:00 One of the other issues that comes up with normalization, and particularly for the source database systems that you're talking about, is the idea of schema drift, when new fields are added or removed, or data types change, or the sort of default data types change. And I'm wondering how you manage schema drift overall in the data warehouse systems that you're loading into, while preventing data loss, particularly in the cases where a column might be dropped or the data type changed.
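    This is not Fivetran's actual logic, but a rough Python sketch of one common way to handle the drift being asked about: widen the destination schema rather than drop data. The column names and type rules are hypothetical.

        # Hypothetical destination schema and an incoming record whose shape has drifted.
        destination_schema = {"id": "int", "email": "string", "age": "int"}
        incoming_record = {"id": 7, "email": "a@example.com", "age": "unknown", "plan": "pro"}

        def infer_type(value):
            # Simplistic type inference, just for the sketch.
            if isinstance(value, bool):
                return "bool"
            if isinstance(value, int):
                return "int"
            return "string"

        def reconcile(schema, record):
            """Widen the schema instead of losing data: add new columns, and
            relax a column to string when the observed type conflicts."""
            changes = []
            for column, value in record.items():
                observed = infer_type(value)
                if column not in schema:
                    schema[column] = observed
                    changes.append(f"ADD COLUMN {column} {observed}")
                elif schema[column] != observed:
                    schema[column] = "string"
                    changes.append(f"ALTER COLUMN {column} TYPE string")
            # Columns dropped at the source simply stay in the schema as nullable history.
            return changes

        print(reconcile(destination_schema, incoming_record))
        # ['ALTER COLUMN age TYPE string', 'ADD COLUMN plan string']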
    42:28 I don't think so. I think the thing that people tend to not realize, because they tend to just not talk about it as much, is that the real difficulty in this space is all of that incidental complexity of all the data sources. You know, Kafka is not going to solve this problem for you. Spark is not going to solve this problem for you. There is no fancy technical solution. Most of the difficulty of the data centralization problem is just in understanding and working around all of the incidental complexity of all these data sources.
     
    Solving Data Discovery At Lyft
    2019-08-05 (duration 51m)
    [transcript]
    47:40 Are there any other aspects of the Amundson project itself or the ways that it's being used at Lyft, and in the open source community, or the engineering work that has gone into it that we didn't discuss yet that you'd like to cover before we close out the show?
    08:28 Yeah, fantastic question, right. And I think we are on this scale, right? On one end I consider curation, where a BI engineer or data engineers are curating every single data model that's coming out, and making sure it's high quality and it remains high quality all the time, and on the other end is complete chaos, where you have so much data, it's growing in a more democratic fashion, and not everyone is aware of what all the places to go to are, and so on and so forth. And I would say most companies are somewhere in that spectrum, right? Hopefully you're not on the chaos side, and I don't think a growing company can stay fully on the curation side. So, the examples that you chose, Tobias, for example about the ranking: what Amundsen does is use two axes for ranking, one is popularity, and the other one is relevance. So when you type ETA, it matches ETA with a bunch of different fields within the table, the name of the table, the description of the table, the column names and the column descriptions, and the tags, to surface all the things that match with the term ETA. But then it also ranks them based on popularity, popularity being the amount of SQL querying that happens on that table. So tables that get queried more show up higher in the search results than tables that get queried less. And we also attach different weights to automated queries, like an ETL job, versus an ad hoc query, like a human querying the table. And that's how we started to build this intelligence around what is a good proxy for trust. In this example, we just coded it as the popularity, or the amount of SQL queries that get written against the table, as a measure of trust where
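    This is not Amundsen's code, just a rough Python sketch of the two-axis idea described above: a naive text relevance count blended with log-scaled query popularity. The table metadata and the weighting are hypothetical.

        import math

        # Hypothetical table metadata: text fields the search term is matched
        # against, plus a count of SQL queries issued against the table.
        tables = [
            {"name": "core.eta_events", "description": "per-ride eta predictions",
             "columns": ["ride_id", "eta_seconds"], "query_count": 5400},
            {"name": "scratch.eta_backup", "description": "old copy of eta events",
             "columns": ["ride_id", "eta_seconds"], "query_count": 12},
        ]

        def relevance(term, table):
            # Count how many metadata fields mention the search term.
            fields = [table["name"], table["description"], *table["columns"]]
            return sum(term in field for field in fields)

        def score(term, table, popularity_weight=0.3):
            # Blend text relevance with log-scaled popularity so heavily queried
            # tables rank above rarely used copies of the same data.
            return relevance(term, table) + popularity_weight * math.log1p(table["query_count"])

        for table in sorted(tables, key=lambda t: score("eta", t), reverse=True):
            print(table["name"], round(score("eta", table), 2))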
    51:13 Thank you both very much for taking the time today to join me and discuss the work that you've put into the Amundsen project. It's definitely an interesting problem space, and one that is absolutely necessary for the continued success of data platforms and data teams as the overall complexity of our systems continues to grow and evolve. So I appreciate the work that you've both put into that and I hope you enjoy the rest of your day.
     
    Simplifying Data Integration Through Eventual Connectivity
    2019-07-29 (duration 53m)
    [transcript]
    30:07 one of the things that you talked about in there is the fact that there's this flexible data model. And so I'm wondering what type of upfront modeling is necessary in order to be able to make this approach viable? I know you talked about the idea of the entity codes and the aliases. But for somebody who is taking a source data system and trying to load it into this graph database in order to be able to take advantage of this eventual connectivity pattern, what is the actual process of being able to load that information in and assign the appropriate attributes to the different records and to the different attributes in the record? And then also, I'm wondering if there are any limitations in terms of what the source format looks like, as far as the serialization format or the types of data for which this approach is viable?
    04:31 And so to frame the discussion a bit, I'm wondering if you can just start by discussing some of the challenges and shortcomings that you have seen in the existing practices of ETL.
    50:22 And one follow on from that, too, I think, is the idea of migrating from an existing ETL workflow into this eventual connectivity space. And it seems that the logical step would be to just replace your current target system with the graph database and add in the mapping for the entity IDs and the aliases, and then you're at least partway on your way to being able to take advantage of this, and then you just add a new ETL workflow at the other end to pull the connected data out into what your original target systems were. Yeah, exactly.
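    As a rough Python sketch of the merge-by-alias idea discussed here, with hypothetical records and entity codes, and much simpler than CluedIn's actual model: records from different sources that share any entity code end up folded into the same entity.

        from collections import defaultdict

        # Hypothetical records from three source systems. Each record carries one
        # or more entity codes (aliases) identifying the same real-world entity.
        records = [
            {"source": "crm",     "codes": {"email:ada@example.com"},
             "data": {"name": "Ada Lovelace"}},
            {"source": "billing", "codes": {"email:ada@example.com", "account:42"},
             "data": {"plan": "enterprise"}},
            {"source": "support", "codes": {"account:42"},
             "data": {"open_tickets": 3}},
        ]

        # Union-find over entity codes: records sharing any code join one group.
        parent = {}

        def find(code):
            parent.setdefault(code, code)
            while parent[code] != code:
                parent[code] = parent[parent[code]]
                code = parent[code]
            return code

        def union(a, b):
            parent[find(a)] = find(b)

        for record in records:
            first, *rest = record["codes"]
            for code in rest:
                union(first, code)

        # Merge the attributes of every record that resolved to the same entity.
        entities = defaultdict(dict)
        for record in records:
            root = find(next(iter(record["codes"])))
            entities[root].update(record["data"])

        print(list(entities.values()))
        # [{'name': 'Ada Lovelace', 'plan': 'enterprise', 'open_tickets': 3}]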
     
    Straining Your Data Lake Through A Data Mesh
    2019-07-23 (duration 1h4m)
    [transcript]
    26:30 welcome, the data enthusiasts
    17:18 Yeah, it's definitely interesting seeing the parallels between the monolithic approach of the data lake and some of the software architectural patterns that have been evolving of people trying to move away from the big ball of mud, because of the fact that it's harder to integrate and maintain, and that you have the similar paradigm and the data lake where it's hard to integrate all the different data sources in the one system, but also between other systems that might want to use it downstream of the data lake, because it's hard to be able to separate out the different models or version, the data sets separately or treat them separate, because they're all located in this one area.
    42:39 Yeah, one of the issues with data lakes is, at face value, the benefit that they provide is that all of the data is co located, so you reduce latencies when you're trying to join across multiple data sets, but then you potentially lose some of the domain knowledge of the source teams or source systems that were generating the information in the first place. And so now we're moving in the other direction of trying to bring a lot of the domain expertise to the data and providing that in a product form. But as part of that, we then create other issues in terms of discoverability of all the different data sets, and consistency across different schemas for being able to join across them, where, if you leave everybody to their own devices, they might have different schema formats, and then you're back in the area of trying to generate master data management systems and spending a lot of energy and time trying to coerce all of these systems to have a common view of the information that's core to your overall business domain. And so when I was reading through the post, initially I was starting to think that, you know, we're trading off one set of problems for another. But I think that by having the underlying strata of the data platform team, where all of the actual data itself is being managed somewhat centrally from the platform perspective, but separately from the domain expertise perspective, I think we're starting to reach a hybrid where we're optimizing for all the right things, and not necessarily having too many trade offs in that space. So I'm curious what your perspective is in terms of those areas of things like data discovery and schema consistency, and where the responsibilities lie between the different teams and organizationally, for ensuring that you don't end up optimizing too far in one direction or the other?
     
    Maintaining Your Data Lake At Scale With Spark
    2019-06-17 (duration 50m)
    [transcript]
    47:01 Tobias Macey: And so looking forward for the project, I know that you said that in the near term you have some work to extricate some of the code from the internal repositories and bring it into the open source repo as far as some of these capabilities. But looking to the medium and long term, what do you have planned for the future of Delta Lake?
    39:40 Tobias Macey: And so going back to the Delta Lake project itself, I think the last piece, or at least the last major piece that we haven't talked about yet, is the capability of creating indexes across the data that you're storing in your data lake. And so I'm wondering how that manifests, and any issues that you have encountered as far as scalability and maintaining consistency of those indexes as the volumes and varieties of data that you're working with grow?
    03:02 Michael Armbrust: Yeah. Hi, my name is Michael Armbrust. I'm an engineer at Databricks, and I'm also the original creator of Apache Spark SQL and Structured Streaming. And now I'm the tech lead for the Delta Lake project.
     
    Managing The Machine Learning Lifecycle
    2019-06-10 (duration 1h2m)
    [transcript]
    57:16 Unknown: the
    54:21 to provide the
    50:56 with the data set.
     
    Evolving An ETL Pipeline For Better Productivity
    2019-06-04 (duration 1h2m)
    [transcript]
    48:43 Raghotham Murthy: Yeah. So just to give yet another example of how Greenhouse is helping us move forward on this. So one of the things that we have done right now is this whole compile step. So earlier, people would be able to just create these materialized views and we would automatically infer the dependencies and then generate the pipelines. But now, with these kinds of files that are all in one repository, when you're trying to update one of those materialized views, you should be able to get a compile step that will then tell you if there's anything downstream that might actually get affected, right, because we know what the data dependencies are. And again, with Greenhouse kind of leading the way in terms of providing the right kind of use cases, we've been able to get started on the compile step. And then the idea would be that it would actually provide like a CI CD pipeline, where you change that one materialized view somewhere in the middle of the DAG, and then you should be able to not only push it to production, but be able to run it in like a test mode, from that node all the way downstream, so that you know what the difference is going to be like after the change actually has been applied. So these are actually things that we are actively working on, and again, with Greenhouse helping lead the way in terms of finding the right kinds of use cases.
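    This is not Data Coral's implementation, just a small Python sketch with hypothetical view names of the kind of downstream impact check the compile step is described as performing: given the data dependencies between materialized views, report everything affected by a change.

        # Hypothetical dependency map: each materialized view lists the views it reads from,
        # so the "compile" step can warn about downstream impact before a change ships.
        depends_on = {
            "raw_events":     [],
            "sessions":       ["raw_events"],
            "daily_active":   ["sessions"],
            "exec_dashboard": ["daily_active", "sessions"],
        }

        def downstream(changed, graph):
            """Return every view directly or transitively affected by a change."""
            # Invert the graph: who reads from whom.
            readers = {name: [] for name in graph}
            for name, inputs in graph.items():
                for upstream in inputs:
                    readers[upstream].append(name)
            affected, stack = set(), [changed]
            while stack:
                node = stack.pop()
                for reader in readers[node]:
                    if reader not in affected:
                        affected.add(reader)
                        stack.append(reader)
            return sorted(affected)

        print(downstream("sessions", depends_on))
        # ['daily_active', 'exec_dashboard']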
    52:56 Tobias Macey: And are there any other aspects of the work that you're doing at Greenhouse, or the work that you're doing at Data Coral, or the sort of interaction between the two companies, that we didn't discuss yet that you'd like to cover before we close out the show?
    45:12 Tobias Macey: And just wondering if you can also quickly talk through what the current workflow looks like for building and maintaining the data flows that you're deploying onto the Data Coral platform, and just what the interaction pattern looks like, and how you're managing and organizing the code that you're deploying for managing those data flows, and how you ensure sort of discoverability or visibility of what the flows are.
     
    Data Lineage For Your Pipelines
    2019-05-27 (duration 49m)
    [transcript]
    34:58 Tobias Macey: And so in terms of the provenance, I know that because you're versioning the containers that are executing as part of the pipeline, that is an added piece of information that goes into it, as far as this is the data that was there when we started, this is the code that actually executed, and then this was the output. But for external systems, do you have any means of tracking the actual operations that were performed to enrich the metadata associated with the provenance?
    28:18 Tobias Macey: going back to the idea of data provenance and data lineage, you've mentioned that some of the way that it's tracked is through these versioning capabilities of the file system. But I'm wondering if you can just dig deeper into the underlying way that it's represented, as far as tracking it both from source to delivery, and how that actually is exposed when you're trying to trace back from the end result all the way back to where the data came from and what's happened to it along the way.
    35:23 Joe Doliner: Yeah, so those can basically use the same system we use, which is that we track the information about all of the code that ran, and, you know, the Docker container and everything like that, but we actually just do that by piggybacking on PFS's provenance system, because we just add that as a commit. So every job has what we call a spec commit that specifies how the job is supposed to be run, and that includes the code and the Docker container and everything like that. And so outside systems are basically just expected to, however you can serialize this information, just put it in a commit. And then, you know, in essence it's considered as an input into the pipeline; it's really not, in terms of the provenance tracking in the storage system, any different than any other input. It's just that this one happens to define the code that's running in the computation. And
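    A conceptual Python sketch, not the Pachyderm API, of the idea just described: the pipeline spec is recorded as just another commit, so tracing the provenance of an output walks back over code and data alike. The repo names and IDs are made up.

        from dataclasses import dataclass, field

        # Toy model: every input to a job, including the "spec" that defines the code
        # and Docker image, is recorded as a commit in the provenance graph.
        @dataclass(frozen=True)
        class Commit:
            repo: str
            id: str
            provenance: tuple = field(default_factory=tuple)

        data_commit = Commit(repo="images", id="a1b2")
        spec_commit = Commit(repo="edges.spec", id="c3d4")  # serialized pipeline spec

        output_commit = Commit(
            repo="edges",
            id="e5f6",
            provenance=(data_commit, spec_commit),
        )

        def trace(commit, depth=0):
            # Walk the provenance graph, printing both data and code inputs.
            print("  " * depth + f"{commit.repo}@{commit.id}")
            for parent in commit.provenance:
                trace(parent, depth + 1)

        trace(output_commit)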
     
    Build Your Data Analytics Like An Engineer With DBT
    2019-05-20 (duration 56m)
    [transcript]
    54:13 Drew Banin: Yeah, that's a really good question. I think that the last few years have brought a sea change for how data is ingested into databases. So these off the shelf ETL tools are great. The tooling around actually building ETL, if you need to build custom stuff yourself, is great compared to what it was a few years ago. I think that, you know, the actual warehouse technology has made quantum leaps in the past five years, maybe a little bit longer than that at this point, going from a pre-Redshift world to where we are now, where there's Redshift, but also BigQuery is amazing and Snowflake is phenomenal as a warehouse; those are all great. And I think the sort of last part of it is the BI stack, like the BI part of the stack. And so I think that these individual tools all work really well. But, you know, the Lookers and the Mode Analytics and the Periscopes of the world, they're all great tools for what they do, but the actual interoperability with the rest of the stack is, I think, not all the way there. And the place that I think the most value can be added is in understanding the connection between the queries that are running in your BI tool and, you know, maybe this is a dbt centric point of view, but which dbt models they're querying, for instance, and actually uniting the whole stack from source data through to, you know, your warehouse, transformation, and analysis. If you could have that unified picture of lineage for an actual chart, and understand the provenance all the way back to source data, that's a really powerful thing that to date has been elusive. But I think we're getting closer and closer to it.
    44:39 build their dbt models within the interface.
    50:43 And being in the position of, like, product manager and maintainer has been so fascinating, and I love it so much. And certainly it's one of the areas where I've grown the most since starting to work on dbt three years ago,
     
    Unpacking Fauna: A Global Scale Cloud Native Database
    2019-04-22 (duration 53m)
    [transcript]
    17:38 Tobias Macey: And as far as the underlying storage layer and the data modeling that Fauna supports, I'm wondering if you can talk through how that's implemented, and specifically for the multi model capacity, how the query layer is designed to be able to allow for those different query patterns on the same underlying data.
    13:43 Evan Weaver: Yeah, so what Calvin does is invert the synchronization model. Instead of using clocks to figure out when transactions occurred on the data replicas, it sends the transactions themselves to a log, which then essentially defines the order of time. These transactions in the shared log are then asynchronously replicated out to the individual replica nodes, very similar to a traditional NoSQL system. And that gives you a ton of advantages. So at the front end, sort of in the write path, you have a Raft cluster, which is sharded and partitioned, entirely available, spans nodes, that's accepting these deterministically submitted transaction effects, or intermediate representations, what have you. That thing has no single point of failure, it's global, multi data center, and any node can commit to it within the same median latency regardless of how complex the transaction is. Then on the read side, you can have as many data centers as you please tailing off this log in lockstep, applying the transaction effects locally to their local copy of the data. That means that on the read side you get a scale out experience which doesn't require any coordination, so we can do snapshot reads from any data center with single millisecond latency. Whereas on the write side, you know, the latency for a commit takes about one majority round trip through the log nodes, wherever they're configured to be in the data center, so, you know, 100 to 200 milliseconds in a typical multi continent cluster. That's basically the best you can do in terms of maximizing availability without ever giving up the benefits of transactionality.
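    A toy Python sketch, not Fauna's implementation, of the inversion described here: a single ordered log defines transaction order, and replicas converge by applying the log in lockstep without coordinating on reads. The keys and replica names are hypothetical.

        # Writers append transaction intents to one shared, ordered log; each replica
        # applies the log in the same order and therefore converges independently.
        shared_log = []

        def submit(transaction):
            # The position in the log, not a clock, defines when the transaction "happened".
            shared_log.append(transaction)
            return len(shared_log) - 1

        class Replica:
            def __init__(self, name):
                self.name = name
                self.state = {}
                self.applied = 0

            def catch_up(self):
                # Tail the log in lockstep, applying effects to the local copy of the data.
                while self.applied < len(shared_log):
                    key, value = shared_log[self.applied]
                    self.state[key] = value
                    self.applied += 1

        submit(("balance:alice", 100))
        submit(("balance:alice", 80))

        us, eu = Replica("us-east"), Replica("eu-west")
        us.catch_up()
        eu.catch_up()
        assert us.state == eu.state == {"balance:alice": 80}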
    47:18 Tobias Macey: And looking forward, what do you have in store for the future of Fauna, both from the business and technical side,
     
    Building An Enterprise Data Fabric At CluedIn
    2019-03-25 (duration 57m)
    [transcript]
    54:44 And what are your plans for the future of CluedIn, both from the technical and business perspectives?
    34:49 And as far as being able to manage the data lineage or data provenance, and then be able to expose it to the customer, so that if there are any issues with the cleaning or reconciliation, or if there's just some incorrect data in the record, they're then able to trace it back to the source system and maybe do some fixes for how the data is being transferred to you, or fixes in the source system itself, or in how they're capturing the data originally: I'm just wondering what you're using to manage that, and some of the challenges and benefits that you found in the process of building that system?
    12:34 and your point that you're making about regulation and compliance requirements that these companies are dealing with. I'm wondering, in particular, how you manage that at the architectural and system level in terms of managing data privacy and security as it transits your system, and what the sort of deployment environment looks like whether you're running a hosted system that people send the data through, or if you're co located with the data to reduce some of the latency effects of having to cross network boundaries, and just some of the overall ways that you have built your company and the ways that the data flows through.
     
    Deep Learning For Data Engineers
    2019-02-25 (duration 42m)
    [transcript]
    16:05 And in terms of the actual responsibilities of the data engineer for the data as it's being delivered to these algorithms, particularly as it compares to machine learning, where you might need to do up front feature extraction and feature identification to be able to get the most value out of the algorithm: my understanding is that with deep learning, you're more likely to just provide coarse grained labeling of the information and then rely on the deep learning neural networks to extract the useful features. So I'm wondering if you can talk a bit about how the responsibilities of the data engineer shift as you're going from machine learning into deep learning, particularly from the standpoint of feature extraction and labeling.
    08:43 and particularly from the perspective of a data engineer who's working on building out the infrastructure and the data pipelines that are necessary for feeding into these different machine learning algorithms or deep learning projects, what is involved in building out that set of infrastructure and requirements to support a project that is going to be using deep learning, particularly as it compares to something that would be using a more traditional machine learning approach that requires more of the feature engineering up front, as opposed to just labeling the data sets for those deep learning algorithms?
    40:48 right. And for anybody who wants to follow along with you, or get in touch, or see the work that you've been doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
     
    The Alluxio Distributed Storage System
    2019-02-19 (duration 59m)
    [transcript]
    27:38 Tobias Macey: and in terms of the actual code itself, I'm wondering if you can discuss some of the particular challenges that you faced in terms of building and maintaining and growing the project and discuss some of the evolution that it has undergone in terms of the overall architecture and design of the system?
    08:20 Tobias Macey: And a couple of the things that you mentioned in there preempted some of the questions I have, particularly in terms of the persistence of the data in memory, as far as what happens when you have a memory failure and the machine or the instance gets rebooted, and how that data distribution gets managed, at least in terms of the long term persistence of the data, where you rely on the underlying storage systems to provide those guarantees. I'll probably have a few more questions later on about how you manage distribution among the Alluxio layer once the data has been retrieved. But before we get into that, I want to talk a bit more about some of the use cases that Alluxio enables.
    10:00 different data centers, and they have cross data center traffic to load data, because we want to do a join and one table is in one data center while, for other parts of the computation, other tables are in the local data center. So in this case, having Alluxio in place helps greatly to reduce the computation time, because you can either pre load the data, the table, from the remote data center to the local caching layer, or Alluxio has the intelligence built in to bring the data on demand after the cold read. Next time, if you read that same data again, it will be cached locally. So we do see a great performance gain in cases like this. We have a published use case with Baidu, the search giant in China; they see performance benefits of 30x in these cases. The other case, interestingly, that I see more is really the cloud as
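    A toy Python sketch of the read-through caching pattern described above, with a simulated cross-data-center latency; this is not how Alluxio is implemented, and the path and latency are made up.

        import time

        REMOTE_LATENCY = 0.2  # pretend a cross-data-center read costs 200 ms

        remote_store = {"ads/clicks/2019-02-01": b"...remote bytes..."}
        local_cache = {}

        def read(path):
            """Read-through cache: cold reads go to the remote store and populate
            the local cache; warm reads are served locally."""
            if path in local_cache:
                return local_cache[path]
            time.sleep(REMOTE_LATENCY)  # simulated remote fetch
            data = remote_store[path]
            local_cache[path] = data
            return data

        for attempt in ("cold", "warm"):
            start = time.perf_counter()
            read("ads/clicks/2019-02-01")
            print(attempt, round(time.perf_counter() - start, 3), "seconds")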
     
    Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt
    2018-12-03 (duration 54m)
    [transcript]
    07:24 And also the fallacies that programmers believe about networking?
    24:29 so in terms of the design of the system, as far as where it falls on the axes of the CAP theorem, it seems that you're favoring availability and partition tolerance at the expense of some amount of consistency. Because in some cases, if two different parts of the system get a different response, it's not necessarily the end of the world, as long as everything can keep running.
    43:16 and given the length of time that ZooKeeper has been around and the massive changes and developments that have occurred in that time span, I'm wondering how you have seen the needs of distributed systems engineers, and the systems themselves, change since you first began working with ZooKeeper.
     
    A Primer On Enterprise Data Curation with Todd Walter
    2018-09-24 (duration 49m)
    [transcript]
    35:37 Exactly. And as you start from the beginning, build out a governance process that works with the users to understand what level of curation is actually required, and do the absolute minimum necessary curation to meet the business requirements; I call it Minimum Viable Curation, stealing the term from the Agile world. And the idea is that if you have a good conversation between the people who are doing curation and the people who are using the data, and you have a constructive conversation there continuously, then you can spend the right amount of time and dollars on doing the curation and be very selective. You might have a data set with 1000 attributes in it, but the users who are producing the business report say, well, we only care about these five; then you don't need to curate the other 995, don't waste your time on it, curate the five that they care about and leave the rest for another day, when another application comes along and those need to be curated to support another application or a new business use. That governance team is a key thing. You know, nobody gets to start from scratch; it would be nice to start from a blank sheet of paper, greenfield, but nobody gets to start from scratch. But if you did, you know, if I got to start from scratch, I would start the governance process very early. It might be very light to start with, but I would be growing it as the datasets grew, and as the usage grew, and as more and more people were using more and more of the data more widely in the organization. And I'd build in the metadata from day one. You've got to start capturing the metadata; it's very hard to go back and capture the metadata and the lineage after the fact. So build in the metadata capture and lineage capture, as automated as possible, right from the very beginning, to make sure that you can track and trace everything.
    30:12 But when you need to get the data out to 10,000 users in 50 organizations, schema on read no longer makes any sense at all. It is a huge resource utilization, because you're doing it over and over and over again for every use of the data. It introduces all sorts of opportunities for each person, or each application, to curate the data in a different way and thus get different answers. It introduces a whole bunch of problems that were all the reasons why we did ETL in the data warehouse world in the first place. And so the more the data is used across the organization, the more production, or data as a product, the data becomes, the more curated it needs to be, the more it needs to be modeled, with the curation work done once and then the curated data used many times by the people downstream.
    31:20 And going back to your metaphor of the museum with the data lake versus the data warehouse, the data warehouse ends up being the display room where all of the exhibits are put on display for everybody to be able to access and they're easy to consume and understand. And then the data lake is the basement of the museum where all of the raw unprocessed resources are for people to be able to do their research and analyses and prepare them for moving up to the display room.
     
    Data Serialization Formats with Doug Cutting and Julien Le Dem
    2017-11-22 (duration 51m)
    [transcript]
    46:34 well, for anybody who wants to follow the work that both of you are up to and the state of the art with your respective serialization formats, I'll have you add your preferred contact information to the show notes. And then just for one last question, to give people things to think about, if you can each just share the one thing that is at the top of your mind in the data industry that you're most interested in or excited for. How about you go first?
    51:32 Kind of the boring backwater of big data
    32:59 One of the things that I'm curious about is, particularly given the level of maturity for both of these formats and some of the others that are available, what the current evolutionary aspects of the formats are, what's involved in continuing to maintain them, if there are any features that you are adding or considering adding, and then also the challenges that have been associated with building and maintaining those formats.
     
    Build Your Own Domain Specific Language in Python With textX
    2020-06-30 (duration 54m)
    [transcript]
    36:05 do have their own problems. For people who are using textX and building their own languages, what are some of the common challenges that they run into, either in terms of being able to overcome the limitations of the PEG grammar, or just the overall process of building the DSL and making it available to their end users for being able to do the work that the DSL is intended for?
    12:34 And for people who are building these DSLs, how does the actual definition of the grammar and hooking into the behavior compare to, as you mentioned, the direct parsing tools, where you have to be much more manual and explicit about it?
    15:08 for people who are defining the DSLs, what are some of the challenges that they face in just constructing the syntax of their target language? And what are some of the types of inspiration that they might look to for determining how it's going to look, and the user experience of the people who are going to be using that language and the DSL parser and logic that are going to be generated and built with textX?
     
    Adding Observability To Your Python Applications With OpenTelemetry
    2020-06-23 (duration 53m)
    [transcript]
    23:47 to get back to the questions specifically around Python. So for the OpenTelemetry Python project, what we're really focusing on is just wrapping up the work around implementing the spec around metrics, and today the implementation for tracing is already in beta and we're already, I think, pretty close to being done. Yeah. So between the tracing implementation and the metrics implementation, those are kind of the interfaces that you'll want to use from Python.
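    The tracing interface was still in beta at the time of this conversation and module names have shifted between releases, so the following is only an approximate sketch of the OpenTelemetry Python tracing API rather than a pinned example; the exporter and span processor class names in particular are assumptions based on later releases.

        # Approximate usage of the OpenTelemetry Python tracing API; exact module
        # and class names have varied across releases.
        from opentelemetry import trace
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

        provider = TracerProvider()
        provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
        trace.set_tracer_provider(provider)

        tracer = trace.get_tracer(__name__)

        with tracer.start_as_current_span("handle-request") as span:
            span.set_attribute("http.route", "/users")
            with tracer.start_as_current_span("query-db"):
                pass  # the work being traced goes here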
    15:12 and stepping back up to the level of the more broad OpenTelemetry project and its mission, because of the fact that it is focused on support across a number of different language communities and runtimes, and specifics of libraries and frameworks, I'm wondering how that benefits the overall broader software community, and just some of the challenges that you face in avoiding the trap of limiting the entire capability of the ecosystem to the lowest common denominator that exists across those different runtimes.
    42:28 And then in terms of your experience of working with the OpenTelemetry community and working on the specification and some of the SDK implementations, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
     
    Build A Personal Knowledge Store With Topic Modeling In Contextualize
    2020-06-15 (duration 58m)
    [transcript]
    37:32 given the fact that this is an open source application, and as you mentioned, one of those pieces of feedback has directed some of the user experience, what are some of the other ways that its exposure to the public, and the fact that you're doing this development in the open, has influenced your overall approach to the design and implementation of the software and the user experience?
    15:28 The other question was just, what are some of the things that you found lacking in the other tools that are available for knowledge management and topic mapping that led you down the path of actually creating your own system, and some of the specifics of Contextualize that might lead someone to choose that over the other options?
    43:42 in terms of the experiences that you've had building, contextualize? And some of the other topic modeling platforms that you've worked on? What have you found to be some of the most interesting or complex or complicated aspects of that and some of the most interesting or unexpected lessons that you've learned in the process?
     
    Open Source Product Analytics With PostHog
    2020-06-08 (duration 49m)
    [transcript]
    06:35 That's why they're installing us. And in the description of PostHog, as you mentioned at the beginning, it's described as being a product focused analytics platform, and you draw the contrast to session based analytics in the readme of the project. So I'm wondering what the meaningful differences are between the two in terms of the types of data that are collected and the use cases that they benefit from.
    34:53 And once you have all the information collected, and you're viewing the results and the graphs that show you the data and trend analysis or the behavioral patterns of your users, what is the useful next step from that? And what are some of the ways that you can provide some useful sort of constructive feedback in terms of, this is the structure of the events, and then this is the actual benefit that you can gain by having this knowledge, and some concrete actions that you can take to improve the growth and viability of your company or your project or your product?
    25:39 Yeah, primarily for the critical path of getting data into and out of the system.
     
    Extending The Life Of Python 2 Projects With Tauthon
    2020-06-02 (duration 33m)
    [transcript]
    20:02 and have most of the features you've brought back been more core to the actual language itself and the interpreter, or have you also been doing a fair bit of copying from the standard library to bring in some of the new capabilities, like the IPv6 support and things like that
    20:18 the sort of approach that I took was initially starting with the core language itself, and then after that doing the libraries. And the main reason for that, frankly, is that the libraries are written in Python 3, and if you start by backporting the libraries, you have to take a lot of the code, which is oftentimes using new functionality from Python 3, and you have to remove that functionality to make it work in Tauthon. So I didn't want to do that. And instead we first worked on the core language, and then after that started backporting some of the different standard library features
    18:48 And as far as the maintenance of the features that you're backporting and the existing features, have there been any cases where you have had bug fixes that have been difficult to bring back, or newly reported errors because of conflicts with the Python 3 functionality and how it manifests in the Python 2 code base,
     
     
    Dependency Management Improvements In Pip's Resolver
    2020-05-26 (duration 1h16m)
    [transcript]
    23:54 And in terms of the new dependency resolver. How are you approaching that implementation? And what are some of the constraints that exist within the Python ecosystem that are influencing the overall approach?
    12:18 So the focus of the work that we're doing is fairly well scoped. We're replacing its current dependency resolution algorithm with something else that's not broken. That's sort of the core bit that we're doing. The other part of it is we have user experience experts who have been brought on as part of the funding we've raised, and they are working to collect user data and collect user information, analyze it, and sort of work with us as pip's maintainers and the broader community to improve the CLI, all the error messages, all the reporting, to be more useful for the users
    56:25 Probably in the end, though, I promised
     
    Easy Data Validation For Your Python Projects With Pydantic
    2020-05-18 (duration 47m)
    [transcript]
    26:17 And what are some of the edge cases that developers often run into? Aside from the confusion about the strictness versus coercion of the typing information? One of
    28:00 and for the capability of being able to access the different attributes and pass along the data, what are some of the sort of best practices that you have found to be useful, of whether you pass the exact model or just a representation of the data set and then do the coercion back and forth throughout?
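    A small sketch of the two approaches this question contrasts, using a hypothetical User model with the Pydantic API as it stood around this conversation: pass the validated model itself, or pass a plain dict and re-validate it at the next boundary.

        from pydantic import BaseModel

        class User(BaseModel):
            id: int
            email: str
            active: bool = True

        user = User(id="42", email="ada@example.com")  # "42" is coerced to the int 42

        # Option 1: pass the model itself, keeping types and validation guarantees.
        def deactivate(u: User) -> User:
            return u.copy(update={"active": False})

        # Option 2: pass a plain representation and re-validate at the boundary.
        payload = user.dict()
        restored = User(**payload)

        print(deactivate(user).active, restored == user)  # False True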
    05:45 Yeah, the feature creep.
     
    Managing Distributed Teams In The Age Of Remote Work
    2020-05-11 (duration 48m)
    [transcript]
    26:16 And then another element of communication within the development team is the way that the code is structured, the way that you approach the actual writing of the code, and the descriptiveness of the code there. One of the things that I've seen before that I think is a good idea, but I admittedly haven't fully put into practice myself, is something called comments showing intent, where you write the code comments not to discuss the what or the how, but the overall intent of the code, with the idea being that if you stripped out all of the actual code and just had the comments, then you could re implement it in any language and still have it produce the same intended outcome. But I'm wondering, what have you found to be the impact of remote teams on the way that code is written and organized, and some of the best practices for reducing the review burden in terms of how to approach that code definition and how to construct the PRs?
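    As a small hypothetical Python example of what comments showing intent might look like, the comments below describe the goal of each step rather than the mechanics, so the logic could be re-implemented in another language from the comments alone; the business rules are invented for illustration.

        def promote_trial_accounts(accounts):
            # Intent: any trial account that has been active for more than 14 days
            # and has invited at least one teammate should be offered the paid plan,
            # regardless of how activity happens to be measured or stored.
            offers = []
            for account in accounts:
                # Intent: "active" means the customer came back after signing up,
                # not merely that the account exists.
                if account["days_active"] > 14 and account["invites_sent"] >= 1:
                    # Intent: record the offer; actually emailing the customer is a
                    # separate concern handled elsewhere.
                    offers.append({"account_id": account["id"], "offer": "paid_plan"})
            return offers

        print(promote_trial_accounts([
            {"id": 1, "days_active": 20, "invites_sent": 2},
            {"id": 2, "days_active": 3, "invites_sent": 0},
        ]))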
    15:52 And in terms of your role as somebody who is running the business and managing the team, what are some of the useful practices that you have found for being able to maintain visibility into the progress that's being made, and ensuring that all of the work is staying on the right track, or that people aren't drifting off onto sort of suboptimal paths?
    48:03 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co workers
     
    Maintainable Infrastructure As Code In Pure Python With Pulumi
    2020-05-04
    [transcript]
    55:26 what do you have planned for the future of the Pulumi toolchain and the business itself?
    21:33 underneath the hood.
    42:45 And in terms of the actual workflow of working with Pulumi to build and maintain infrastructure, what's the overall process and lifecycle of the code, and how it fits into the development cycle?
     
    Build The Next Generation Of Python Web Applications With FastAPI
    2020-04-20 (duration 58m)
    [transcript]
    08:46 another interesting element of the current trend of web frameworks in web development is this focus on API first frameworks, and eschewing some of the end to end capabilities that are built into some of the earlier generation, such as Django, where it incorporates the view layer and the HTML generation. And I'm wondering what you find as being the benefits of building the API first or API only framework, and some of the challenges that are imposed by having that incorporation of the view layer in the framework itself.
    10:57 not to go too far down the rabbit hole, but one of the challenges that I've seen with building these two separate layers of the API server and then the JavaScript application that interacts with it is the fact that on the front end you end up having to do a lot of the same data validation that you would be doing in the back end if it was a fully integrated system. And I'm wondering what your thoughts are on that briefly.
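    To make the point concrete, here is a minimal sketch using FastAPI and Pydantic with a hypothetical /signup endpoint: the schema is declared once on the API side and gives the back end validation and documentation for free, while a JavaScript front end would still need its own copy of the same rules, which is the duplication being discussed.

        from fastapi import FastAPI
        from pydantic import BaseModel

        app = FastAPI()

        class SignupRequest(BaseModel):
            email: str
            age: int  # requests with a non-integer age are rejected with a 422

        @app.post("/signup")
        def signup(payload: SignupRequest):
            # The back end gets validation and API docs from this one declaration;
            # a JavaScript front end still needs its own copy of these rules to
            # show errors before the request is ever sent.
            return {"ok": True, "email": payload.email}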
    48:18 And in your experience of building and maintaining and growing the fast API project and some of the surrounding ecosystem, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
     
    Distributed Computing In Python Made Easy With Ray
    2020-04-14 (duration 40m)
    [transcript]
35:25 And also, given the fact that the core elements of Ray are written in C++, in terms of the capabilities of the task distribution and the actual execution, I imagine that there's the possibility for expanding beyond Python as well.
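As a rough sketch of what that task distribution looks like from the Python side (a minimal example, assuming a local Ray installation):

    import ray

    ray.init()  # start a local cluster; the scheduling core itself is C++

    @ray.remote
    def square(x):
        return x * x

    # Python only describes the tasks; the core runtime schedules and executes them.
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]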
36:27 Are there any other aspects of the Ray project itself, or the associated libraries, or the overall space of distributed computation that we didn't discuss yet that you'd like to cover before we close out the show, or the work that you're doing with Anyscale?
27:51 And another aspect of this overall effort is, as you mentioned, the Anyscale company that you and some of your collaborators have founded around Ray to help support its future development. I'm wondering if you can discuss a bit about the business model of the company, and some of the ways that you're approaching it, and some of the ways that you're approaching the governance of the Ray project and its associated ecosystem.
     
    Building The Seq Language For Bioinformatics
    2020-04-07 (duration 36m)
    [transcript]
18:18 the actual implementation of Seq itself: I know that the syntax is, as you said, largely similar to Python, and I'm wondering how you approached the actual construction of the language and the compiler, and if you were able to leverage any of the elements of the CPython implementation, or at least use it as a reference for things like the tokenizer, for being able to build the parser and the compilation, and some of the underlying architectural decisions that you've made as you have gone through implementing Seq?
22:51 as far as the actual bioinformatics field, I know that when looking through the documentation for this project, it alludes to the fact that there is a lot of messy code or inconsistent approaches to problem solving in terms of the way that the software is developed, and challenges in terms of the speed of execution. And I'm wondering what impact you anticipate Seq having on the overall domain of bioinformatics and genomics, and some of the standards that could be implemented in terms of the training of people who are working in those fields, to improve the overall capacity for being able to run these analyses, and the impact that this increased speed has on their ability to perform meaningful research?
13:15 And then for the target audience of Seq, is it largely people who are working in the sciences who just need to be able to process the data that they're working with? Or is it also common that there might be a set of programmers on staff who work with the domain experts to understand the scope of what they're trying to work with, and then implement the actual applications for them? And I'm curious what types of challenges that poses in terms of how to approach some of the interface and workflow for the people who are actually using Seq in their day to day work?
     
    An Open Source Toolchain For Natural Language Processing From Explosion AI
    2020-03-31 (duration 51m)
    [transcript]
29:33 And the third component that we mentioned at the opening, and that ties into this whole ecosystem that you're building out, is the Thinc project, which you mentioned was extracted from the spaCy project originally. I'm wondering if you can just talk a bit more about the motivation for releasing it as its own library, and some of the primary problems that you're aiming to solve with it within the ecosystem of machine learning and data science.
04:00 At the time when we spoke, the Natural Language Toolkit was still sort of the de facto standard for anybody who wanted to do any sort of natural language processing. But these days, most of the time when I see references to people doing any sort of NLP, spaCy has become the more prominent library for that. So I'm curious what your sense of that has been, as the creator and maintainer of spaCy, and how things have progressed over the past few years in terms of the level of popularity and adoption for your library.
    46:24 And as you look to the future of the explosion company, and the projects that you're building there, what do you have planned? And what are you most excited for?
     
    A Flexible Open Source ERP Framework To Run Your Business
    2020-03-23 (duration 1h7m)
    [transcript]
    44:03 the main business
53:35 project that was a little bit different than the usual: it was a project we developed for a company where the main goal was to provide an application on cell phones where people could buy things like sandwiches from their local grocery. They could order when they were at work in the morning and schedule the time when they wanted the sandwich; the payment was collected by the application, and a Tryton server in the background received the orders and dispatched them to printers in the shops of the different groceries around the city. The shops received a ticket with a number and the order, and prepared it for the time that was specified, so the person who ordered through the application could go to the shop and pick up their order directly, without having to pay, because it was already paid electronically. It was a way to skip the line in the kind of shop where everybody goes at the same time to get their food, and the shop could organize the work and prepare the orders in advance. So that was a nice project where Tryton was involved to follow the orders received from the application, dispatch them to the right shop, and do a small amount of accounting about how much was sold, and at the end send each shop the money it was owed. Yeah, that was an astonishing way of using Tryton.
26:23 that will be used to construct the main class for the active module code. So, depending on which modules are activated, the server will build the class dynamically by composing all the small pieces of all the small classes that are spread across the different modules, to construct the one main class for a kind of document. So we can have a module that defines a main object (to keep with the same example) and another module that extends the sale model by adding, for example, a new field, a new step in the workflow, new methods, and so on. The server combines both classes to create one main class that will always be used by the server to manipulate the object. So we dynamically create those base classes, and this gives Tryton the power to be very flexible, because by activating a module in Tryton you can modify the behavior of existing objects directly, and you can modify any property of the object: a field of the class, the methods on it, and so on. And of course, thanks to the introspection used to create the database schema, activating a module that adds a new field will automatically create the new columns in the database.
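A loose sketch of that idea in plain Python (an illustration of the composition pattern only, not Tryton's actual registration API): the server builds one concrete class from whatever mixins the activated modules contribute.

    class SaleBase:
        fields = {"party", "lines"}

    class SaleShipmentExtension:
        # contributed by a hypothetical shipping module
        fields = {"shipment_date"}

    class SaleDiscountExtension:
        # contributed by a hypothetical discount module
        fields = {"discount"}

    def compose_model(name, base, active_extensions):
        # Later-activated extensions are placed earlier in the MRO so they
        # can override methods from the base and from earlier extensions.
        bases = tuple(reversed(active_extensions)) + (base,)
        model = type(name, bases, {})
        model.fields = set().union(*(getattr(b, "fields", set()) for b in bases))
        return model

    Sale = compose_model("Sale", SaleBase, [SaleShipmentExtension, SaleDiscountExtension])
    print(sorted(Sale.fields))  # fields from the base plus every activated module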
     
    Getting A Handle On Portable C Extensions With hpy
    2020-03-17 (duration 35m)
    [transcript]
21:08 So in terms of the timeline for this, what do you anticipate as being the amount of time that's required to get this to a point where library developers should start using it as the target for building their own work? And what do you see as the path forward for the CPython implementation as far as adopting this as at least an optional interface for the core runtime?
    20:09 As you have gotten further along in this project, what are some of the ways that the initial vision and scope of the project has been updated or any assumptions that you had going into it that have been changed or modified in the process?
22:43 just curious about the timeline for the project, and any plans or projected timeline for HPy to be available in the core CPython runtime as at least an optional interface?
     
    The Advanced Python Task Scheduler
    2020-03-02 (duration 33m)
    [transcript]
18:42 And so in terms of some of the other uses of the project and some of the community that's built up around it, what are some of the most interesting or innovative or unexpected ways that you've seen it used, and how has the overall community reception and growth been for you as the maintainer of the project?
    23:19 And in terms of your experience of building and growing the project and the community around it, what have you found to be some of the most useful or interesting lessons that you've learned in the process?
15:30 has the introduction of asyncio as a core primitive in the Python runtime brought about more interest in APScheduler, because of the fact that there is a greater possibility of actually having the asynchronous offline tasks co-located with the primary runtime? And has that brought people into using APScheduler within those types of projects?
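For reference, a minimal sketch of that pattern with APScheduler 3.x (assuming the apscheduler package is installed), where an asyncio coroutine is scheduled alongside the main event loop:

    import asyncio
    from apscheduler.schedulers.asyncio import AsyncIOScheduler

    async def heartbeat():
        # runs on the same event loop as the rest of the application
        print("periodic task fired")

    scheduler = AsyncIOScheduler()
    scheduler.add_job(heartbeat, "interval", seconds=5)
    scheduler.start()

    try:
        asyncio.get_event_loop().run_forever()
    except (KeyboardInterrupt, SystemExit):
        pass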
     
    Reducing The Friction Of Embedded Software Development With PlatformIO
    2020-02-25 (duration 46m)
    [transcript]
41:41 And what do you have planned for the future of the PlatformIO project?
03:15 Yes, it was the year 2012. I don't remember if Home Assistant existed at that time. But the whole thing which I did in 2012 was actually to create a bridge between the hardware and, effectively, the home automation system, and the project was named Smart-something. I spent two years on that project, and that is actually the place where PlatformIO was born, because the product is a high-level abstraction on top of the hardware, so the system can communicate with the devices which use this API interface, and PlatformIO was used there just to deploy the special firmware for this operating system to the different devices. A developer working with the project would communicate with the end device over the network, and it doesn't matter which type of network the end device uses: the platform emulates one type of connection and one type of interface.
38:06 and in terms of sustainability of the PlatformIO project and the business that you've built around it, what is the overall business model? And how are you approaching things like project governance, so that people aren't driven away by the perception of PlatformIO kind of governing the entire process and not really taking community feedback, and just the overall community and ecosystem that you've been able to build up around it?
     
    APIs, Sustainable Open Source and The Async Web With Tom Christie
    2020-02-18 (duration 43m)
    [transcript]
35:34 And it doesn't make sense to do these things with Django or Django REST framework, because they're so mature and they're set in their space. But, you know, when you happen to have a great big new greenfield space that needs building up anyway, that's the time to do it.
41:36 All right, sure. So, out of nowhere, let's go for The Lobster, which is just a very strange film. It's all strange and sometimes funny, quite dark. Yeah, I won't say anything more about it; it's just odd. And something that I've just about finished reading recently: a book, The Master and His Emissary, which I found really, really interesting. It's about the two different hemispheres of the brain and the author's particular take on the cultural implications of what the two different hemispheres each focus on, and the world view that each of the hemispheres has. So that's The Master and His Emissary by Iain McGilchrist.
35:57 And are there any other aspects of your work with the Encode organization, or your work in the web space and async in general, or any of the other aspects of the Python language or community or your career that you're focusing on, that we didn't discuss yet that you'd like to cover before we close out the show?
     
    Learning To Program Python By Building Video Games With Arcade
    2020-02-11
    [transcript]
41:00 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
    36:40 And as you continue to work on arcade and improve it, what do you have planned for the future of the project,
34:47 There has been, and in fact I got an invite from the people that run the Python Discord, because there was enough interest that they wanted me on that Discord for the game development channel that they recently created. So I hang out there on a pretty regular basis. And I wish I had the ability to look it up really quickly, but I think one of the winners or one of the finalists recently in the PyWeek challenge did use Arcade. But I could be wrong on that; I'm just going off my recollection here.
     
    Build Your Own Personal Data Repository With Nostalgia
    2020-02-04 (duration 32m)
    [transcript]
    16:49 and what are some of the overall assumptions that you're using to direct the focus of your development and some of the ways that you're thinking about the interaction and user experience design of the overall project.
    25:28 And so what are some of the future directions that you have planned for the project either in terms of new data sources or new applications to consume the data or just overall updates to the system infrastructure and system design?
32:16 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
     
    Simplifying Social Login For Your Web Applications
    2020-01-27 (duration 34m)
    [transcript]
12:47 for sure, the differences in the protocols were a problem to be solved by the framework, but in the end I found that defining a really simple interface of what the application needed from each provider worked very well, because this simple interface allows me to hide the complexity of the provider while still fulfilling the requirements of the library. For instance, there's a method called get_user_details; the rest of the code doesn't care about the particular implementation of that method for the different providers, it only cares about the output, which is a nice dict holding the user data, simple enough to build a user and store it in the database. So defining these interfaces clearly for the rest of the code was key to hiding all the complexity behind the protocols of the different authentication providers. There are differences between the protocols; OpenID, for example, works a bit differently, they communicate differently, and the specs and APIs vary between providers, but it was not as much of a problem when adding them, since much of the work of defining the parameters that need to be invoked on the different providers is already available. Overall it wasn't that complicated: the authentication flow is quite similar everywhere. You click a button on your site, you get redirected to the provider for your credentials, and then you are sent back to your site to continue the authentication process, which usually is hidden.
10:12 Yeah. So in the beginning it was Django only; the initial purpose was to solve the social authentication problem for Django projects. So in the beginning it was really highly coupled with the framework, but even then some of the basic blocks of what exists today were already in place. A key part of the implementation was the backends; backends are the parts that handle the communication with the authentication provider. So a key part of the design was defining this backend with the goal of hiding as much as possible the complexity of the authentication provider while offering a clear and simple interface to the users of the library, which is the rest of the code. Then there are the models, where we store the user ID, the authentication provider, and the reference to the user in the database. And then there was the pipelines feature; this was added really early on. A pipeline is a list of functions that get called in a given order; the output of the previous one is passed along until the last one, and the goal at the end of the pipeline is to have a user, a valid user, in your database that is OK to authenticate. Those basic blocks of the application made it possible to later port it to python-social-auth, where the Django-related bits were moved into a new concept called strategies. Strategies are the glue between the framework particularities and the library core, which is framework-agnostic, so it is quite simple to support Django, Flask, and Pyramid, and you can have more integrations if you want.
12:13 yeah, those are the basic blocks today: the authentication backends hiding the complexity of the providers, the models for storage of the authentication data, the pipelines to extend the functionality for your particular needs, and the strategies to hide the framework complexities. And what have you found to be some of the most complex or challenging aspects of designing and building this library, particularly given the number of different identity providers and the variances in the authentication protocols that they support?
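To make the pipeline idea concrete, here is a stripped-down illustration in generic Python (the function names are hypothetical, not python-social-auth's actual pipeline functions): the pipeline is an ordered list of callables, each receiving the accumulated details from the previous step, and the final result is a user record ready to authenticate.

    def normalize_details(details):
        # hypothetical step: clean up what the provider returned
        details["email"] = details["email"].lower()
        return details

    def get_or_create_user(details):
        # hypothetical step: in the real library this would look up or
        # create a row in the database
        details["user"] = {"id": 1, "email": details["email"]}
        return details

    PIPELINE = [normalize_details, get_or_create_user]

    def run_pipeline(details):
        for step in PIPELINE:
            details = step(details)
        return details

    print(run_pipeline({"email": "Ada@Example.com"}))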
     
    Building A Business On Building Data Driven Businesses
    2020-01-20 (duration 41m)
    [transcript]
    29:09 The other thing that I'm interested on the business side is the overall business model that you have and some of the ways that it has grown or evolved since you first started the company, and just your overall lessons learned in terms of managing the business behind the product.
37:34 And for the future of Redash, what do you have planned, both in terms of the business and the open source project?
25:46 Yeah, it's definitely interesting the ways that people will work around the sharp edges of a tool and make it do things that it was never actually intended to do, just because it's the tool that they have, rather than looking outside for the tool that's more well suited to the particular problem that they have, because it's only 5% of their use case and the other 95% is filled by the tool that they have. Beyond just the open source code base of Redash, there's the business that you've built around it, and so I'm curious what you have seen as the benefits of having a hosted solution in terms of the adoption of the product, and how you balance the needs of the business against the desires and needs of the open source community that are using and contributing to Redash.
     
    Using Deliberate Practice To Level Up Your Python
    2020-01-13 (duration 48m)
    [transcript]
47:57 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
40:41 And as you prepare for the year that's starting now, and the continuing changes in the Python language and ecosystem and the overall landscape of where it's being applied, what are some of the types of practice and learning that you're looking to engage with, and what do you have planned in terms of updates or new directions for your trainings and the types of lessons that you're going to be delivering?
05:51 So I often make the analogy between Stack Overflow and a phrase book that you would use when you go visit a foreign country and you don't know the local language. And so the phrase book is good for, you know, where's the train station, where's the bathroom, I'd like to buy that loaf of bread. But the danger, of course, with the phrase book is that you'll say it and it won't come out the way that you want. I'm not referring to the Python Hungarian phrasebook sort of sketch. But you can also have the problem where you say something, and it sounds so good and authoritative that people then respond to you at high speed, and then you're really sunk. And so, you know, Stack Overflow is great. I mean, all of us use it; I certainly use it all the time to get answers to things. But in context it's even more useful: it's useful when you don't know anything, and it's even more useful when you do know things and you know how to judge the answer, and then use it to really fix the problems that you're trying to solve.
     
    Checking Up On Python's Role in DevOps
    2020-01-06 (duration 33m)
    [transcript]
14:01 And another aspect of the Python runtime is that it comes as a default on a lot of different Unix-based systems, but it's not necessarily going to be the most recent version, so that might influence the way that you approach writing the Python that you're using for managing the systems. And then there's also the difference of, if you're actually using Python as the runtime for the application that you're trying to deploy onto the systems, it might be targeting a more recent version than what's available in the operating system repositories. And so I'm curious what you have seen as far as some of the challenges that people face in managing the versions of Python on their servers, and the likelihood that the scripts they're using to automate those systems are going to be just using the built-in version.
28:48 And so for people who are coming to the book, who are trying to figure out the best ways to use Python for managing different systems, what's the next step after they've read the book and gone through the material? Are there any useful resources or references that you can suggest, or what's the level of familiarity and capability that you expect them to be at by the time they finish?
    07:10 And in terms of the particular aspects of using Python for managing systems, what are some of the general categories of the types of operations that Python would be used for in the context of systems administration and systems automation?
     
    Python's Built In IDE Isn't Just Sitting IDLE
    2019-12-24 (duration 36m)
    [transcript]
05:50 because of the fact that IDLE has been baked into the language for a while, it's often pointed to as sort of the first environment for people to start experimenting with writing Python or learning to program. And so I'm curious what you have seen as being some of the main use cases for IDLE, and how the original motivation of building it and having it as part of the language has evolved along with the evolution of the computing environments that we have, and just the overall capabilities of other editors, or the ease of getting started with Python in a text editing environment.
    03:47 so as you mentioned, it's targeted mainly at beginners to the language or beginners to programming in general. And it's been part of the standard library of Python for quite a while now, and I'm curious if you happen to know the original motivation for adding it to the core of Python and some of the history behind it,
35:51 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
     
    Riding The Rising Tides Of Python
    2019-12-16 (duration 44m)
    [transcript]
36:40 Yeah, we'll see. And since you are living in Pittsburgh and engaged with the community there, and that's where PyCon is going to be for the upcoming two events, what are some of the things that people who are attending the conference should know about the area?
    17:09 And what are some of the other ways that the growth of the community and the way that it has evolved has impacted your career trajectory from working in trading to where you are now working in data?
43:34 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
     
    Debugging Python Projects With PySnooper
    2019-12-09 (duration 45m)
    [transcript]
    06:09 for the remainder of my career.
19:51 Just curious about some of the ways that the implementation of PySnooper has evolved since you first began working on it and having the idea for it, and some of the types of programs that you were working on that influenced the direction that you took it.
15:55 So for somebody who is using PySnooper, what are some of the types of information that it surfaces, and how would you incorporate that information into the overall debugging process of identifying what the control flow is for the logic, what some of the branch points are, how to figure out what the values are given these different operations, and understanding how to actually fix the problem that you're trying to debug?
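For anyone unfamiliar with it, the typical entry point is the snoop decorator, which logs every executed line and variable change in the decorated function (a minimal example, adapted from the project's usual illustration):

    import pysnooper

    @pysnooper.snoop()  # prints each executed line and variable change to stderr
    def number_to_bits(number):
        if number:
            bits = []
            while number:
                number, remainder = divmod(number, 2)
                bits.insert(0, remainder)
            return bits
        return [0]

    number_to_bits(6)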
     
    Making Complex Software Fun And Flexible With Plugin Oriented Programming
    2019-12-03 (duration 1h2m)
    [transcript]
1:00:18 of the pop implementation itself, or the overall paradigm, or some of the ways that you're using it that we didn't discuss yet that you'd like to cover before we close out the show?
    23:41 I always worry about the communication
23:42 aspect. And so in terms of the pop library or framework itself, however you want to refer to it, I'm wondering if you can dig into some of the ways that it's implemented, and some of the other languages or libraries or ecosystems that you've looked at for inspiration as you have iterated on the design and philosophy around the development pattern and the specifics of pop itself.
     
    Faster And Safer Software Development With Feature Flags
    2019-11-26 (duration 1h1m)
    [transcript]
    52:00 The not invented here problem.
57:40 plus one on Adafruit, plus one on all of the CircuitPython stuff, it's super cool. Also, I'm just a huge fan of Adafruit in general, and they've got a load of just really awesome documentation on there; the learning section of their website is amazing, and it's amazing how much stuff they do for free. So yay, I love Adafruit. My recommendation, or my pick, is a book called Accelerate, and this is written by Nicole Forsgren, Jez Humble, and Gene Kim. Jez Humble is also one of the co-authors of the Continuous Delivery book, and they are the people that were behind the DevOps report that has been coming out every year for the last few years. It's amazing because they're using actual science to validate what kinds of engineering practices work and what don't, in terms of organizations making more money at the end of the day. They actually look at which organizations are performing better, either from a kind of capitalistic perspective, like they make more money than their peers, or from a kind of social perspective, what they are doing better, and then they trace that all the way back to the engineering practices.
1:00:46 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
     
    From Simple Script To Beautiful Web Application With Streamlit
    2019-11-18 (duration 49m)
    [transcript]
    30:09 That would break the illusion.
20:00 The choice of Python seems fairly natural given the initial community that you were targeting, because of the fact that it's so widely used in the data space, both by data engineers and by data scientists. But from the perspective of the project itself and the way that you have engineered it, if you were to start it over today, knowing what you do now, what is it that you think you would do differently, either in terms of the overall system design or in the early efforts of building and promoting it?
48:19 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
     
    Automate Your Server Security With GrapheneX
    2019-11-11 (duration 35m)
    [transcript]
15:54 a good question, because when we try to find eligible modules for the framework, it's something that we always consider. So we have modules, and the modules have the description of the OS command for the hardening process. We try to give every detail about that command to the user in the information section of the module. But we don't have a control mechanism like that for preventing the running of the commands that you mentioned; we always try to inform the user about the command instead. And most of the commands require root access, so we warn users about that, but we don't warn about the contents of the module itself. We try to inform the user about the content of the module, and the user should be careful about what he or she is doing with the module. So we don't have a warning mechanism or a check that inspects the service or the file content before executing the hardening command or the module, but we do try to inform the user.
17:39 of course, so we have a team, and we try to keep things simple, and we tried to choose a programming language that everybody knows and is capable of doing things with, and we chose Python to move on. We started with writing the interactive shell using Python's cmd module; it has the features that we wanted, and we created the interactive shell with it. Then we moved on to the web interface, which is currently Flask and Socket.IO based, but we will probably change it later to something else; actually, we have a pull request that will change the entire back end side of the web interface, but we use Flask for now. The hardening process is handled with the standard built-in libraries; we use them for executing the system commands, and we didn't use anything external for this purpose. Other things, like the colored logs, use the coloredlogs module, printing is handled with the terminaltables library, and some commands request user input, for which we used the PyInquirer library for the prompts. We tried to keep it simple. On the website side, I have to mention the JavaScript and HTML and CSS, the classic way of creating simple websites. Actually, I'm not a web guy, so a teammate handles all the web side of the project. And this is pretty much it about the tech side of the project.
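The interactive shell piece described above is built on the standard library's cmd module; a toy sketch of that approach (an invented example, not GrapheneX's actual commands) looks like this:

    import cmd

    class HardeningShell(cmd.Cmd):
        intro = "Toy hardening shell. Type help or ? to list commands."
        prompt = "(harden) "

        def do_list(self, arg):
            """List available hardening modules."""
            print("firewall_rules, ssh_config, disable_ipv6")

        def do_info(self, arg):
            """Show the OS command a module would run, without executing it."""
            print(f"module {arg!r}: would run a privileged OS command here")

        def do_quit(self, arg):
            """Exit the shell."""
            return True

    if __name__ == "__main__":
        HardeningShell().cmdloop()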
35:00 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com
     
    Accelerating The Adoption Of Python At Wayfair
    2019-11-03 (duration 42m)
    [transcript]
19:05 Yeah, so it's got the Band-Aid problem, where the brand name has become synonymous with the actual
10:37 And for the groups that you work with closely, can you talk about some of the overall day to day of how your team works with the other engineers, the typical engagement, and maybe some of the common points of friction or confusion that exist with people who are familiar with other languages trying to adopt Python?
    27:05 And then in terms of the projects that you have worked on and built up, what are some of the notable successes? And what are some of the cases that you've run into where you ended up having to abort the engagement because of either issues at the technical level or organizational pushback or just a poor team fit?
     
    Building Quantum Computing Algorithms In Python
    2019-10-29 (duration 36m)
    [transcript]
03:48 Yeah, that's right. I mean, a simplified way to think about how you solve problems on our system is that we start off the computation with all of the qubits in superposition, and then we apply the problem on top of that. And as the computation goes on, the superposition force backs off and the problem force gets applied more strongly, and then they collapse into, hopefully, what is the solution to your problem.
    17:10 And I'm wondering what the selection process looked like when initially developing the ocean software stack and targeting Python. And if there were any particular requirements that Python met particularly well, and maybe what some of the other considerations happened to be at the time or any other existing implementations for the SDK?
35:32 Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
     
    Illustrating The Landscape And Applications Of Deep Learning
    2019-10-22 (duration 56m)
    [transcript]
    53:04 as if the singularity happens.
    22:19 And for the target audience of the book, I'm curious how much background understanding of programming or statistics or machine learning is necessary and to what level of facility you expect them to get to by the end of the book.
16:59 And I'm curious if you can give a bit of a comparison to some of the other books that you've encountered on the subject of machine learning or deep learning, and some of the benefits that you see your book providing comparatively, and some of the ways that the target audience that you're focusing on would gain better understanding or better value than from some of the other books that they might be able to pick up.
     
    Andrew's Adventures In Coderland
    2019-10-15 (duration 1h0m)
    [transcript]
52:23 Because, you know, the world outside of a laptop
    51:34 Yeah, that's another really good question. It's it. I mean, the advice of the engineers
    33:55 So at the one level of your work on this book, there's the effort to understand software in and of itself and how the computer operates and some of the complexities that are necessary to be overcome to be able to do useful things with the software. But at the other level, there is the cultural aspect of how people who work in this industry think about things, how they interact. And as somebody who's been in the industry for a while, it's hard for me to get some of the outside mentality. And so I'm curious what you have found to be some of the most striking contrast that you've identified between software engineers and the general coding culture, and how that compares to the perspective of a lay person who is on the outside of that sort of subculture and observing it.
     
    Exploratory Data Analysis Made Easy At The Command Line
    2019-09-23 (duration 52m)
    [transcript]
    17:34 So probably the
28:35 I'm also wondering which terminal environments it supports, because Windows is generally the one that's left out of the support matrix for command line tools. But because of the fact that Python does have the curses interface built in, or if you are relying on the prompt_toolkit library, I know that there is the possibility of being able to support the Windows command line, so I'm just curious what the overall support matrix is for VisiData.
22:25 well, one tip there is that if you add the grep -B 1 flag, it'll show you the line that you want, as well as the line just before it.
     
    Cultivating The Python Community In Argentina
    2019-09-18 (duration 41m)
    [transcript]
11:22 and what are some of the ways that you facilitate the growth and interaction of the community, and some of the types of resources and events that you help to provide?
06:36 And Argentina itself is a fairly large country, and the group that you have put together services the entirety of the nation. I'm wondering how the overall geography and culture of your country influences the focus of the community, and any of the challenges that you face in terms of trying to facilitate interaction for such a widely distributed group of people.
34:35 I think that, in general, I am very happy with the language. For example, other people say it is slow in some situations, but for me that's not really a problem. What I would really want to see improved in the midterm is the startup time for the Python process: the time between when you type python3 in the terminal and when the script really starts executing. Improving that time, I think, would really help in a lot of different areas where Python could be more widely spread. The problem is that you cannot really execute a file in Python in a millisecond; I'm exaggerating, but that's the idea.
     
    Python Powered Journalistic Freedom With SecureDrop
    2019-09-10 (duration 38m)
    [transcript]
    15:01 it's better for the journalist
    23:25 Yeah. So for now, mainly just focusing on the actual development and maintenance of the project. And then we'll talk about some of the interesting use cases after
33:52 of the future of the project, what are some of the new features or improvements, or just the overall work and effort that you have in store in the near to medium term, and any help that you are looking for from the community to improve it or add new capabilities?
     
    Combining Python And SQL To Build A PyData Warehouse
    2019-09-02 (duration 43m)
    [transcript]
34:58 And in terms of overall trends in the industry and in the Python community, what are some of the things that you see going forward that you're excited about, as far as the tighter integration of the PyData stack and data warehouses, and any challenges that you anticipate us running into as we try to proceed along that path?
    29:14 think that's the place where actually
    33:31 Well, I think the biggest thing
     
    AI Driven Automated Code Review With DeepCode
    2019-08-26 (duration 33m)
    [transcript]
    26:08 And looking forward, what are some of the features or improvements that you have planned for the platform and for the business.
22:12 In terms of the platform itself and its capabilities, what are some of the overall limitations, and some of the cases where you might not want to use it, or would avoid some of the recommendations that it might come out with, just because of some of the artifacts of the code that you're trying to feed through it?
24:47 But you're right, the cloning is the slow part. For those large repositories, cloning usually takes a while, and then the analysis is much, much faster in our case. So we're actually now separating the stages, so people know why it is slow. But yeah, cloning is sometimes fast, sometimes slow, especially depending on the network in the cloud and how many people are using it, but then the analysis is much, much faster than the cloning.
     
    Security, UX, and Sustainability For The Python Package Index
    2019-08-19 (duration 51m)
    [transcript]
16:16 And, Nicole, for you as well, I'm wondering what the surface areas are that you're dealing with as far as the user experience work, and some of the ways that that manifests in the different trade-offs and the interactions between the APIs and the web UI, and the overall package upload experience, etc.
    06:06 And particularly for a unicorn, what was the state of the system at the time that you first began working on it, and any of the notable issues that you were first faced with?
14:46 and wondering, too, if you can just enumerate the overall list of interfaces and the total surface area of the problems that you're each working with, especially as a mix for the PyPI project, because with some projects that might be limited to just the web UI, and for others it might be just an API. But with PyPI, there's the web interface, there are the APIs that users are using, there's the actual data integrity, as well as the actual interactions that people have in downloading and installing the packages, which is potentially another attack vector that isn't necessarily going to be present in other projects.
     
    Learning To Program In Python With CodeGrades
    2019-08-12 (duration 1h4m)
    [transcript]
52:37 Nicholas Tollervey: The end goal is very simple, and that is for CodeGrades to become sustainable, and useful, and a positive contributor to the wider software development community, in the same way that music grades are for the music world. It's a complicated project. There are lots of ways in which it can go wrong, and there are lots of things that need to happen to make it go right, so the biggest challenge I have is just trying to make that happen. Really soon, in the autumn (fall in America), I'll be publishing the syllabus publicly; at the moment it's being reviewed by the folks helping me with the mentorship side of things. And I'm going to be publishing the website with more details of the process that I'm currently testing with the candidates. So over the winter months I hope to be able to get lots of open feedback, as it were, and I hope sometime in the new year to start properly recruiting senior level engineers who are interested in getting paid to give back to the community by evaluating candidates. And, well, we'll see what happens at that point, really, but it's hard to tell, what can I say?
38:43 Tobias Macey: And we've been talking this whole time with the understanding that the syllabus and the course material that you're working through is oriented around the Python language and its community, and that makes sense, given your affiliation with the language and your engagement with the overall ecosystem of Python development and the people who are working there. I'm wondering what your thoughts are on the viability, and the level of effort necessary, to be able to translate this to other languages or other sub-disciplines within engineering, whether that's systems administration, machine learning, or graphic design, and just the overall idea of this being an applicable framework for education, both at the entry level, but also, once you've achieved grade eight in the base framework, then going forward to try and achieve grades in some of those other sub-disciplines?
1:00:33 Nicholas Tollervey: the PyperCard project that I mentioned, I built that on top of the Kivy library, which isn't perhaps that well known, but it's a cross-platform Python user interface and game development framework. And the reason that I started looking into this is that not only does it work on Windows, OS X, and Linux, it also runs on Android and iOS, so it targets mobile devices. And this is the next thing I'm going to do: have a look at trying to get PyperCard projects working on my mobile phone and tablets and things like that. And when it comes to reading suggestions, I'm going to suggest two, if you don't mind. The first one is a biography of the philosopher Ludwig Wittgenstein by another philosopher called Ray Monk; Wittgenstein: The Duty of Genius, it's called. Now, I imagine your listeners holding their heads in their hands going, oh God, this sounds terrible, but really it's a very readable biography, and Wittgenstein was a fascinating man. I especially like the chapter that describes the discussions Wittgenstein was having with Alan Turing about the fundamental nature of mathematics and logic; that was really very interesting, and the sort of thing that would appeal to geeky programmers who like their math and logic. The second one is an old chestnut, The Hitchhiker's Guide to the Galaxy. My 11 year old son is reading it at the moment, and he's laughing out loud at different sections and sending particular quotes to the family WhatsApp group. But the reason I'm mentioning it is that I love the sense of optimism about how useful some sorts of technology can be, in a playful sort of way. But I also like the way Douglas Adams is completely cynical about the way technology can be used in the wrong sort of way, and I'm thinking about the way the doors open and close on the spaceship with an artificial intelligence that everybody loathes. This reminds me of certain ways in which the web has developed.
     
    Algorithmic Trading In Python Using Open Tools And Open Data
    2019-06-17 (duration 50m)
    [transcript]
23:57 Alex Catarino: Well, for the libraries, every time that we add new libraries or update the framework, we rebuild the basic Docker container that we have and test it against a bunch of algorithms, as a kind of unit test, to see that nothing is broken. But we try to keep the versions stable, because upgrades cost time, and just make an upgrade when we see the need. So we are always looking at the state of the art of each library.
    49:28 Alex Catarino: Yes, for this week and watch the series day hundred. Because it talks about the future of humanity and the dangers of
32:46 Alex Catarino: Well, we're constantly amazed by the creativity of our users. I work with support, and I see what they are doing, and it's incredible how they push the new technology every day. By the way, we have done some of that ourselves with a script that takes parameters, but most of what the users do is their intellectual property, so we can only discuss things like the examples that I gave before. There is also other work: for example, one user has been working on an open source application that drives the API so people can control their live algorithms on the go, and James has released an open source optimizer, so you can run genetic optimizations with Lean, which we still don't support natively but eventually will in the future.
     
    Web Application Development Entirely In Python With Anvil
    2019-06-10 (duration 57m)
    [transcript]
    50:42 Tobias Macey: And you mentioned a couple of times some of the ideas that you have for the roadmap. And I'm curious if you can talk through what you see as being the future of Anvil, both from the technical and the business side.
    52:43 Tobias Macey: And one thing that I didn't ask yet, but that I'm curious about is how you ended up selecting the name for the business and the platform.
23:50 Tobias Macey: And so given the wide variety of backgrounds of the users that you're supporting, I'm curious how you approach the overall user experience and usability design of the platform to make sure that they don't run into any roadblocks in the process of trying to get something off the ground?
     
    Building A Business On Serverless Technology
    2019-06-04 (duration 47m)
    [transcript]
08:06 Tobias Macey: yeah, so that's, I guess, the kind of overview. And so the majority of the technology stack that you have built at Datacoral is based on serverless technologies, and before we get too deep into the technical aspect of how you're leveraging those capabilities, I'm wondering if you can just share your definition of the term serverless, and the types of services and technologies that fall under that umbrella.
40:00 Tobias Macey: And you mentioned the challenge of monitoring the sort of capacity and status of all of these different customer deployments. And so I'm wondering what your approach has been in terms of metrics and monitoring, and just overall observability and alerting, for the product that you're managing for your customers.
10:38 Raghotham Murthy: Yeah. So when I first got started, I mean, you start looking at AWS Lambda, and of course S3 for storage; those are kind of the two main services. We also, I think, use things like API Gateway to be able to provide an event endpoint and things like that. And the goal, as I mentioned earlier, is for Datacoral to provide a way for people to specify end-to-end data flows: to be able to collect data from different places, organize the data in different kinds of query engines, and be able to even publish the transformed data into applications, into production databases, and so on. So the way we have picked the serverless technologies has been by carefully looking at every single one; AWS, for example, keeps providing newer and newer technologies that are kind of these platform services. So we've always looked at every single one of them to see what is the best possible way to use each of these technologies to build a really robust, end-to-end data infrastructure stack. I'll give an example of how it has evolved. Earlier, we would store data in S3, and we said, okay, we would like to make the data available directly from S3, and back then there was EMR, and there was Hive on top of EMR. So we said, okay, we'll actually spin up a Hive metastore, stick all of the metadata into that Hive metastore, and then allow our customers to use EMR to query the data sitting in S3. But then, about a year in, Amazon offered the service called the Glue Data Catalog, which is essentially the Hive metastore product offered as a service by Amazon. So the moment we looked at it, we were like, why are we spinning up a database and a server to actually run this service? We might as well just use the Glue Data Catalog. So we have an opinionated view of how these data flows should be built out and what the interfaces should be for our users, and given the technologies that are available, we pick and choose the ones that we feel are the best way to provide a really robust service. And then as newer ones come along, we are able to rip and replace small parts of it while still providing the same service, and, of course, improving the robustness or even the cost of operating Datacoral itself. As you can imagine, there are several other examples, but I think this could
     
    A Data Catalog For Your PyData Projects
    2019-05-27 (duration 50m)
    [transcript]
    13:58 Tobias Macey: And as far as the metadata that you can embed in the catalog definition, I'm assuming that that's also passed on to the end user for being able to inquire about some of the different attributes of the data, such as what you were saying as far as the different columns that are of interest, or maybe having a last generated timestamp so that somebody can inquire about the freshness of the data that they're analyzing, without having to go back to the catalog owner to ask that question.
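As a small illustration of that workflow (a sketch only; the catalog file and entry name here are purely hypothetical), the consumer can inspect the catalog metadata with Intake before ever loading the data:

    import intake

    # "catalog.yml" and "daily_metrics" are hypothetical names for illustration.
    cat = intake.open_catalog("catalog.yml")
    entry = cat["daily_metrics"]

    print(entry.describe())  # description, container type, and user-supplied metadata
    df = entry.read()        # only now is the underlying data actually loaded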
    33:07 Tobias Macey: And as far as the overall implementation of intake, you mentioned that it's a fairly young project. So I'm curious what some of the assumptions are, that are sort of baked into the origins of intake and how it's designed. And some of the ways that the architecture has evolved, and maybe some of the ways that those assumptions have been challenged or updated in the process. It's a tricky,
    03:06 Tobias Macey: And so as you said, Now you have become the lead on the intake project. So can you start by giving a bit of an overview about what the project is and the story behind why it was created?
     
    The Past, Present, and Future of Deep Learning In PyTorch
    2019-03-10 (duration 42m)
    [transcript]
36:19 And in terms of the overall technical implementation, what have been some of the most challenging or unexpected aspects that you have dealt with in the process?
29:43 what are some of the other components of the Python ecosystem, particularly in the data analysis space, that are most commonly used in conjunction with PyTorch on a given project?
01:00 today and launch a new server in under a minute, and don't forget to say thanks for their continued support of the show. And don't forget to visit the site at pythonpodcast.com to subscribe to the show, sign up for the newsletter, and read the show notes. And to help other people find the show, please leave a review on iTunes and tell your friends and co-workers. And to keep the conversation going, go to pythonpodcast.com/chat. To learn and stay up to date with what's happening in artificial intelligence, check out this podcast from our friends over at the Changelog.
     
    Brian Granger and Fernando Perez of the IPython Project
    2015-06-13 (duration 1h21m)
    [transcript]
    1:05:52 Some of the folks Google
    56:20 on the mailing list this week.
10:41 No, they were extremely generous, and we owe them a huge debt of gratitude, because they provided support for the project at the very beginning and throughout, and they supported us at critical junctures. They actually funded Brian and me in 2009 and 2010, at a critical point when we were developing the network protocols that allowed us to build the machinery that eventually led to the Qt console and the modern network architecture that supports the notebook, and then the multi-language support, all of the modern features that really made possible the renaissance of the project in its modern incarnation. And all of that was thanks to Enthought's support and financial contributions. They've been a fantastic supporter of the community; they've supported the SciPy conference for many years. And so we owe them a huge debt of gratitude.
     
    Reuven Lerner
    2015-04-23
    [transcript]
16:49 That's both the blessing and the curse of having a benevolent dictator for life, right? Unfortunately, everybody has an opinion, but his is the only one that really matters.
44:01 And, you know, you could learn from both of them, and you could enjoy both of them, and each of them has their warts. But, like, knowing both is really where the interesting things happen.
28:14 No, I totally agree. I think that's actually quite brilliant. Because years and years ago, I worked for a couple of years in the financial services industry, and my boss at the time was one of these brilliant SQL hackers. Just like you said, SQL is one of those things where you can go and learn the rules in five minutes, but it can take years and years to really appreciate all the strategies involved. He actually taught me a way of structuring SQL query statements in exactly the same format that you're talking about: this really rigid format with the SELECT part of the clause on top, and then, you know, FROM on the next line, and WHERE on the line after that. It's this really sort of, like, almost visual way of representing the syntax, right? So as opposed to this big, long blobby thing, the format itself kind of shows you what the code is trying to do, by virtue of the way it's laid out. And it sounds like this way that you're talking about laying out list comprehensions does the same thing?
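To illustrate the layout idea from the 28:14 exchange, here is a small, self-contained example (the data and names are made up) showing a SQL query with one clause per line next to a list comprehension laid out the same way.

# An illustration of the layout idea discussed above: the SQL query puts each
# clause on its own line, and the list comprehension mirrors that structure
# (expression, then the "for" loop, then the "if" filter). The data and names
# are invented for the example.
orders = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 45.5},
]

# SQL with one clause per line, as a reference for the shape:
query = """
SELECT id, amount
FROM   orders
WHERE  region = 'EU'
"""

# The same "query" as a list comprehension, laid out clause by clause:
eu_amounts = [
    (order["id"], order["amount"])   # SELECT
    for order in orders              # FROM
    if order["region"] == "EU"       # WHERE
]

print(eu_amounts)  # [(1, 120.0), (3, 45.5)]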
  •  
     
  •  
    Věra Jourová on Surveillance and Covid-19
    2020-03-29 (duration 28m)
    [transcript]
    00:05 the
    00:09 The
    00:11 the
     
    Not That Ambassador
    2019-10-16 (duration 33m)
    [transcript]
    00:02 The
    00:08 the
    00:10 the
     
    Abortion Wars
    2019-07-21 (duration 33m)
    [transcript]
    00:02 The
    00:10 the
    00:15 the
     
    Being Muslim
    2019-06-30 (duration 39m)
    [transcript]
    00:03 The
    00:10 the
    00:15 the
  •  
     
  •  
    Help! My senior partner is a jerk, with Dr Jamie Wyllie
    2019-12-10 (duration 37m)
    [transcript]
    00:10 the
    00:24 the
    00:25 the
     
    When doctors lose someone they love, with Dr Jo Scrivens
    2019-11-26 (duration 40m)
    [transcript]
    00:09 the
    00:23 the
    00:24 the
     
    The magical art of reading sweary books with Dr Liz O’Riordan
    2019-11-19 (duration 41m)
    [from description] In this episode, Rachel is joined once again by Dr Liz O’Riordan, the ‘Breast Surgeon with Breast Cancer’, TEDx speaker, author, blogger, triathlete and all round super...
    [transcript]
    01:58 Knight
    07:23 Knight,
    28:15 Knight.
  •  
     
  •  
    EP8: The things that make me different make me, me
    2019-11-11 (duration 22m)
    [transcript]
    12:11 green
    00:04 the
    00:30 the
     
    EP7: You Can't Pour From An Empty Cup
    2019-11-08 (duration 32m)
    [transcript]
    00:04 the
    00:30 the
    01:27 the
     
    EP6: Fee -Fie-Phobia
    2019-11-02 (duration 20m)
    [transcript]
    07:05 green
    07:06 green
    00:04 the
     
    EP5: Hot Mamas And Hot Flashes
    2019-11-01 (duration 39m)
    [transcript]
    00:04 the
    00:30 the
    01:28 the
     
EP4: Ladies, we got this. Technology For Midlife Madness
    2019-10-25 (duration 37m)
    [transcript]
    00:04 the
    00:30 the
    01:00 the
     
     
    EP3: 'Woulda, Coulda, Shoulda'
    2019-10-18 (duration 19m)
    [transcript]
    00:04 the
    00:30 the
    00:57 the
  •  
     
  •  
    Technocracy and the Fate of the Expert
    2019-09-11 (duration 24m)
    [transcript]
    00:03 the
    00:13 the
    00:13 the
     
    Permanence and Whole Person Impairment
    2019-08-07 (duration 28m)
    [transcript]
    04:24 green
    00:06 the
    00:22 the
  •  
     
  •  
     
    Episode 40 - Cognitive Bias
    2018-05-01 (duration 34m)
    [transcript]
    00:12 the
    00:22 the
    00:29 the
  •  
     
  •  
    Here in Holland Why the Dutch are Different
    2018-04-06 (duration 34m)
    [transcript]
    07:23 green
    24:43 green
    00:06 the
Didn't find what you are looking for? Is a podcast or an episode missing? You can provide a feed URL that will be added to the database. You can do that here. In case you think there's a bug, please contact us. Thanks!