Software Engineering Daily

Technical interviews about software topics.


Where Machines Go to Learn with Auren Hoffman

If you wanted to build a machine learning model to understand human health, where would you get the data? A hospital database would be useful, but privacy laws make it difficult to disclose that patient data to the public. In order to publicize the data safely, you would have to anonymize it, so that a patient’s identity could not be derived from data about that patient–and true anonymization is notoriously difficult.

In every industry where privacy is a concern there is a similar challenge. If there is no place with public data sets, there is no place where the machines can go to learn. The possible machine learning applications that we can build are limited by the data sets that are available.

Auren Hoffman started his company SafeGraph to unlock data sets so that machine learning algorithms can learn from that data. In this episode, we talk about the machine learning landscape in both the short- and long-term time horizons. We also discussed some of Auren’s strategies for building companies, which have been crucial for me in thinking about how to build Software Engineering Daily.

Show Notes

Where Should Machines Go to Learn?

Quoracast Episode

Transcript Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view or download the transcript for this show. Sponsors

Exaptive simplifies your data application development. Exaptive is a data application studio that is optimized for rapid development of rich applications. Go to to get a free trial and start building applications today.


Apica System helps companies with their end-user experience, focusing on availability and performance. Test, monitor, and optimize your applications with Apica System. Apica is hosting an upcoming webinar about API basics for big data analytics. You can also find past webinars, such as how to optimize websites for fast load time.

GoCD is an on-premise, open source, continuous delivery tool. Get better visibility into and control of your teams’ deployments with GoCD. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit for a free download. 


fyyd: Podcast Search Engine

 2017-02-17  56m