The Shifting Privacy Left Podcast

Shifting Privacy Left features lively discussions on the need for organizations to embed privacy by design into the UX/UI, architecture, engineering / DevOps and the overall product development processes BEFORE code or products are ever shipped. Each Tuesday, we publish a new episode that features interviews with privacy engineers, technologists, researchers, ethicists, innovators, market makers, and industry thought leaders. We dive deeply into this subject and unpack the exciting elements of emerging technologies and tech stacks that are driving privacy innovation; strategies and tactics that win trust; privacy pitfalls to avoid; privacy tech issues ripped from the headlines; and other juicy topics of interest.

S2E12: 'Building Powerful ML Models with Privacy & Ethics' with Katharine Jarmul (ThoughtWorks)

March 28, 2023 • Debra J Farber / Katharine Jarmul • Season 2 • Episode 12

This week, I'm joined by Katharine Jarmul, Principal Data Scientist at Thoughtworks & author of the forthcoming book, "Practical Data Privacy: Enhancing Privacy and Security in Data." Katharine began asking questions similar to those of today's ethical machine learning community as a university student working on her undergrad thesis during the war in Iraq. She focused that research on natural language processing and investigated the statistical differences between embedded & non-embedded reporters. In our conversation, we discuss ethical & secure machine learning approaches, threat modeling against adversarial attacks, the importance of distributed data setups, and what Katharine wants data scientists to know about privacy and ethical ML.

Katharine believes that we should never fall victim to a 'techno-solutionist' mindset, where we believe that we can solve a deep societal problem with tech alone. However, by solving issues around privacy & consent in data collection, we can more easily address the challenges of ethical ML. In fact, ML research is finally beginning to broaden and include the intersections of law, privacy, and ethics. Katharine anticipates that data scientists will embrace PETs that facilitate data sharing in a privacy-preserving way, and she advocates for de-normalizing the practice of sending ML data from one company to another.

Topics Covered:

Katharine's motivation for writing a book on privacy for a data scientist audience and what she hopes readers will learn from it
What areas must be addressed for ML to be considered ethical
Overlapping AI/ML & Privacy goals
Challenges with sharing data for analytics
The need for data scientists to embrace PETs
How PETs will likely mature across orgs over the next 2 years
Katharine's & Debra's favorite PETs
The importance of threat modeling ML models: discussing 'adversarial attacks' like 'model inversion' & 'membership inference' attacks
Why companies that train LLMs must be accountable for the safety of their models
New ethical approaches to data sharing
Why scraping data off the Internet to train models is the harder, lazier, unethical way to train ML models

Resources Mentioned:

Pre-order the forthcoming book: "Practical Data Privacy"
Subscribe to Katharine’s newsletter: Probably Private

Guest Info:

Follow Katharine on LinkedIn
Follow Katharine on Twitter
