The Shifting Privacy Left Podcast

S2E25: "Anonymization & Deletion at Scale" with Engin Bozdag (Uber) & Stefano Bennati (HERE)

August 29, 2023 Debra J Farber / Engin Bozdag & Stefano Bennati Season 2 Episode 25

This week, we're chatting with Engin Bozdag, Senior Staff Privacy Architect at Uber, and Stefano Bennati, Privacy Engineer at HERE Technologies. Today, we explore their recent IWPE'23 talk, "Can Location Data Truly be Anonymized: a risk-based approach to location data anonymization," and discuss the technical & business challenges of obtaining anonymization. We also discuss the role of Privacy Engineers, how to choose a career path, the importance of embedding privacy into product development & DevPrivOps, collaborating with cross-functional teams, & staying up-to-date with emerging trends.


Topics Covered:

  • Common roadblocks privacy engineers face with anonymization techniques & how to overcome them
  • How to get budgets for anonymization tools; challenges with scaling & regulatory requirements & how to overcome them
  • What it means to be a 'Privacy Engineer' today; good career paths; and necessary skill sets
  • How third-party data deletion tools can be integrated into a company's distributed architecture
  • What Privacy Engineers should understand about vendor privacy requirements for LLMs before bringing them into their orgs
  • The need to monitor code changes in data or source code via code scanning; how HERE Technologies uses Privado to monitor the compliance of its products & data lineage; and how Privado detects new assets added to your inventory & any new API endpoints
  • Advice on how to deal with conflicts between engineering, legal & operations teams and how to get privacy issues fixed within an org
  • Strategies for addressing privacy issues within orgs, including collaboration, transparency, and continuous refinement



Engin Bozdag:

This means that it's not just doing a PIA with a software engineer and communicating policy requirements (like delete data according to policy), but actually understanding technical limitations, capabilities, and maybe opportunities. When you see common patterns of issues across different products, you should be able to recommend different technical solutions that will be more efficient for different teams. How do you gather those skills? Maybe join engineering discussions as a privacy SME and learn more about these software components, or build software yourself to understand how these systems work in harmony.

Debra J Farber:

Welcome everyone to Shifting Privacy Left. I'm your host and resident Privacy guru, Debra J Farber. Today, I'm delighted to welcome my next two guests: Engin Bozdag, Senior Staff Privacy Architect at Uber (he's based in Austin, Texas) and Stefano Bennati, Privacy Engineer at HERE Technologies (that's H-E-R-E), which is based in Zurich, Switzerland. Engin is Uber's Principal Privacy Architect and the Team Lead of Uber's Privacy Architecture team. He holds a PhD in AI Ethics and authored one of the first works on algorithmic bias. He also helped create ISO 31700, the world's first standard on privacy-by-design, which I contributed to as well, here from the United States.

Debra J Farber:

Engin has gained extensive experience in diverse organizational settings, cultivating a privacy-focused career that has evolved over the course of a decade. Throughout his journey, he has assumed multi-faceted roles, encompassing legal expertise, privacy engineering, engineering management, research, and consultancy in the realm of privacy. Stefano is a Privacy Engineer at HERE Technologies. He holds a PhD in privacy algorithms; and, at HERE, Stefano worked on the technology behind Anonymizer (HERE's first privacy-focused product) and authored a number of patents and scientific publications. Today, we're going to be talking about location data and anonymization, what it means to be a Privacy Architect, new privacy architectures, and how to get privacy issues fixed within scaling companies. This is my very first time welcoming two guests on the show at the same time. I'm really excited to dive in, so let's see how this goes. Welcome, Engin and Stefano.

Engin Bozdag:

Thank you, Debra. Thank you for the invite. Great to be here.

Stefano Bennati:

Thank you very much. Great to be here.

Debra J Farber:

Great, okay. So, I originally reached out to you both because I saw that you were presenting the talk, "Can location data truly be anonymized: a risk-based approach to location data anonymization" at the 2023 International Workshop on Privacy Engineering, and I wanted to dig deeper on that topic. I think my audience has a lot of thirst for that. Let's get started. What are some of the technical and business challenges in obtaining anonymization? Engin first, that's to you.

Engin Bozdag:

Yeah, I think before we even think about different anonymization techniques, we have to understand the business use case. What does the use case need in terms of data, and which technique can be used to reduce the risk? So, we can't just start with anonymization as an end goal. We need to understand the purpose. So, if I can give an example, imagine you have an app where the user might buy T-shirts. They might click in the app. They might type keywords - like the size, design, or specific images belonging to different searches. So, usage data, and anonymization of such usage data, is very different than anonymization of location data because the re-identification risk for individuals will be much higher. I mean, think about medical images; it brings a totally different complexity. Going back to the location data, this is what our talk focused on.

Engin Bozdag:

We looked into several features of location data that impact anonymity. So, you can think about aggregation - does the data belong to a single user, or is it aggregated among a larger crowd? Or, how long the data is collected as a unit. Do we need entire trips for location, or do we need a subset of trips? How frequent is the data? Do we collect it every second, every month? And, how precise and accurate is it? In an ideal world, you would reduce all these features and arrive at anonymous data, but there will be very different use cases. Think about finding the best charging station for an EV. Or, you ask your AI assistant to give you some recommendations on trips based on your habits, or fraud detection. So, you cannot reduce all those features. So, that's kind of the challenge. And again, I notice that some threat modeling frameworks want full anonymity, but it might or might not be possible depending on the use case. And, you also need to think about additional controls on top of anonymization.
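
To make that concrete, here is a minimal, hypothetical sketch (not Uber's or HERE's actual tooling) of reducing two of the features Engin mentions, spatial precision and sampling frequency, before any further anonymization. The field names and thresholds are illustrative assumptions.

```python
from datetime import datetime, timedelta

def generalize_point(lat: float, lon: float, decimals: int = 2) -> tuple[float, float]:
    """Coarsen coordinates; roughly 2 decimals is about 1 km of precision at the equator."""
    return round(lat, decimals), round(lon, decimals)

def downsample(points: list[dict], min_gap: timedelta = timedelta(minutes=5)) -> list[dict]:
    """Keep at most one point per time window to lower the sampling frequency."""
    kept, last_ts = [], None
    for p in sorted(points, key=lambda p: p["ts"]):
        if last_ts is None or p["ts"] - last_ts >= min_gap:
            lat, lon = generalize_point(p["lat"], p["lon"])
            kept.append({"ts": p["ts"], "lat": lat, "lon": lon})
            last_ts = p["ts"]
    return kept

# Illustrative trace: three raw GPS pings from one (hypothetical) trip.
trip = [
    {"ts": datetime(2023, 8, 1, 9, 0, 0), "lat": 47.376887, "lon": 8.541694},
    {"ts": datetime(2023, 8, 1, 9, 0, 30), "lat": 47.377101, "lon": 8.542110},
    {"ts": datetime(2023, 8, 1, 9, 6, 0), "lat": 47.379500, "lon": 8.545200},
]
print(downsample(trip))  # coarser, sparser trace; residual risk still needs assessment
```

Even after such reductions, as Engin notes, whether the result is anonymous still depends on the use case and on additional controls.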

Debra J Farber:

Thanks. That is a lot of good framing of the issue. So, Stefano, what are your thoughts on some of the technical and business challenges in attaining anonymization?

Stefano Bennati:

Thanks, Debra. So, let me start with the business challenge, and striking a balance between different ways of anonymizing data and the value that anonymized data brings to product development. For me, the best trade-off is when the data retains enough information to produce a high-quality product, while simultaneously it does not retain enough information to compromise the privacy of data subjects. By data subjects, I mean those individuals responsible for creating or generating the data. Each product has specific requirements on data quality and system design, which directly impact what trade-offs can be designed. For example, I sometimes receive questions regarding collecting disaggregated personal data to compute statistics, which could just as well be computed from aggregated data - the reason being that disaggregated data might become useful in future products. When the business reason is so uncertain and the requirements are not yet defined, it is hard to justify the privacy cost. So, in these situations, I follow the principle of 'data minimization.' Another interesting example is sharing the location of a vehicle that is involved in a traffic accident. In this instance, the benefits of providing assistance to individuals involved in the accident and ensuring the safety of other drivers can indeed outweigh the privacy concerns. Under these circumstances, a product could be developed based on the 'vital interest' legal basis (under GDPR). I invite you all to check out our presentation, which we gave in our latest talk. You will find an overview of common use cases around location data and their requirements.

Stefano Bennati:

Now, for the technical challenge, it is about obtaining the best trade-off given the constraints. So, I could ask myself which privacy enhancing techniques are compatible with my use case. Homomorphic encryption could be ideal to process this data, but it won't scale to the quantity and throughput of data. Differential privacy, I cannot use because the data needs to be disaggregated. Then maybe local differential privacy could be an option, and so on. Say I choose the technique; then, what is the best parameter setting? To determine this, I need tools to measure and compare the privacy risk and the data quality in different settings.
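
As one illustration of that parameter trade-off, here is a minimal local differential privacy sketch using randomized response. It is a generic textbook construction, not HERE's implementation, and the epsilon values and true rate are purely illustrative.

```python
import math
import random

def randomized_response(value: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1); otherwise flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return value if random.random() < p_truth else not value

def estimate_rate(reports: list, epsilon: float) -> float:
    """Debias the noisy reports to estimate the true fraction of 'True' answers."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

true_values = [random.random() < 0.3 for _ in range(100_000)]  # true rate is 30%
for eps in (0.5, 1.0, 3.0):
    reports = [randomized_response(v, eps) for v in true_values]
    # Higher epsilon gives better utility (estimate closer to 0.3) but weaker privacy.
    print(eps, round(estimate_rate(reports, eps), 3))
```

The same pattern, measuring utility and risk at each parameter setting and comparing them, is what the measurement tools Stefano mentions would automate.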

Debra J Farber:

That makes a lot of sense. Maybe we'll delve into tools later. So, for now, I want to know how you deal with the fact that different jurisdictions around the world have different definitions and guidelines for anonymization. Stefano, tell us about HERE Technologies.

Stefano Bennati:

Yeah, sure. At HERE, we keep a global privacy stance and adhere to all local regulations. This means that we always apply the strictest privacy rules from around the world. At times, we fall back to applying local regulations. This only happens, though, with products that are contractually limited to one jurisdiction, where we are confident they will not be exported to other jurisdictions in the future. We keep this policy because we want to avoid costly retrofitting of products to be compliant with different sets of requirements.

Debra J Farber:

That makes sense to do. And, Engin.

Engin Bozdag:

I want to mention a recent case in Europe that tests these different definitions of anonymization. So, in this particular case, Company A was sending data to a Company B, and the data at Company B was controlled in terms of re-identification. So, the question was: is the data at Company B anonymized because the source data remains at Company A? There are different opinions. Some argue it's not, because there is always a risk of re-identifying the persons; whereas others say it depends on the controls and how likely this threat will occur. I think we don't have a clear threshold yet. So, companies will have their own policies on what anonymization is. The assessment is done on a case-by-case basis. You also have specific regulations, such as HIPAA, which has a method for de-identifying health data by removing identifiers. So, as a result, you know, similar to what Stefano has stated, I've observed that organizations choose a global approach unless there's a specific rule from a jurisdiction. That approach will depend on a company's risk appetite, as well. So, they will define their anonymization policy accordingly.

Debra J Farber:

Yeah, you know, so much is risk-based when it comes to setting the policy. So, I imagine every company's got its own set of risk levers and risk tolerance that they need to consider. So, that makes sense. What roadblocks do engineers run into when it comes to certain anonymization techniques like generalization, aggregation, density, noise addition, or downsampling? Are those challenges? And then, how did you overcome them in order to achieve anonymization? Stefano, why don't we start with you?

Stefano Bennati:

Sure. From what I've seen, engineers typically get stuck when improving existing products - for example, when they apply a new anonymization technique to the product, or adapt a product to a new requirement, such as management of data subject access requests (DSARs). A common example of those is right-to-be-forgotten type requests, with data deletion, and so on. Products are bound by previous commitments with customers, which might be challenged by the proposed improvements. For example, changing the anonymization algorithm might reduce the quality of data below a certain target that has been agreed with the customer. Another example is data lineage. Implementing this functionality can enable automatic processing of data subject rights requests, but it might also require a rearchitecting of the product that can impact response times or uptime targets.

Stefano Bennati:

In these situations, we try the technical approach, which requires working closely together with developers and product people to deeply understand the implementation and the commitments. This is a win-win situation, as privacy engineers learn real-world challenges, which effectively fuel privacy technology innovation. Once the product is well understood, anonymization parameters can be tuned appropriately. This might have the downside of reducing the privacy protection in other use cases, though. Then, the business approach is to talk with our sales team and with the customers to discuss these commitments.

Debra J Farber:

Oh, that's a really good point. Rarely do we ever think about circling back with the sales team and customers. That's a great point that you should think about more often, because that's kind of the initial point of contact - with customers, on data sharing. What about you, Engin?

Engin Bozdag:

I think in smaller companies, it's not so much about the techniques, but I still observe that the knowledge around anonymization is a bit immature. So, I see that they just remove some PII and conclude that data is anonymized or deleted, which might or might not work depending on the jurisdiction you're operating in. I think for larger companies, there's definitely a challenge in getting those techniques implemented. The first one is lack of proper tooling. So, in large organizations you might have custom database technologies that no one outside your company is using, and data scattered across multiple teams. So, getting a tool externally and getting it to work with your tech stack is already challenging. And then, asking every engineer to aggregate, mask, or generalize their own data manually is not scalable, so they need platforms to help them with that. So, the lack of proper tooling is one challenge.

Engin Bozdag:

Second is dependency within datasets. So, a team might be okay with de-identifying or anonymizing their own data table, but there might be downstream services reusing this data. This is called 'the problem of many hands.' Engineers might not even know why the data is there in the first place. What is it doing there, and what is it used for? So, before even starting with anonymization, you need proper data classification and an up-to-date data inventory to determine whether it can be anonymized.

Engin Bozdag:

And the last challenge I observe in larger organizations is datasets with too many quasi-identifiers. If you look at, for example, the 2020 U.S. Census, it has nine questions, and there is also the U.S. Census American Community Survey, which has 20 questions. But in large companies, you might have tables with thousands of fields, and there might be hundreds of quasi-identifiers. So, how would you apply a technique such as l-diversity in this case, considering this table will get updated continuously? So, to summarize, we need tools to help engineers so that they can deploy anonymization techniques without so much hassle; and, we also have to understand the limits of anonymization and determine if we need additional controls when it may not be feasible.
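
For listeners less familiar with those terms, here is a small, hypothetical sketch of measuring k-anonymity and (distinct) l-diversity over a table's quasi-identifiers. The columns and rows are invented for illustration; with hundreds of quasi-identifiers and a continuously updated table, exactly this computation becomes hard to act on.

```python
import pandas as pd

# Hypothetical table: two quasi-identifier columns plus one sensitive attribute.
df = pd.DataFrame({
    "age_band":  ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "zip3":      ["787",   "787",   "787",   "940",   "940"],
    "diagnosis": ["flu",   "flu",   "covid", "flu",   "flu"],
})
quasi_identifiers = ["age_band", "zip3"]

groups = df.groupby(quasi_identifiers)
k = groups.size().min()                  # k-anonymity: size of the smallest equivalence class
l = groups["diagnosis"].nunique().min()  # distinct l-diversity: fewest sensitive values in any class
print(f"k = {k}, l = {l}")               # here k = 2 and l = 1: one class leaks the diagnosis
```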

Debra J Farber:

That's really interesting, just like listening to both of you speak about your challenges. It's enlightening for me, so thank you for that. It's pretty exciting. What have you encountered on the business side in terms of business or policy challenges in getting anonymized data? How do you get budgets and what are the challenges with scaling and regulatory requirements? How could you overcome them? Engin, let's start with you first.

Engin Bozdag:

Yeah, I think, going back to my example on the U.S. Census Bureau: the agency applied differential privacy to the census data, which improves the confidentiality of individual responses; but that led several researchers to protest, and they asked the agency to abandon using differential privacy because it delayed the release of the data. So, it took some time before it could be released to the public and to these agencies, and it wasn't accurate enough for their needs. This is definitely an issue in organizations as well, because once you say we have to anonymize the data, there will be some delay, and an expert will often need to be involved, which means extra costs. In other circumstances, like with government officials, they might require precise information. For example, the government of Egypt used to demand access to sensitive location data, and they did not want the data anonymized. These are some of the challenges - not so much technical limitations, but more business and regulatory asks.

Debra J Farber:

It's definitely not.

Stefano Bennati:

I encountered many of those challenges, and I can tell you about this. HERE acquires data from thousands of sources, such as automotive manufacturers, commercial fleets, and so on. The data flows between these entities, so it must comply with the anonymization criteria set by HERE, by the data providers, and by the regulatory bodies. So, you can imagine that the standards typically do not look alike. They can differ a lot between entities. For this reason, HERE developed the Anonymizer, a solution to measure the privacy risk of location data and anonymize it to align with a predetermined level of privacy risk tolerance. This allows data providers to select their preferred anonymization level before sending their data to HERE.
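
Conceptually (and without implying anything about the internals of HERE's actual Anonymizer), a provider-selected tolerance can be pictured as a loop that strengthens anonymization until a measured risk score meets the chosen threshold. In this sketch, risk_of and anonymize are hypothetical placeholders for a re-identification metric and an anonymization transform.

```python
from typing import Callable

def anonymize_to_tolerance(
    trips: list,
    risk_of: Callable[[list], float],          # hypothetical re-identification score in [0, 1]
    anonymize: Callable[[list, float], list],  # hypothetical transform with strength in [0, 1]
    tolerance: float,
    step: float = 0.1,
) -> list:
    """Increase anonymization strength until the measured risk meets the provider's tolerance."""
    strength, result = 0.0, trips
    while risk_of(result) > tolerance and strength < 1.0:
        strength = min(1.0, strength + step)
        result = anonymize(trips, strength)
    return result

# Usage (hypothetical): accept the data once the estimated risk drops to 0.2 or below.
# safe_trips = anonymize_to_tolerance(trips, my_risk_metric, my_trip_gapper, tolerance=0.2)
```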

Debra J Farber:

That's great. Thank you for a peek inside your org. Now, I'm going to ask you a question I ask quite a few people who come on the show. What does it mean to you in today's day and age to be a 'privacy engineer'? Why don't we start with you, Engin?

Engin Bozdag:

Yeah, I think IAPP recently made an attempt to define privacy engineering. I think it's very interesting for a discussion and they define a broad range of jobs that might fall under privacy engineering. So that, for example, would include: software engineers building privacy tools; UX designers minimizing dark patterns, IT infra people like DevOps that could configure systems for better privacy; but even professionals handling physical architecture, like making privacy-friendly choices in restrooms, patient rooms, et cetera. It's very broadly defined. So, I think it depends on the organization, on how they define privacy engineering and what they want from those privacy engineers, skills-wise.

Engin Bozdag:

So, at Uber, our privacy engineers focus first on privacy-threat analysis of a new engineering design or architecture. We take a look, we ask questions, and we recommend architectural changes so privacy-by-design is actually implemented; the outcome would be technical controls like onboarding to a deletion platform or reducing unneeded data in request headers. We also do technical audits to reduce technical privacy debt. So, these are controls that are missing in legacy systems. We support our engineers with ad hoc questions - how do you delete data from this particular database? We support our legal colleagues and give feedback on policies. We also give specs and requirements to our software engineers on what to build and which new features we need in existing tooling. So, to summarize, we connect product engineering to legal and security teams and ensure that engineers get practical privacy advice. That might be very different in another organization. Maybe in another organization, privacy engineers are the software engineers building privacy tools. So, you should understand what the organization wants from a privacy engineer and not just focus on the title. That's the advice I can give.

Debra J Farber:

Yeah, I think that makes sense. And, the episode that came out two weeks ago - two episodes ago, for anyone who's listening to this - with George Ratcliffe from Stott and May Recruiting, he really goes into what companies are looking for, based on their needs, from the perspective of a recruiter across multiple organizations. So, I encourage folks to take a look at that episode as well. But, Stefano, I would love to understand your perspective on privacy engineering and what it means to be a privacy engineer.

Stefano Bennati:

Yeah, I agree with what Engin said. So, privacy engineering is an incredibly broad discipline. I like the description of privacy engineering specialties by Lea Kissner and Lorrie Cranor, which was published in their paper, "Privacy Engineering Superheroes." Also, the title is pretty good. Yeah, it is. It resembles the IAPP classification.

Stefano Bennati:

There are a few more roles beyond what Engin mentioned. So, we have analysis, consulting, privacy products, math and theory, privacy policy, and incident and vulnerability response. So, it's very broad, and each specialty has a unique skill set. But I think that all privacy engineers must have a broad knowledge of the portfolio and also how competitors handle the same privacy issues in their similar products. In my day-to-day job, I am the point of contact for different stakeholders for a variety of privacy topics. This requires: deep technical knowledge to compile data-driven reports that support stakeholders in making an informed decision; business knowledge to understand the perspective of each stakeholder; as well as communication skills to clearly explain the trade-offs to the stakeholders. I also want to stress that the key here is to work with developers and product managers, as opposed to throwing them blockers and some impossible requirements from the top of our ivory tower.

Debra J Farber:

Thank you. So next, I want to understand how companies should think about data deletion capabilities, especially when it comes to integrating third-party tools. How can these tools be basically integrated into a company's architecture? Engin, let's start with you.

Engin Bozdag:

I'm going to get a little bit technical and I promise I'll keep it short. So the complexity arises when you have, again, a large company with a distributed architecture. In distributed architectures you have so-called 'microservices.' So, think of them as these small software blocks that do one thing very well but they rely on others for function. So, you kind of separate the business logic into different software blocks. So, you might have a microservice for payments, another microservice for user sign-up, et cetera, and these microservices will have their own database and they will store their own data.

Engin Bozdag:

So, going back to your question, when we get an erasure request - a right-to-erasure request - from a user and it enters a company, typically a dispatcher service will take it and forward it to these different microservices and expect them to do their own deletion. There are different ways you can get this done. You can use an asynchronous model, where you put messages in a queue and wait for the services to consume them. Or, you can do an API integration, where the dispatcher's API contract is implemented and these services listen for the incoming messages. Both approaches have pros and cons. When third-party data sharing gets into play, it gets more complicated. So, again, going back to my example of the company selling T-shirts: let's assume the company uses another vendor for actual delivery of the T-shirts, and assume that a microservice at Company A is doing the integration and sending this data to Company B. So, who would actually send this erasure signal to the third party?

Engin Bozdag:

You can have a centralized approach where one service handles all third-party integrations, or you can have different data owners integrating with different third parties. You should also think about auditing and logging of a request, because once data leaves your network, those external requests may be lost. Do we keep on retrying? How many times? What do we need to log? Do we need to log that the third party received the request? Did they execute the request, or did they just put it into their backlog? And, maybe we don't use APIs because the third party doesn't have support? Are we comfortable sending CSV files? So, as you can see, there are many engineering decisions to be made here. The legal requirement, like forwarding requests to the third party, will just start a discussion, but that's only the start of the engineering and architecture process.
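
As a rough illustration of the asynchronous pattern Engin describes (not Uber's implementation), here is a sketch in which an erasure request is fanned out as queue messages that each microservice, including a hypothetical third-party connector, consumes with bounded retries and an audit log. The queue, the service names, and the retry policy are all illustrative assumptions.

```python
import json
import logging
import queue
import time

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("erasure-audit")

erasure_queue: "queue.Queue[str]" = queue.Queue()
SUBSCRIBERS = ["payments", "signup", "delivery-vendor-connector"]  # hypothetical consumers

def dispatch_erasure(user_id: str) -> None:
    """Publish one erasure message per subscribed service and record it for auditing."""
    for service in SUBSCRIBERS:
        msg = json.dumps({"user_id": user_id, "service": service, "ts": time.time()})
        erasure_queue.put(msg)
        audit_log.info("dispatched erasure for %s to %s", user_id, service)

def consume(delete_fn, max_retries: int = 3) -> None:
    """Drain the queue; retry each deletion a bounded number of times and log the outcome."""
    while not erasure_queue.empty():
        msg = json.loads(erasure_queue.get())
        for attempt in range(1, max_retries + 1):
            try:
                delete_fn(msg["service"], msg["user_id"])
                audit_log.info("%s confirmed deletion for %s", msg["service"], msg["user_id"])
                break
            except Exception:
                audit_log.warning("attempt %d failed for %s", attempt, msg["service"])
        else:
            audit_log.error("erasure for %s at %s needs manual follow-up", msg["user_id"], msg["service"])

dispatch_erasure("user-123")
consume(lambda service, user_id: None)  # stand-in for each service's real deletion logic
```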

Debra J Farber:

That's pretty enlightening. It just really shows you how much more complex actioning the legal requirement is than the requirement itself. So, I'm curious if there are any attempts at standardizing requests. Is there an IEEE working group or something on third-party requests for data deletion, or is that a good opportunity for someone to create one, so these become standardized best practices across organizations?

Engin Bozdag:

I'm not aware of any work in this field but, as you said, this is definitely a good opportunity. I'm noticing many companies are, you know, using different approaches; they have different APIs, different requirements on APIs. So, this is definitely an area where we will need more standardization.

Debra J Farber:

Yeah, that would probably help with decision-making and design. Ok, so, Stefano, what are some good career paths or skill sets for becoming a good privacy engineer, and where do you suggest people learn more?

Stefano Bennati:

Yeah, good question. As mentioned before, the field is very broad, and so there's always the opportunity to study a particular subfield (for example, user experience) and then get involved in a privacy project later on. Once in the privacy space, I can suggest some general skills that can help any privacy engineer. Risk management is the top one for me. It is very important to understand risk and talk risk, because black-and-white privacy thinking doesn't really help. Then, communication - a privacy engineer needs to communicate clearly to stakeholders with different backgrounds. Learning the basics of other backgrounds (for example, policy, business, and so on) helps a lot in understanding the point of view of the stakeholders and communicating more convincingly. Then we have statistics. It helps in estimating privacy risks and talking about those, which sort of grounds your reasoning. Then, of course, software architecture - very useful for giving realistic improvement suggestions to product teams.

Debra J Farber:

Thank you, that's super helpful! What about you, Engin? What are some good career paths or skill sets for becoming a good privacy engineer and where can people learn more?

Engin Bozdag:

I think it depends on the type of privacy engineer you want to be, but for our privacy engineers (at Uber) we typically have two paths. The first one, which I want to call 'Architecture to Compliance,' is where we have folks with extensive experience in building software systems, and they typically gain more on the privacy compliance side. I sometimes hear from other privacy professionals that compliance knowledge is not necessary or should not be the main focus, because we should go beyond compliance anyway. But, if you do not grasp concepts such as secondary use, pseudonymization, and lawful basis, your understanding of privacy threats will be limited. I've met software engineers building privacy tools who think you always need user consent, or that consent is sufficient to satisfy user privacy.

Engin Bozdag:

The second path, which I call 'Compliance to Architecture,' is when you have a seasoned privacy pro learning more about the software components or systems. You don't need to actually code, but you need to speak the same language as the engineers. The advice you give is granular and actionable. This means that it's not just doing a PIA with a software engineer and communicating policy requirements like delete data according to policy, but actually understanding technical limitations, capabilities, and maybe opportunities. When you see common patterns of issues across different products, you should be able to recommend different technical solutions that will be more efficient for different teams. How do you gather those skills? Maybe join engineering discussions as a privacy SME and learn more about these software components, or you build software yourself to understand how these systems work in harmony.

Engin Bozdag:

The second path is more difficult because there is a steep learning curve. Many privacy pros have given up due to this learning curve. But the first path does not deliver many privacy engineers either. Often, software engineers don't want to narrow down their scope. They want to have a larger scope in software engineering. Typically, only those who are really passionate about privacy stick around.

Debra J Farber:

That's a really fascinating insight. Thank you for that. That really is because every company has its own way of doing things and it's nice to know that there is an opportunity for privacy pros that want to get more technical, like applied technical, and work with engineering teams that aren't necessarily coders. But, it's also fascinating to understand why it's harder to get privacy engineers pumped out if they're not coming out of a Carnegie Mellon Privacy Engineering school with a master's and a PhD in the topic. Like, where do you go find them, right? Why is it so hard to find them?

Debra J Farber:

I think you gave some good perspective on why they might not be drawn to a hyper-focused engineering path like privacy engineering. But I'm with you. I think it's freaking fascinating, obviously, and I just can't get enough of talking about privacy and data protection, so I'm definitely in that second bucket. Let's broaden the discussion a little bit to something that's so hot right now: AI - especially generative AI and LLMs (large language models). What should privacy engineers understand about vendor privacy requirements for LLMs if they're bringing that into an organization? Let's start with you, Stefano.

Stefano Bennati:

Yeah, I think even before starting to evaluate vendors, privacy engineers should verify under what circumstances or preconditions they can apply artificial intelligence to their use case. Some AI-related regulations and guidelines to review are the EU AI Act and the Ethics Guidelines for Trustworthy AI. Once you verify that, then you can think about which vendors you want to choose. An approach that I take when evaluating vendors is to put myself in the shoes of their privacy engineers. I collect as much information as possible about their business model and product design, then think about how I would build privacy into their product. Given these constraints, could a privacy engineer justify to their management adding privacy-enhancing technologies and minimizing data collection, or does it have too big of an impact on their business needs? Another complementary approach is to ask the vendor to invite a privacy engineer to the product demonstration and evaluation meeting. That is a quick way to get a better understanding of the privacy stance of that product, and it also helps your fellow privacy engineer prove to their management that privacy is an important feature of their product offering.

Debra J Farber:

I like that approach. I think that makes a lot of sense. What about you, Engin?

Engin Bozdag:

I think this is an interesting field that is moving very fast. I observe companies going down different paths. Some of them are using open source, on-prem models, which reduces the risk of data breaches, but it requires quite a hefty investment in computing and personnel resources. Also, I'm hearing the performance might not always be optimal. Commercial models provide much better performance and there are no upfront costs, but there are questions around their security posture. So, as a company, I think before using an LLM, the first question you should ask is: do you really need LLM or generative AI capabilities for your purpose? Maybe you can use traditional machine learning for your use case. Maybe you want to be able to explain the decisions in detail. So, the first step is to understand whether you will really benefit from these LLMs and whether it is worth the risk. Second, after deciding whether to use an on-prem or a commercial offering, you need to do due diligence, thinking about the security environment of the vendor; but also, is the vendor storing your input and output for follow-up analysis? Are they using such data to retrain their own models?

Engin Bozdag:

There are serious concerns around hallucinations - for example, the model creating syntactically and semantically accurate output that is not factual. What about bias in the vendor's dataset? It depends on what type of project you have. If you do a general market analysis, maybe you can tolerate some inaccuracy; but if you're trying to automate hiring and sourcing, for example - you feed in a job description and the LLM helps you find the right candidates - you have much higher risk. Another thing to consider is data deletion. If your data is used by the vendor, how will they delete the data? And is deleting data from the training dataset sufficient for the model to forget that particular user's data? There are all these considerations, and many more, you have to make, and you always have to be transparent with your end users if you decide to use one of those LLMs in your product offerings. So, you probably have a risk governance framework already in place, but you might need to update it based on these new threats.

Debra J Farber:

Yeah, yeah. And then, with all those questions, it'll be interesting to see what regulatory bodies like the FTC and others in the EU will do - what kind of potential fines and requirements they'll place on an organization. If they want you to delete the data set, they might actually not just make you delete the training data, but delete the model it was trained on. In the U.S., we call that "disgorgement." It'll be interesting to see what the FTC's stance is on companies that they feel unethically trained their models with personal data or created other risks. So, we'll stay tuned.

Engin Bozdag:

To add, I think the FTC has this interesting report on OpenAI. I recommend all the listeners read it. You can understand what they are expecting from a generative AI offering. It offers an interesting risk framework, and that will be valuable for understanding expectations.

Debra J Farber:

Great, I'll look that up and I'll try to put that in the show notes for other people to access. Ok, so what other trends are you seeing in the world of privacy engineering? Stefano?

Stefano Bennati:

My impression is that the industry is slowly reaching a more mature state where tools to minimize risky data are available and easy to use. Now, the focus of the privacy engineer is shifting from product-specific controls towards company-level controls which govern privacy risks across products and data types (for example, HR data, product data, and so on). Examples of privacy governance solutions are data and product catalogs, which collect privacy-related labels and information on how data is processed in the company. The most advanced solutions are able to dynamically update this information by monitoring changes in the data or in the source code. I also see this shift in the privacy vendors, with the market moving from early solutions focused on providing anonymization algorithms for specific products or use cases towards company-wide governance of privacy risks. One example is Privado, which we use at HERE to monitor the compliance of our products and data lineage.

Debra J Farber:

Awesome. Privado is also the sponsor of our show and I sit on their Advisory Board, so I'm a big fan. Engin, I think you also have a relationship with Privado that we want to disclose. And, why don't you answer also the question what trends are you seeing in the world of privacy engineering?

Engin Bozdag:

Yeah, I also provide a consulting service to Privado. Privado provides, or is working on, practical privacy solutions. I think vendors should listen to privacy practitioners to better understand what the industry needs. I notice some tool offerings that solve a problem we had a couple of years ago. We already have in-house deployments for those, and there are other areas where we really need a vendor to step in, but that's not addressed. Typically, some vendors see this as an undesired cost, but building something that doesn't provide much added value for practitioners will lead to costly redesign. So, I'd recommend: let's build solutions that fix our biggest problems first.

Debra J Farber:

And so, just so that we explain to people why we're even mentioning Privado, if you could just describe their tool and how that helps you with things like anonymization or location data, or how do you use it and why are you bringing it up?

Engin Bozdag:

We don't use Privado at Uber, but based on what I've seen of the tool, it can detect new assets being added to the inventory and new API endpoints being added. It also does code scanning, so some privacy threats in code can be identified and managed. What they're also working on, long-term, is something to help with design reviews and understanding privacy risks early on, and then tying that back to the parts of the code where you can actually validate some of the requirements you set during the design review. I think that's very, very useful. Many companies do design reviews or PIAs, but that ends up as some tickets in a specific tool, and validation becomes more difficult. Tying the code analysis back to the original design requirements will be very, very useful.

Debra J Farber:

Yeah, that makes a lot of sense and is really key to DevPrivOps, as privacy is being added to the DevOps process. Awesome. Okay, so moving on, how do you guys deal with conflicts between engineering, legal, and operations teams? I know that you are very much the interface between all of them, so, yeah, what's your approach to dealing with conflicts? Let's start with you, Engin.

Engin Bozdag:

I think when a conflict arises, it's typically because accountability is not very well defined. I'll give an example. In the early days of GDPR, I saw an organization where the Legal team owned privacy, but they did not want to define operational or technical requirements for engineers because they didn't feel like they had the right expertise. But those legal requirements did not work for the engineering teams; they needed something much more tangible and actionable, but they lacked knowledge of privacy laws and regulations, so they couldn't create those requirements themselves. And, things changed after they hired their first Privacy Architect. I won't tell you who this person is. You can make your guess. In an ideal setting, I think, the Legal team defines policies, works with risk (accepted risk), and informs the rest of the company on new developments. Privacy engineers and more operational folks will help translate those requirements, but also give feedback to the Legal team, so that we update policies based on technical capabilities and build new tools and features based on policy changes. So again: define accountability properly, and the teams, I think, will work in harmony.

Debra J Farber:

That's great - a great perspective. What about you, Stefano?

Stefano Bennati:

I think conflicts arise because each of these teams brings a different perspective to the table. For example, Legal teams tend to have a better understanding of compliance risks than Product teams, while Product teams tend to have a better understanding of business requirements. The role of the Privacy Engineer is to evaluate these perspectives in an objective and data-driven way, then guide the stakeholders to reaching an agreement. I find a strong privacy culture in the company to be of great help for successfully dealing with conflicts. Unlike revenue, efficiency, and others, privacy is not necessarily a broadly-recognized value, and so stakeholders are not necessarily ready to compromise for it. From my experience, the most efficient way to build a privacy culture is to educate upper management and have them advocate for privacy in their teams. And who's responsible for educating upper management? The Privacy Engineer.

Debra J Farber:

Yeah, so it's a lot of responsibility besides just engineering. I know a lot of responsibility falls on Privacy Engineers to get stuff done even at a higher level.

Stefano Bennati:

You never get bored.

Debra J Farber:

You never get bored! Okay, so what's the best way to get privacy issues fixed in an organization? Engin?

Engin Bozdag:

I think the challenge in getting privacy issues fixed is not technical disagreements or choosing the most efficient solution. It's more about scaling. In large organizations, privacy engineers cannot fix everything, or even have the bandwidth to track all privacy issues across the board. The first thing to do would be identifying key engineering leaders (those who have authority) and getting them to nominate some 'Privacy Champions' to get those issues fixed. Typically, privacy champions will be senior engineers. They know the specifics of their products' architecture. They know who covers what and who owns which technical feature, and they will also have some authority to get some resources to get things fixed. They will also send you a signal if they discover privacy vulnerabilities at the system level.

Engin Bozdag:

After securing resources and getting some champions, you should talk with the team, train them, and together identify a plan to address high-risk issues and dependencies, and then have some estimated time to fix these issues. You won't be able to fix everything immediately, so think of it as privacy debt. Another thing to consider is being transparent about efforts to the higher-ups. Show them what is pending, what is fixed, and what the blockers are; and later, use these learnings to train your engineers better so they actually address these issues early in design - so that you don't have to do this all the time. So, focus on privacy-by-design, not on privacy-by-remediation. This will be a continuous exercise.

Debra J Farber:

That's really, really great advice. Stefano, what is your perspective?

Stefano Bennati:

Yeah, it's very aligned. I really like the Privacy Champions approach, and I want to add that it doesn't have to be only engineers. It can also be Product Managers and other kinds of figures. I say this because privacy issues are ideally resolved proactively, even before they surface - so at product design, before development starts. Privacy engineers need to be involved in product design from the start, while there is still room to adjust the product architecture, and before the requirements are finalized. So, in this context, the most effective approach to addressing privacy concerns is to establish a comprehensive Privacy Engineering program and shift privacy left. So, as Engin said, privacy-by-design.

Debra J Farber:

And we all know that I'm for 'shifting privacy left,' even naming my show exactly that. So, we are definitely aligned. Okay, before we close, do you guys have any last words of wisdom for the audience as we close the conversation, starting with you, Stefano?

Stefano Bennati:

Yeah, I have a message for my fellow Privacy Engineers: do not overlook the less technical aspects of the work, as they constitute the foundation of privacy-by-design. Your objective should be to establish a robust privacy process deeply integrated into the product and development lifecycle. You need to collaborate closely with product and engineering teams, obtain their support through privacy education, and incorporate their feedback into the process. This approach ensures that the process aligns with these teams rather than working against them. A strong privacy-by-design process also enables the early detection and mitigation of privacy concerns, starting from the conceptualization and design of the product. This facilitates the seamless integration of privacy-enhancing technologies into the product later on, when development starts. Moreover, product and engineering teams can actively become participants in the privacy-by-design process by identifying and reporting privacy issues themselves.

Debra J Farber:

Great advice. Great advice! Engin, what's your great advice?

Engin Bozdag:

I think it's simple. You should really have some passion for privacy in order to sustain in this field. The field itself is not 100% mature like, say, security, and it is ambiguous. There is often no one right solution. People often ask me, "How do I become a Privacy Engineer? I just got my CIPT or my Masters in Cybersecurity." There are different paths, as we discussed, but again, the passion will help, because there will be a lot of things to consume. There are new articles being published on the AI Act, generative AI architectures such as transformer models, homomorphic encryption, generative AI governance.

Engin Bozdag:

You need to keep up-to-date with this material. That means you'll have to do a lot of reading, and to do such reading, you really need to love this field. I know people who've been here over a decade and they still ping me at night with a new article published on privacy threats. You need to have this passion. I've seen some folks who entered the field because it was a hot topic and there was a lot of hiring, but they left in a couple of years because the learning just got too tiring for them. That learning should not be a burden for you. It should be excitement. If you like it, you should carry on. If you don't, you have to reflect on what you want to do. Maybe you want to stay in the privacy domain, but not as a Privacy Engineer. Maybe you want to do something else. But passion and love for privacy is the key.

Debra J Farber:

Yeah, I think that makes a lot of sense. I obviously have the passion, but I think we're constantly drinking from a fire hose. Even though people, even though privacy engineers, might love their work, I do want to say that it can be exhausting to continue to drink from an ever changing fire hose of news and changing requirements and case law and just having to put it all together. I hear what you're saying. I also get a lot of satisfaction out of putting all those pieces together and then almost trying to solve, as if it's a puzzle. How do you get to the right outcome for an organization based on its business needs and then the needs of effective privacy and putting those controls in place? So, I think that that makes a lot of sense.

Debra J Farber:

I do think that people can get really burnt out if they don't want to continue to constantly learn new things. This was really, really a great conversation. I think that my first interview having two guests has been a success, so thank you for being my guinea pigs. I think this episode is going to be really well-received because it's a lot of meaty info from the front lines of being a Privacy Architect and a Privacy Engineer in actual job roles, and not just talking about privacy engineering abstractly. I just want to thank you so much for joining us today on Shifting Privacy Left.

Stefano Bennati:

Thank you, Debra. It was a lot of fun.

Engin Bozdag:

Thank you, Debra.

Debra J Farber:

Excellent. Okay, until next Tuesday, everyone, when we'll be back with engaging content and another great guest (or guests). Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.

Introducing Engin Bozdag (Uber) & Stefano Bennati (HERE)
Engin & Stefano describe their recent talk, "Can location data truly be anonymized: a risk-based approach to location data anonymization" and describe some of the technical & business challenges in obtaining anonymization
Roadblocks when it comes to deploying anonymization techniques & how to overcome them
How to get budgets for anonymization tools; some of the challenges with scaling & regulatory requirements & how you can overcome them
What it means to be a 'Privacy Engineer' today
How third-party data deletion tools can be integrated into a company's distributed architecture
Stefano & Engin describe good career paths / skill sets for becoming a Privacy Engineer
What Privacy Engineers should understand about vendor privacy requirements for LLMs if they're bringing that into their organization
The need to monitor code changes in data or source code via code scanning; how HERE Technologies uses Privado to monitor the compliance of its products & data lineage; and how Privado detects new assets added to your inventory & any new API endpoints
Advice on how to deal with conflicts between engineering, legal & operations teams
Advice on how to get privacy issues fixed in an organization
Stefano's advice to Privacy Engineers: "Do not overlook the less technical aspects of the work, as they constitute the foundation of privacy-by-design"
Engin's advice to Privacy Engineers: "You should really have some passion for privacy in order to sustain in this field."
