Today, I welcome Gary LaFever, co-CEO & GC at Anonos; WEF Global Innovator; and a solutions-oriented futurist with a computer science and legal background. Gary has over 35 years of technical, legal and policy experience that enables him to approach issues from multiple perspectives. I last saw Gary when we shared the stage at a RegTech conference in London six years ago, and it was a pleasure to speak with him again to discuss how the Schrems II decision coupled with the increasing prevalence of data breaches and ransomware attacks have shifted privacy left from optional to mandatory, necessitating a "privacy left trust" approach.
Thank you to our sponsor, Privado, the developer-friendly privacy platform.
Gary describes the 7 Universal Data Use Cases with relatable examples and how they are applicable across orgs and industries, regardless of jurisdiction. We then dive into what Gary is seeing in the market in regard to the use cases. He then reveals the 3 Main Data Use Obstacles to accomplishing these use cases and how to overcome them with "statutory pseudonymization" and "synthetic data."
In this conversation that evaluates how we can do business in a de-risked environment, we discuss why you can't approach privacy with just words - contracts, policies, and treaties; why it's essential to protect data in use; and how you can embed technical controls that move with data for protection that meets regulatory thresholds while "in use" to unlock additional data use cases. I.e., these effective controls equate to competitive advantage.
Copyright © 2022 - 2023 Principled LLC. All rights reserved.
Debra Farber 0:00
Hello, I am Debra J. Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans, and to prevent dystopia. Each week we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding-edge of privacy research and emerging technologies, standards, business models, and ecosystems.
Debra Farber 0:27
Today, I'm delighted to welcome my next guest, Gary LaFever, co-CEO and General Counsel at privacy tech company, Anonos and a World Economic Forum Global Innovator. Gary is a solutions-oriented futurist with a computer science and legal background, and over 35 years of technical, legal, and policy experience that enables him to approach issues from multiple perspectives.
Debra Farber 0:56
Gary LaFever 0:58
Debra, it's an honor and pleasure to be here.
Debra Farber 1:00
Oh, likewise. Gary and I shared the stage about six years ago. I think the last time I saw you was in London at a regtech conference, when I was at BigID and you were at Anonos, getting the word out.
Gary LaFever 1:14
Absolutely. We're still both evangelists for a good cause.
Debra Farber 1:17
Indeed. So, it's great to be reconnected. To start, can you tell us a little bit about your journey and, you know, how you got interested in privacy and running a privacy tech company?
Gary LaFever 1:28
Sure, appreciate that. I started off in technology. My undergraduate degree was in computer science: management information systems; worked for a couple years with Accenture; and then practiced law for a decade with Hogan Lovells. And, since that time, I've been looking for ways in which the intersection of law and technology could help advance innovation and data utility. So, I don't come to privacy as a privacy-first practitioner; rather, I come as a data utility practitioner, but I believe the way you deliver that in a sustainable way is with privacy and security. So, it's really the background in both tech and law that brought me to where I am. And that's why I'm so excited by Anonos. We've been doing what we've been doing for 11 years now at Anonos. But, it's exciting because I think a lot of what this podcast is about is being validated, which is you can't approach privacy with just words. Right? Contracts, policies, and treaties only get you so far.
Debra Farber 2:29
Gary LaFever 2:30
And that's what Anonos is about. It's about a technical solution or solutions that satisfy the legal requirements for expanding data use in a sustainable way.
Debra Farber 2:42
Well, that's very compelling. So, what I'd like to do is understand, what does shifting privacy left mean to you?
Gary LaFever 2:49
Well, I love the term, and I love the term "privacy by design and by default," because of those two words: "privacy," which connotes more of a legal concept, and "design," which obviously connotes more of an engineering concept. What I love about "privacy-by-design," when the concept first came out, was exactly that: "shifting left," meaning earlier in the process. Right? You can't come to privacy after a product or after a service offering has been deployed. You need to have that early, early on. But I actually think this has been missed by a lot of people: within the last two years, I think the Schrems II decision and the increasing prevalence of data breaches and ransomware attacks have shifted privacy left from optional to mandatory.
Debra Farber 3:41
And what do you mean by that?
Gary LaFever 3:43
Well, to me, the real ruling from the Schrems II decision by The Court of Justice of the European Union was not that new SCCs were required, but, ironically, that whatever you're doing, you need "technical" controls. Okay? And, if you look at the EDPB guidance, they talk about "organizational controls" and "contractual controls," but there's a great paragraph in there where they say, when it comes to surveillance by third country governments, they're not bound by the contracts. They're not part of your organization. So, as important as the contractual and organizational safeguards will be, you're gonna have to have technical controls. And so, I then kind of connect that dot to the data breach and ransomware attacks. By definition, if it's an external data breach, that party is not bound by any contract and you didn't invite them to the party, right? They breached your perimeter, and they've gotten access to your data. So, they're not going to be beholden to any policy or contracts you're trying to hold them to.
Gary LaFever 4:51
And even the internal employees, right, it's amazing, depending on who you believe, you know, Gartner says 60% of breaches are internal, and other actual providers of technology endpoint solutions say it's 80%. And why I'm raising that is, an employee who misuses your data is not going to feel compelled to abide by a policy or contract they signed. And so in both of these situations, the policies are critically important, but the technology that enforces them, to ensure that they can't be overruled or ignored, is what's critical. And to me, that's what "privacy left" is about; and, the importance is shifting more and more to the left, which is, yes, you need to know what your policies are. That's important. It's a very important first step. Okay. Technology does not solve these problems by itself, but nor do policies. And so, you have to have the appropriate technology to enforce the appropriate policy in a way that is, in essence, technologically guaranteed: auditable, quantifiable, and verifiable. And to me, that's what "privacy left" means.
Debra Farber 6:02
I really agree with everything you just said; and, it's the same lens through which I'm seeing privacy shape the market and kind of make changes and kind of force privacy left, really. Like, you no longer can just rely on paper and policy and procedures. It's at a point now where all of those guarantees that you just talked about, that's part of "privacy assurance," and you can't get that assurance unless you lay down, by design, these foundations, the technology and the governance, upfront, before you ever collect the data, and make these technological guarantees, basically, like what are you going to be promising people about their data. And then, you know, it's part of privacy engineering to make sure that your systems are repeatably...you know, that the outputs you're getting are what people are expecting. Right? And so those guarantees, you need a way to prove them to regulators. You need a way to prove them to your customers at large, and it gives people a sense of trust. And, without those guarantees, it's just, you know, there's a lack of transparency and understanding: is your data protected? So, I think...I basically am saying, I 100% agree with you.
Gary LaFever 7:14
If I can make a suggestion...?
Debra Farber 7:15
Gary LaFever 7:16
What you just said was poetry, and you said one word that, when you said it, like, alarms went off in my head.
Debra Farber 7:23
Gary LaFever 7:24
No, no, no. I should say fireworks. Fireworks went off. I think...and so if you agree, it may be a good response to what I just said, right? You used the word "trust," and to me, that's an interesting response to what I just said. The shortened version, my version of what you just said, is that what trust is has changed over time. Right? When technology first digitized data, the trust was self-evident because it happened within an organization. But, as data flows have become pervasive and international in scope, you can't base trust just on contracts or policies or words. Right? So, it's the technological enforcement that enables that trust to flow. Because when you said trust, everything you said just kind of was wrapped up in a package. And...so it's just a suggestion, but to me, trust has to be updated to be technologically enforced, I guess, is what I'm saying.
Debra Farber 8:20
Yeah, that makes a lot of sense. And, you know, in a previous episode, I talked with Mark Lizar from The Transparency Lab about the differences between "consent" and "permissions," and that most of what we see as, you know...what we say is consent on the web is really about "system permissions," and not true consent that's based on trust - that you're going to do with my data what you say you're going to do. And so, you know, we basically kind of need to, you know...his perspective is you need to reform the web so that there are protocols that put us more in power. But, that's an interesting conversation. I'm just trying to pull back the...that the nuance between the words that we're using and the practicality really makes a difference, and that we have to be careful with the words we're choosing and what we're saying and promising to customers. And that also includes, you know, the guarantees, like you said. I totally agree. We need to have these controls enforced upfront, before you're ever even sharing the data or doing any of the seven use cases that I know you deal with at Anonos. And, I guess that's a good opportunity to dovetail into what some of those data use cases are.
Gary LaFever 9:28
Absolutely. So what's interesting to me is, I think, ironically, to have a full and fair discussion about data protection, you need to start at the beginning of the continuum and take yourself to the full end of the continuum. And so what I mean by the beginning is, there are data use cases where everybody would acknowledge the best form of data is unprotected, clear text. Right? And one that always jumps to my mind is when I go to the doctor, I don't want them to treat me with data that's kind of who I am, right? I want the exact results of the tests that I had, or whatever it happens to be. So the seven universal use cases that we talk about at Anonos, they pick up after that. But, I think it's important to talk about that, that there are some use cases where unprotected clear text is the best form. And when I say unprotected clear text, I don't mean you don't have access controls and other protections. I mean, the data is clear text, right? Anybody that sees it sees the full 100% accuracy and fidelity but also vulnerability of that data.
Gary LaFever 10:26
But, then you start moving from that, and you start to look at different types of use cases. And these actually reflect kind of a continuum or maturity curve. So, the first one is simply you want to do development, you want to do testing, and you either don't think it's appropriate to use actual production data, okay? Just because the people doing the dev and testing then see real customers' names and data about them. Or, in certain vertical industries, it's illegal, it's unlawful. Right? And so, that's kind of the simplest step away from clear text. And so there, you want to make sure that the data that you use is both reflective of what happens in the real world, but also protected and doesn't over-reveal data. And so oftentimes, synthetic data is a great opportunity there to provide that. So, that's the first use case; and, I'm gonna stop on each of these just to see, Debra, if you have a particular experience or use case or perspective on that. So, any thoughts on your part as we go from unprotected clear text to, now I want to make use of data, I want it to give me results that are usable, but I don't think I need to use clear text. That's the first use case I would identify. Any thoughts?
Debra Farber 11:36
The only thoughts I have on that are just kind of my...you know, I remember when this area of the technology had been new for data masking back when I was the Chief Privacy Officer of a health care company back in, like 2009. And so, I just really remember that we were...I was pushing for data masking, and even during testing. You know, didn't have to be completely accurate, just needed to be similar enough to be able to test the application. So, I have a lot of experience by pulling back from so many years ago. But that's just more of an aside, as opposed to something that's helpful for this audience. Just glad to see that this has come a long way over the years.
Gary LaFever 12:14
Absolutely. And my guess is the audience is going to find your asides more interesting than what I'm saying. So, keep them coming.
Debra Farber 12:21
I'll do my best.
Gary LaFever 12:23
So the next one, and again, from our perspective, they each kind of move up this maturity curve or sophistication curve. Now we're talking about internal data sharing, okay, across business boundaries or units. And this varies depending on the organization, on the industry. I actually heard recently that at a major pharmaceutical firm, before they will allow sharing across departments, and in this instance, it's related to different clinical trials, you have to be able to show them that the potential revenue and profits from the internal data sharing exceed 4% of total gross turnover. Obviously, what they're doing is they're saying, "Look, if we do this wrong, we could have penalties of up to 4% under the GDPR." But, that just goes to show that without the right privacy by design and default tools that enforce trust, look at the lost innovation there, because even when it comes to data sharing internally within an organization, they've set the bar so high because they're so concerned about the downside of liability.
Debra Farber 13:29
You know, I'm actually surprised to even hear that you know of a company that has responded to you and let you know that they kind of put as a line item protections for privacy snafus. Right? That they understand what that cost is and that they're still willing to go in and maybe potentially get it wrong as long as it's worth more than the cost of fighting it in court. You know? I assume that about a lot of the surveillance capitalism enabling companies out there, but you know, to have some evidence of that is very fascinating. At least they're balancing it with not necessarily like harm to the customer, but it looks like they're balancing it with harm to shareholders here with that assessment.
Gary LaFever 14:10
That's a great perspective, but I think it also goes to show a lot of people don't realize the added advantages, okay, of privacy left, because with a privacy left attitude, you should evaluate whether newer technologies might reduce that liability exposure in such a way that the benefits may come into play without having to justify 4%. Right? So it's, I just....
Debra Farber 14:34
Gary LaFever 14:35
I really think that's a key part of it. And so that's the second use case. The third is when you start to use data to build models. Right? And so, you know, whether it's simple...I shouldn't say simple analytics. Whether it's analytics, AI, which is getting a lot of attention, machine learning...You know, building those models, oftentimes you don't have consent. Right? And as we learned from the recent Meta ruling, contract is not going to get you there, right? And so, what kind of data can you use to build models? Well, again, oftentimes, you need to ask yourself, "What do I need?" and many times synthetic data will get you there. So, to recap up to that third use case: we started with application development and testing. Then we went to internal data sharing. Now I'm building, and I'm emphasizing building, a model. And so that's the third one. Okay? And there, oftentimes, our clients find that synthetic data is the best and safest thing to use, but as we go from three to four, which I'm not going to jump to yet, we shift to where some of synthetic data's strengths and weaknesses come to the forefront. So again, number three is AI, ML, and analytic model building.
Debra Farber 15:42
So hot right now.
Gary LaFever 15:44
Yeah, so hot.
Debra Farber 15:45
I have to say, with that one, I'm still diving into the AI machine learning analytics space, especially with the explosion of large learning models. Large language models or large learning models? LLMs. And that's actually helpful to understand. I hadn't really thought about...I've been thinking about what privacy parameters you want to put on your inputs to machine learning training of models, but, you know, haven't really found the answers. I've just kind of thought about the question.
Gary LaFever 16:14
And look, I think having the questions is a prerequisite to finding the answers. Right. It's a great place to start; and, I think the transition from three to four is one that a lot of people don't focus on, right, because the tools that I need, and the rights that I need to have, and the protections that I need when I'm building a model are actually dramatically different from when I'm actually running that model, because I built the model to run real data through. Okay? Whether it's real consumer data, whether it's real time transactional data, whether it's observed data, the whole reason I'm building those models is so I could run real data through those models, and start to infer or predict what people will do or what I should do. And, at that point in time, now you're talking real data. So you're gonna have to have a "lawful basis." And there, I think we've learned just recently, consent and contract are likely not going to get you there, because you're likely not going to be able to describe in advance with adequate specificity such that at least the GDPR requirements for consent would be satisfied, and some of the newer state requirements for consent. And contract, it's really attenuated to say that this type of use of data is critical for the principal reason for entering into the contract. This, I think, starts to raise the importance of what I would call embedded trust through legitimate interest processing, which requires you actually have privacy left. You have controls in place that diminish the risk to the user or the data subject. So, you can justify your use case. And so this shift from three to four, which is model building to model deployment, really starts to get into lawful basis of processing and to highlight the absolute criticality of privacy left data protection by design and by default.
Debra Farber 18:03
I think inferences and predictions are huge. I've been doing a lot of work in the XR space, especially with The XR Safety Initiative. And, one of the biggest, like we're creating a framework there, The Privacy and Safety Framework for XR, and we have a version one out already. But we've iterated a lot on a version two that's yet to be published. And one of the biggest things that I keep hearing from the XR experts is that so much personal information is being inferred. Biometrically inferred data about your habits, inferences about why your pupils are dilating, you know, things along those lines. So, those predictions and inferences can be influential or harm you if they're negative or if they're wrong, or if you just didn't want someone to make the inference and get the connection. So, really important use case to put controls around.
Gary LaFever 18:54
Absolutely. And then that kind of takes you to the fifth one, which is data sharing with third party service providers. And this takes a lot of different flavors, everything from storing and processing my data in the cloud, right, because by definition, a cloud is not my data center. It's somebody else's data center. And / or outsourcing certain aspects, whether it's, you know, AI development, or whatever it happens to be, or even if you're working with an analyst firm, and you're looking for their assistance. In any of these situations, you're sharing data with a third party service provider. And now you have to have even more protections on the data itself because you're expanding the pool of people with access to the data beyond your own organization. And so did you have the right to do that? Did you get consent or contract? And so once again, when you go from four, which is yourself processing data for inferences and predictions, to five, where you're either storing, processing, or using the capabilities of a third party, that in and of itself starts to increase the need for the data protection by design, by default, privacy left type engineering.
Debra Farber 20:01
Absolutely, and when I hear third-party service provider, immediately what comes to mind is cloud; and, I'm gonna throw a question back at you about what your thoughts are on The Shared Responsibilities Models that the cloud providers use, whether or not they adequately address privacy, and what you would change?
Gary LaFever 20:20
So, absolutely. I think what would make it adequate is more transparency. And I'm saying that somewhat tongue in cheek because the reality is, I don't think everyone appreciates that all of the cloud providers say that they will ensure the security of the cloud, but it is the customer's responsibility to ensure the security of the data in the cloud.
Debra Farber 20:44
Gary LaFever 20:45
And this was actually something that the EDPB, in the recent enforcement action, collective enforcement action - just came out last week - really highlighted. They said, "Too many people just assume the cloud provider's taking care of everything." And the cloud providers have not said they will. What I took away from that EDPB collective action was threefold. 1) Don't rely on the cloud provider to solve all the problems because they never said they would. Right? They said, "Here are my responsibilities. You have some as well." 2) Secondly, keeping the data in the EU does not solve all your issues. You still have a responsibility as the customer to protect the data when in use. 3) And thirdly, commitments by any hyperscaler to either resist requests for data, and/or to inform you of those requests, are always caveated with "as allowed under law." And the reality is, whether you're looking at the Executive Order, or The Cloud Act, U.S. operators are not permitted to tell you. And so, the bottom line of those three, when you wrap them all together, is it's up to the customer to ensure that they protect the data in use. So, if there's a government production request, the version of the data that's produced is de-risked, and not revealing, and yet it still provides value. And so, I believe that The Shared Responsibilities Model can work, but it requires further commitments on the part of the customer, and/or transparency on the part of the cloud provider as to what is their responsibility and what is the customer's.
Debra Farber 22:20
I agree; and, I think it'd be helpful to understand the flow of data; if you were to put personal data into those cloud services, what the controls are. How do I know something's deleted when I press this? Is it, you know, is it marked for deletion? Is it in the garbage collector? Is it completely deleted from the machine? Is it still in backup? You know, all of that transparency isn't thought about from a privacy-by-design for product kind of perspective, but there's sure a hell of a lot of security by design. Right? There's so much documentation about security. And then, privacy is like, "We have availability zones and encryption." You know,
Gary LaFever 22:56
Right. No, I agree.
Debra Farber 22:56
I think there's a lot more cloud providers can do and they can add privacy by design requirements to it and responsibilities, specifically. But, thank you for your thoughts. I agree with your perspective there. So on to the sixth data use case.
Gary LaFever 23:09
So, this is where things start to get really dicey. Right? Sharing data with third parties for monetization. Okay, and some people wince when I use that word, right? But there are ways, if you take the time, that data can be used to affect enterprise goals and objectives. And by enterprise, I want to be clear, some of those are merely commercial, making money, but other ones are innovation and coming up with medical breakthroughs. There are privacy left approaches that can enable you to do that in a way that's still respectful. And the word I like to use is sustainable. If you're making money today, but in a way that violates the rights of individuals, you're not going to be able to do that forever. It may take a couple of years to catch up, as we've seen in the courts. But the bottom line is you're either gonna get pushback from your customer base, or you're gonna get pushback from the regulators, or the real influence, I think, is from your partners, because they're exposed to liability when you do things wrong, too. And so I think the sixth one is almost the top of the pyramid here, or, as it were, of the continuum, which is when you want to monetize your data by sharing it with third parties, you absolutely need to ensure that you have protections in place, or otherwise the monetization is going to be short lived.
Debra Farber 24:25
That makes a lot of sense. Yeah, I mean, that's the part where it's in the marketplace: the consumer, the customer, or the human can see what they're receiving, what they're paying for, what a company is taking from them in exchange for a service or product or whatnot. And so yeah, you could lose trust real quickly there.
Gary LaFever 24:44
Absolutely. Absolutely, as opposed to, as we discussed earlier, if you have tools and capabilities in place that actually are measurable, quantifiable, verifiable, auditable, that's now what I would call "privacy left trust." Right? It's not just mere words or policies or even treaties. You have implemented technologies to ensure that the policies and use cases are, in fact, enforced.
Debra Farber 25:12
I like that term: "privacy left trust."
Gary LaFever 25:15
Debra Farber 25:17
Well, "shifting privacy left" is kind of the moniker I've been using; but yeah, I mean, I like that as a product category, then, you know. Like, what achieves...a privacy problem to...a solution is like, you've got "privacy left trust." I like that. I'm gonna noodle on that and kind of see how can we define that and what that would be. But yeah, maybe you and I can offline noodle on defining the term. I like it.
Gary LaFever 25:45
And what I like better, I've heard some people use the concept, or the term recently, "zero trust privacy."
Debra Farber 25:52
Gary LaFever 25:53
And they're trying to get to the same point, right, which is, I shouldn't have to trust you to know my privacy is being enforced and protected and embraced.
Debra Farber 26:05
You have to trust the technology to work.
Gary LaFever 26:08
Debra Farber 26:09
You don't have to trust the person using it.
Gary LaFever 26:10
Yes. And I think "privacy left trust," which I've literally just made up anyways, I think is a better way to say it than zero trust because it's not zero trust. It's just you're technologically enabling and ensuring and making the trust verifiable.
Debra Farber 26:25
Yeah, I think there's been so much marketization of these like, I don't know if marketization is the word. There's so many marketers that have been kind of conflating all these terms. And I think
Gary LaFever 26:37
Debra Farber 26:37
It's a little challenging for people to unpack some of them. So with trust, you know, you could trust the system, you could trust a person, you could...I mean, trust of what, right? And so without that context, you're kind of, like, throwing a word around that means nothing without context. So yeah, it can get confusing, but I think we need to define things a little better. And so, I like that with security, they use the term "zero trust." And that seems to me like a security guarantee: you could trust the system without trusting the parties, which is very similar to how...I used to work at Visa. And Visa came about as an organization of global banks that did not trust each other, but needed to be able to send money on behalf of their clients to other banks. And they needed to have a standardized way of doing this. And so they came together and created Visa, which used to be, I believe, a nonprofit before they went public and IPO'd a long time ago. But that's how they came together, because they needed to not trust one another, but trust the technological system that they created.
Gary LaFever 27:40
That's a great segue to what I think is one of the most advanced use cases. And it's the seventh and last in this continuum, right, which is using data for data enrichment. And what I mean by that is, if you have sufficient trust in the system, two arch-competitors - as you just mentioned, the banks that formed Visa, but let's use Coke and Pepsi, right? There are certain things that Coke and Pepsi, two soft drink manufacturers...complete competitors, would probably be willing to share with one another if they knew that that's all that was disclosed. And so to me, that is the ultimate test for privacy left: are the controls so secure, so verifiable, so quantifiable that even arch-competitors would feel comfortable exchanging information, and then add on top of that, the consumers whose data it relates to would not feel that they had been...their rights had been breached or violated.
Gary LaFever 28:41
And so, the last and highest one is, do you actually have privacy left controls in place such that you could share data with parties, and then augment your own information, enrich your own information. So you know, just as an example, you might have a mobile phone company who's talking with a professional sports team. Okay? Whether it's soccer, football, baseball, doesn't matter. They actually may want to find things out about how they have common customers, or how things are done in certain aspects of what's bought at the stadium or whatever it happens to be. There are ways to exchange the information that are most privacy respectful and would actually make each of the sports team and the vendor comfortable even if it was done with competitors. So to me, that's the ultimate: when you put controls in place that are verifiable, quantifiable, provable, to the point where even competitors would be willing to exchange data. But obviously, there'd be more non-competitors also willing to do it. And so those are really the seven use cases. And we think it's important to be able to meet customers wherever they are in that maturation curve, and give them what they need to derive the intended and desired business results.
Debra Farber 29:52
That's really cool. I mean, that's kind of...the meet people where they are is kind of my approach generally in life, especially in privacy. You know, like privacy can mean different things to different people. So when they ask a question, the way they frame the question kind of indicates their level of knowledge about a topic. And so I'm able to meet them, if they want to talk at a technical level, they want to talk at a legal level, they want to talk...you know, so I kind of love that, because that's how I operate: "What do you need? I've got a solution, or I've got an opinion based on where you're coming in from or what your knowledge base is."
Debra Farber 30:25
So I guess my question, next question is, what are you seeing in the market? Like, what are people gravitating to the most? Which use case is there most awareness for? And then what is growing? Like, are we seeing...it almost seems like the data enrichment aspect might be for a more mature company that's already done some...addressed some of the other areas; so, what are you seeing, basically?
Gary LaFever 30:48
Yes. So, one of the ways we came up with this continuum was actually by looking at what our customers are doing, and what we see in the marketplace and our discussions with prospects. And, I think, generally, there's a much higher use of the use cases lower in the numbering scheme than up top. Right. So, there's a lot of development and testing; almost everybody's into that. Internal data sharing, typically, it's happening regardless of whether it has controls in place or not. And then you start to get, I think, fewer companies, but it's increasing as you go up this use case continuum; but I think the exciting thing, to me at least because my orientation is value-first through privacy, is that the return on investment to an organization, I believe, increases dramatically as you go up these use cases. So if you are asking which ones I think are the most prevalent, I'd say it's one, two and three: dev testing, internal data sharing, model building. But, when you start to get ROI positive that can actually contribute to your bottom line, it's when you're actually using those models, right? When you're starting to share data with third parties. When you're looking to monetize it, by definition, that's an ROI. Right? And I think, again, the highest value is when you're actually enriching and augmenting your own datasets. So I think there's more on the lower end. But I think there's a gravitation to the higher end. And the reason for that is because there's positive economic return as opposed to just minimizing the downside of doing things wrong.
Debra Farber 32:13
I absolutely agree. It's reframing the issue. I've been trying to do this for my entire career, but it takes...it's a slow ship to turn around. Right? And I can't do it by myself; it takes a village. But basically, you want companies to realize that privacy is not...you can't add it on at the end. It's not just about governance, risk, and compliance, or something to be looked at as a non-revenue generating, sunk cost. If anything, if you build it in by design, it increases revenue generation. I remember several years ago, when Cisco first started putting out, on Data Privacy Day, research around privacy that they'd done, when Michelle Dennedy was still CPO there. They had come out with this really great research - finally, some empirical data that was showing that if you mature your privacy program, even just one level of maturity, your sales process was much faster and you got your revenue much quicker and got more of it. That was just a few data points. It's not like we have as many as security data points; but, it really behooves a company to get things right up front so that they can then use it to build, not just trust, but to prevent technical debt, to prevent sprawl of data that you need to discover later, to de-risk your data, and to get rid of the potential privacy harms to your customers. So, I mean, there's so many benefits to shifting left, and I think it's great that you have these use cases that kind of map to that aim.
Gary LaFever 33:48
Well, let me jump on a term you just used, "de-risking data," because I think de-risking is the right approach because what you mean by de-risking, or at least what I think you mean, and what I mean when I use the term is de-risking to enable greater data use. Right? So, it's not just about locking data down, and privacy cannot and should not just be about locking data down because you might as well just shut the doors and shut the company down, right. But you do want to respect the rights of your customers, end users, partners, regulators, right? And de-risking, I think, is the right way to look at this. And I actually will make a very bold statement and I say this to clients all the time. If they're willing to embrace a privacy left orientation to de-risking data, which means re-evaluating opportunities - cost, benefits, liabilities, all that - then I believe they can increase their revenues, not by a percentage; by a magnitude, a multiple. But, it requires a different mindset, and it requires a coordination between what would otherwise be your IT department, your security group, your privacy group, your governance group. Right? If you take all those people, put them in a room and say, "Look, we need to do business. Here's why we want to do business. So, let's really evaluate how we can do business in a de-risked environment." And I think the data stores that exist are only being used for a fraction of what they could be with the right attitude.
Debra Farber 35:19
Yeah, I think that makes a lot of sense, and so I agree with you that that's what derisking means. But I do want to put another perspective on it. I think so often, when it comes to working in privacy within a company, there's this perspective of risk, and typically, you're looking at enterprise risk, or you're looking at risk to the business - if we do this, we'll get a fine, you know, as opposed to risk-modeling for preventing privacy harm to individuals. So, I think a combination of both is needed. Obviously, I'm not saying you were saying otherwise. But I use the term de-risk there, because it was a benefit to the company and not a benefit to the consumer and I'm trying to be more mindful of using different terminology for those separate stakeholders.
Gary LaFever 36:07
And I think it's imperative to recognize it means both. Right? And so...
Debra Farber 36:12
Gary LaFever 36:13
That's really...you have to have stakeholder input because certain people are tasked with limiting or minimizing or perhaps even eradicating risk to the organization, and those are critically important; they need to have a voice. At the same time, though, if that's driving every decision by the company, you're not serving even your consumer base, and certainly not your stakeholder or shareholder base. And so, it absolutely has to have all those people at the table.
Debra Farber 36:40
Totally agree. So what are some common barriers to accomplishing these seven universal use cases? Like what are organizations facing right now when they come to you?
Gary LaFever 36:52
So, the biggest one, to me, is a failure to fully appreciate the need for data to flow as you move from what I call, you know, one to seven or left to right. Okay, the earlier ones can actually occur, not exclusively, but primarily with access controls and security controls and trying to prevent the bad people from coming in. But, as you talk about these different use cases, by definition, they start to involve people outside of your organization. And that requires the data to be protected during computation when actually in use; and, it's shocking to me how many people still don't embrace that. I mean, I would still guess that 70% to 80% of privacy lawyers still advise their clients they should anonymize their data to stay outside of the GDPR. But the reality is, the second data flows outside of your organization and someone else gets access to it, and they can augment it with third-party datasets, it's probably not anonymous. And so, it is misleading everyone to claim that as the protection as opposed to putting controls in place. So the answer is, I think you have to protect data during operation, during compute, not just perimeter controls and then process clear text.
Debra Farber 38:12
That makes a lot of sense. And so what is your approach to overcoming the main obstacles? There's three of them that I saw on your website. I'm going to state what the obstacle is and I'd love to hear your approach to how organizations can overcome them to achieve these use cases.
Gary LaFever 38:30
Debra Farber 38:32
So, the first obstacle you have is like protecting data while in use, which is what you've referenced. And then your approach is something called "statutory pseudonymization." So, can you unpack that for us? Like, what is that? Tell us more, basically.
Gary LaFever 38:48
Yes. So, the point here is there are a lot of technologies that protect data when in use, but at very high cost. Okay. So, homomorphic encryption protects data when in use, but at incredibly high cost, both in compute resource requirements as well as elapsed time. Right? I mean, depending on who you listen to, a result with homomorphic encryption could take anywhere from 40 to a couple hundred times longer to derive than with clear text. And so while that's protecting data when in use, it's not giving utility and fidelity and timeliness to the customer. Or, differential privacy, which protects data when in use, but by definition is inserting noise and therefore the accuracy goes down. The power of what we refer to as statutory pseudonymisation is it actually can enable you to accomplish protection in use without any increased processing requirements, without any delays, and without any inaccuracies inserted.
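To make the accuracy trade-off Gary mentions concrete, here is a minimal sketch of the Laplace mechanism for a counting query, one standard way differential privacy adds noise. The function names and the epsilon values are illustrative assumptions, not anything described in the episode.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    # A counting query changes by at most 1 when one person is added or
    # removed (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # seeded only so the sketch is repeatable
true_count = 1000       # made-up value for illustration
for epsilon in (0.1, 1.0, 10.0):
    released = noisy_count(true_count, epsilon, rng)
    print(f"epsilon={epsilon}: released {released:.1f} (true {true_count})")
```

A smaller epsilon gives a stronger privacy guarantee but a larger expected error, which is the loss of accuracy being described.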
Gary LaFever 39:40
So what is statutory pseudonymisation? It was defined initially, and by the way, if anyone's interested, if they go to pseudonymisation.com forward slash technical controls, there was a peer-reviewed law journal article that just came out last week. It's called "Technical Controls that Protect Data When in Use and Prevent Misuse." It's in The Journal of Data Protection and Privacy, and so there's a ton of wonky details in there if they're interested. But, the bottom line is the GDPR defined for the first time under EU law "pseudonymisation." And, there's some subtleties that are missed by people that are actually not so small in implication. It allows you to functionally separate information value, the reason for the processing, from identity. And, it's actually a requirement that you not be able to reconnect those two without access to additional information that's held separately by the data controller, or by their designee. And while this just sounds like text, and a lot of people read it and say, "Yeah, yeah, I know what pseudonymisation means," it's actually a heightened requirement that a lot of people don't realize requires greater controls and higher levels of data science and sophistication.
Gary LaFever 40:51
That's why the EDPB, in their recommendation for Schrems II compliance, actually identifies pseudonymization as use case two; it's still lawful even in the cloud. So, it's a heightened requirement that is not immediately evident that actually gives you statutorily-mandated expanded data use rights. But, the exciting thing is it didn't stop with EU GDPR. Right? Obviously, it's in the UK GDPR, but South Korea actually mandates, as a part of their privacy law, if you're going to share data, you have to pseudonymize it. And now, five U.S. states, Colorado, Utah, Connecticut, California, and there's a fifth one that I'm blanking on - they all have replicated the exact definition of pseudonymization from the GDPR; and so, it's not what most people think and it produces amazing benefits, both from a protection and utility perspective. And so I think it's worth people taking a look at and realizing that maybe pseudonymization is not what they used to think it was; and that's why I put the ridiculous term "statutory" in front of it. Because it's not just general pseudonymization, it actually has heightened standards, which basically come down to: Are you protecting all data elements, not just direct identifiers? Are you doing it in a way that you can prevent singling out? Are you, when required and appropriate, not using just algorithmically generated pseudonyms, which can be reverse-engineered with quantum computing or whatever? Are you using dynamism; do you use different tokens at different times for different purposes? And lastly, do you actually make it so, and can you prove, that there's controlled relinkability that only the data controller or their designee uses? So, it's not what most people think, but it's incredibly powerful. It's worth getting up to speed on.
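The checklist above (protect indirect as well as direct identifiers, avoid pseudonyms derived from the value, use different tokens for different purposes, keep relinking under the controller's control) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anonos's implementation: `ControllerVault`, `pseudonymize`, the field names, and the purposes are all hypothetical.

```python
import secrets

class ControllerVault:
    """Hypothetical store for the 'additional information' needed to relink.

    In this sketch it stays with the data controller (or designee) and is
    never shipped with the pseudonymized records.
    """

    def __init__(self):
        self._token_for = {}   # (purpose, field, value) -> token
        self._source_of = {}   # token -> (purpose, field, value)

    def tokenize(self, purpose, field, value):
        key = (purpose, field, value)
        if key not in self._token_for:
            # Random token: not computed from the value, so there is no
            # algorithm to reverse; the lookup table is the only way back.
            token = secrets.token_hex(8)
            self._token_for[key] = token
            self._source_of[token] = key
        return self._token_for[key]

    def relink(self, token):
        # Controlled relinkability: only the vault holder can call this.
        return self._source_of[token][2]


def pseudonymize(record, purpose, vault, protected_fields):
    # Replace direct AND indirect identifiers; leave analytic values alone.
    return {
        field: vault.tokenize(purpose, field, value) if field in protected_fields else value
        for field, value in record.items()
    }


vault = ControllerVault()
record = {"name": "Alice Example", "postcode": "AB1 2CD", "spend": 42.5}  # made-up data
protected = {"name", "postcode"}

for_model = pseudonymize(record, "model_building", vault, protected)
for_partner = pseudonymize(record, "partner_sharing", vault, protected)

print(for_model["name"] != for_partner["name"])            # tokens differ per purpose
print(vault.relink(for_model["name"]) == record["name"])   # controller can relink
```

Because the same value gets a different token for each purpose, two recipients cannot join their datasets on the token, while the `spend` values remain usable for analysis.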
Debra Farber 42:35
Wow, thanks for that. Yeah, I did not realize that. I'm gonna go look at that. I like how thorough that is, and honestly, if it meets the regulation, it seems like you should probably, you know, pseudonymize, because it's going to, you know, de-risk the organization as well as add a privacy-preserving element.
Gary LaFever 42:57
Yeah, so it's a cocktail story.
Debra Farber 43:00
Yep. Go for it.
Gary LaFever 43:01
Google, or search, okay, or whatever term you want to use, how many times "anonymization" or "anonymize" occurs in the GDPR, how many times "encrypt" or "encryption" appears in the GDPR, and how many times "pseudonymisation" does, and you'll end up with two for "anonymization" or "anonymized," three, I believe, for "encrypt," and 15 for "pseudonymise." It was intentional that you actually get mandated statutory expanded data use rights if you achieve this level of protection.
Debra Farber 43:30
Wow, that makes sense. Also, I mean, I could go on a whole side thing about anonymity and how it's typically used for free speech and not for privacy purposes. And then, if you're dealing with anonymity, like a state of anonymity, like someone chooses anonymity, you know, they're typically...if all you're building for your product is for someone to have anonymity, you're not going to go through the nuances of choice and specifics and whatnot and go into the level of detail of granular sharing. So it's just like a cop out to me to make things just anonymous and not allow for other ways to share data. So it kind of makes sense that pseudonymization...if everything was anonymous, the company is also restricted from being able to really do much; it frees up the data use to pseudonymise, rather than anonymize. So, it almost seems like, in addition to the extra rights that the GDPR is making sure are protected for individuals, they're also cognizant of the balance with, you know, you have a business, it's probably better to pseudonymize to achieve the privacy aim and still use your data. If they only had anonymization in there, it would have been really restrictive.
Gary LaFever 44:40
I totally agree. And I think it's recital 4 or whatever it is when...and this is not talked about enough. The GDPR was, at least initially, intended to both encourage and incentivize data flows and innovation and value while protecting individual rights. And, I agree with you, I think anonymization is a cop out that really doesn't achieve either.
Debra Farber 45:02
Well, excellent. Thank you for agreeing with me. No, I'm glad you're seeing the same thing. I would love to hear if you were seeing something different, of course, but it's good to know that you're seeing the same trend. Okay, so the second obstacle is the need to remediate sparse or biased data. How can companies overcome that obstacle?
Gary LaFever 45:23
Yeah, and I think this is critical with the new AI Act coming out and the attention to this. The reality is, I'm gonna get wonky for just a second, right? There's really two types of data, and I'm oversimplifying here, but you either have "numerical data," which reflects a number, and that number, you know, represents something very specific; or, you have "categorical data," which groups data into groups or categories. And so, one of the ways that you can help to overcome sparse or biased datasets is to either...when a computer processes data, it doesn't read English, French, German, Spanish or any other dialect, right? It converts it to ones and zeros. And so, if you can convert the data, transform it, if it's categorical, so let's just say male and female, okay, or religious background, or sexual orientation, or whatever it is, that's a category. You can replace the categories that are identifying in their own right with other non-identifying categories. The machine learning model will still process with 100% accuracy. But anyone looking at the results, unless they have permission to know what those categories really mean, will not know what they are. So, the first thing you do there is you help with regard to biased data. Okay? Discrimination - you actually can enable data scientists to create models and they don't even know what the different categories are; they're just experimenting with the impact of one category on another.
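The category-replacement idea can be illustrated with a small sketch. Everything here is a simplifying assumption: a per-category average stands in for "the model," the data is made up, and the function names are hypothetical. The point it shows is narrow: a one-to-one relabeling of categories leaves per-category results unchanged, while the labels themselves reveal nothing without the mapping.

```python
import secrets
from collections import defaultdict

def recode_categories(values):
    """Replace each distinct category with a random, non-identifying label."""
    mapping = {}
    used = set()
    for value in values:
        if value not in mapping:
            label = "cat_" + secrets.token_hex(4)
            while label in used:          # keep the relabeling one-to-one
                label = "cat_" + secrets.token_hex(4)
            used.add(label)
            mapping[value] = label
    return [mapping[v] for v in values], mapping

def group_means(categories, outcomes):
    """A stand-in for 'the model': mean outcome per category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, outcome in zip(categories, outcomes):
        totals[category] += outcome
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}

groups = ["A", "B", "A", "C", "B", "A"]     # made-up sensitive attribute
outcomes = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]   # made-up outcomes

recoded, mapping = recode_categories(groups)
blind_result = group_means(recoded, outcomes)   # analyst sees only opaque labels

# Only someone holding `mapping` can translate the result back.
translated = {orig: blind_result[label] for orig, label in mapping.items()}
print(translated == group_means(groups, outcomes))
```

In this sketch the analyst can compare categories against each other without knowing which real-world group each label stands for; whoever holds `mapping` decides when the results are interpreted.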
Gary LaFever 46:54
So the power of statutory pseudonymization is that it's a two-step protection and data utilization process. So the first step can be done by people who don't have access to the underlying identifying information. So therefore, it's impossible to discriminate. And then also synthetic data is very relevant here, which is, sometimes you want to come up with inferences or discoveries, and you just don't have enough data. And so what you can do, if you do it correctly, is you can actually augment your real data with synthetic data that still has the appropriate inferences and interrelationships. And so, this is getting kind of wonky and data-science-like, but the reality is discrimination that's built into AI models, we won't discover for decades, maybe even generations. And it's worth the time to do the pre-engineering correctly as a way to remediate some of these risks, either because the data is too sparse and you want to have a more representative data set on which to do calculations, or because you're concerned that people could read the data and impute or effect bias, intentionally or unintentionally.
Debra Farber 48:02
That's fascinating. I hadn't ever thought of the biased data problem being solved with, you know, statutory pseudonymisation or synthetic data. But it makes a lot of sense. I hadn't put those two concepts together.
Gary LaFever 48:17
The thing that surprises a lot of people is statutorily pseudonymized data is fully capable of being processed in advanced machine learning. And so, it's not just a...a lot of people look at pseudonymisation as a security technique. Right? It's actually amazing. In fact, it's mentioned in Article 25: when they talk about data protection by design and by default, the one thing that is mentioned is pseudonymization. So, it's intentional, but it's just the worst PR blitz of all time that it's not a better known fact.
Debra Farber 48:50
Yeah. Because to me, that's a privacy guarantee, right? That's something we should be aspiring to, that any privacy engineering program should have as a mandate. So, in order to achieve the goal of being able to truly say that it's pseudonymized. Right? So yeah, that's fascinating. I'm definitely gonna dive in more on that. I'm gonna read some articles that you've written. I'm gonna read that, what is it the data protection, Journal of Data Protection?
Gary LaFever 49:16
Yeah, it's The Journal of Data Protection and Privacy. And again, I was one of four co-authors on the article, but it was peer-reviewed, etc. And it's available - that is a paid-for journal, which is well worth the cost, but we negotiated for distribution rights. So, it is available at pseudonymization.com forward slash technical controls. You can download it without having to pay for it, but I would highly encourage - the journal is very, very well reviewed, and you can see the editors involved. It's a great resource.
Debra Farber 49:47
Excellent. Okay. And then, gosh, there's so much to talk about that I can't believe how much time has flown by, and I have so many other questions teed up, so I'm gonna shortcut to some of them. First, let's get through obstacle three, which is satisfying the legal requirements for international data transfer. So, a lot of companies need to be able to, you know, adhere to the GDPR's onward transfer requirements to a lawful country or under certain circumstances. And so what's your approach, basically, for that?
Gary LaFever 50:16
Yes, so I will shortcut this by saying it's the bookend solution. If you look at the EDPB final guidance on Schrems II and you look at use cases one, two, and three: one is storage of data in encrypted form in the cloud. Right? Three is transfer of encrypted data, but what's between one and three is actually processing of statutorily pseudonymized data. And, again, it shocks people. You can accomplish many, not all but many, advanced analytics, machine learning, AI models using statutory pseudonymisation, and in those instances where you need identity, because you have adopted these privacy left controls whenever, wherever, and as often as possible, you are much better positioned to satisfy a 49-1 derogation for actually transferring identifying data. So, that would be what I would call "the bookend approach." You combine use cases one, two, and three that the EDPB has suggested, and when you need identifying data, you're in a much better position to actually transfer identifying data through a 49-1 derogation.
Debra Farber 51:25
Excellent. Well, thank you so much for that. I see that Anonos's software, Data Embassy, centers around "variant twins," non-identifiable, yet 100% accurate variations of the source data that's required for specific use cases. Can you tell us more about this approach?
Gary LaFever 51:43
Yeah, so I think we all know that digital twins are a digital representation of a person, place or thing. What a variant twin is, is a use case specific variation of the digital twin that only discloses those elements of data needed for that use case. So it's literally technologically enforced purpose limitation, data minimization, data protection by design and by default. And, the beauty is with variant twins, data is no longer the new oil because oil has one-time use. Once you combust the oil, it no longer exists. Rather...
Debra Farber 52:20
Gary LaFever 52:22
Yeah, data becomes the new water, which when used correctly, replenishes itself. So variant twins can be used in a way that is sustainable and respects the rights of the individuals to whom the data relates; it's also replenishable. You can use it with different partners at different times. And that's why I say someone who takes a de-risking privacy left approach to their data can increase their revenue several fold. It's because variant twins, which is our way of delivering this to the market, are replenishable, inexhaustible, sustainable, and privacy-respectful.
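As a rough illustration of "technologically enforced purpose limitation," the sketch below derives a use-case-specific view of a record that exposes only the fields a given use case is permitted to see. It is a toy under stated assumptions, not how Data Embassy works: the policy table, field names, and `variant_twin` function are all hypothetical.

```python
# Hypothetical policy: which fields each use case is allowed to see.
USE_CASE_FIELDS = {
    "stadium_sales_analysis": {"customer_token", "age_band", "purchase_category"},
    "network_planning": {"customer_token", "region"},
}

def variant_twin(record, use_case):
    """Return a use-case-specific view exposing only the permitted fields."""
    allowed = USE_CASE_FIELDS[use_case]   # unknown use case -> KeyError, no data released
    return {field: value for field, value in record.items() if field in allowed}

# Made-up source record; the original is never modified or consumed.
source = {
    "customer_token": "tok_123",
    "age_band": "30-39",
    "region": "North",
    "purchase_category": "merchandise",
    "phone_number": "555-0100",
}

print(variant_twin(source, "stadium_sales_analysis"))
print(variant_twin(source, "network_planning"))
```

The source record is left untouched, so a different view can be derived for each partner or purpose, which is the "replenishable" property being described.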
Debra Farber 52:57
That's really exciting technology. You know, thank you so much for that. Is there anything you want to plug besides the paper? Are there any conferences coming up that people should know about or interesting activities that you're participating in? Anything you want to get the word out on, and have a call to action for?
Gary LaFever 53:16
It's tied to the paper, but it also has to do with the proposed data framework, right. One of the benefits in my mind to privacy left is you can reconcile different philosophical and societal approaches to privacy. What do I mean by that? Proportionality, right? Even the European Commission was only giving the new framework a 70% to 80% likelihood of withstanding judicial review and challenges. But, I actually think there's a way that the existing proposed framework could satisfy even Mr. Maximilian Schrems himself. That is, if you read the paper on statutory pseudonymization, and that was part of the data framework, so that basically people who certified to the new data framework, which is what they do with the Privacy Shield today, would have to certify that they're using a technical safeguard such as statutory pseudonymization, I actually think the approach that the U.S. takes and the approach that the EU takes could be reconciled.
Gary LaFever 54:18
Why? Because if the federal government, the U.S. federal government, requested production of data, what they would get would be pseudonymized data. That pseudonymized data has been de-risked. It can be processed. It can be computed and produce results. But, to interpret those results in a way that reveals identity requires the additional information that would be held separately in the EU. The beauty of this is it's already what the CJEU said. It's already what the EDPB has said. It would just be an additional level of recognition that in order to benefit from the data framework legality, which is just words, you need to back it up with some privacy left engineering tools and safeguards.
Debra Farber 55:00
Wow. Yeah, that's...I mean, that's a great pitch. I just want to help you with the paper on that. Let's just...
Gary LaFever 55:08
Let's write the paper together.
Debra Farber 55:09
...get it out there for people, you know, integrating these tool sets to be able to shift left. I mean, it's definitely core to my mission, and I love having guests on who agree and are doing the hard work of bringing it to market.
Gary LaFever 55:24
Oh, I think you're doing the hard work as well. You're getting the word out. That's what's really important.
Debra Farber 55:28
I feel like that's the fun part.
Gary LaFever 55:30
I want to thank you on behalf of many practitioners for having been where you've been, and having done what you've done and continue to do, because, look, it's a complicated space; and, people like yourself who can really cut to the chase, you're a welcome beam of light and illumination. I know that's a little overstated, but thank you for all you do.
Debra Farber 55:50
Oh, I really appreciate that. Thank you.
Debra Farber 55:53
You know, Gary, it's been wonderful talking with you today about how to achieve privacy guarantees without tradeoffs as we shift left, and, you know, thank you for joining us on The Shifting Privacy Left Podcast.
Gary LaFever 56:06
Debra Farber 56:07
Until next Tuesday, everyone, when we'll be back with engaging content and another great guest.
Debra Farber 56:14
Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website shiftingprivacyleft.com where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of the show. To learn more, go to Privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.