The Shifting Privacy Left Podcast

S2E33: "Using Privacy Code Scans to Shift Left into DevOps" with Vaibhav Antil (Privado)

October 31, 2023 Debra J Farber / Vaibhav Antil Season 2 Episode 33

This week, I sat down with Vaibhav Antil ('Vee'), Co-founder & CEO at Privado, a privacy tech platform that leverages privacy code scanning & data mapping to bridge the privacy engineering gap. Vee shares his personal journey into privacy, where he started out in Product Management and saw the need for privacy automation in DevOps. We discuss the obstacles created by the rapid pace of engineering teams and the lack of a shared vocabulary with Legal / GRC. You'll learn how code scanning enables privacy teams to move swiftly and avoid blocking engineering. We then discuss the future of privacy engineering, its growth trends, and the need for cross-team collaboration. We highlight the importance of making privacy-by-design programmatic and discuss ways to scale up privacy reviews without stifling product innovation.

Topics Covered:

  • How Vee moved from Product Manager to Co-Founding Privado, and why he focused on bringing Privacy Code Scanning to market.
  • What it means to "Bridge the Privacy Engineering Gap" and 3 reasons why Vee believes the gap exists.
  • How engineers can provide visibility into personal data collected and used by applications via Privacy Code Scans.
  • Why engineering teams should 'shift privacy left' into DevOps.
  • How a Privacy Code Scanner differs from traditional static code analysis tools in security.
  • How Privado's Privacy Code Scanning & Data Mapping capabilities (for the SDLC) differ from personal data discovery, correlation, & data mapping tools (for the data lifecycle).
  • How Privacy Code Scanning helps engineering teams comply with new laws like Washington State's 'My Health My Data Act.'
  • A breakdown of Privado’s FREE "Technical Privacy Masterclass."
  • Exciting features on Privado’s roadmap, which support its vision to be the platform for collaboration between privacy operations & engineering teams.
  • Privacy engineering trends and Vee’s predictions for the next two years.


Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.

Copyright © 2022 - 2024 Principled LLC. All rights reserved.

Vaibhav Antil:

Our goal is that if we do all of this work well, you start from a place where you already have the data map. Once you have the data map, you can see the risk really quickly, and then you can do the good work: you can take that risk to the Engineering Lead or the Developer and say, "Hey, there is a risk and we want to reduce it. There are four ways we can do it; let's work on that." So you save all of the time you're spending in manually building these data maps, and you get all of that time free. And you spend that time actually reducing the risk and putting a privacy ticket on the Jira board, which gets shipped. So the risk is less, and your users actually see it.

Debra J Farber:

Hello, I am Debra J Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans, and to prevent dystopia. Each week, we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding edge of privacy research and emerging technologies, standards, business models, and ecosystems. Welcome everyone to Shifting Privacy Left. I'm your Host and resident privacy guru, Debra J Farber. Today, I'm delighted to welcome my next guest, Vaibhav Antil (also known as Vee), the CEO and Co-Founder of Privado, the privacy tech platform that's bridging the privacy engineering gap, and sponsor of this podcast. Welcome, Vee!

Vaibhav Antil:

Hi Debra. Hi everyone.

Debra J Farber:

Before we start, I want to say a few things to you, Vee. I've had the privilege of serving as a member of Privado's Advisory Board, along with Nishant Bhajaria, for the past two and a half years, advising on go-to-market messaging, thought leadership, and Shifting Left evangelism; and it's been such a wonderful ride. I even remember when I was on vacation in Belgium a little over a year ago and received an email from you, Vee, asking me whether I'd consider launching a podcast focused on privacy engineering because Privado wanted to sponsor it. I could not have been more excited. That really kicked off my creative journey with the Shifting Privacy Left podcast, and it's enabled me to bring this content to a growing niche audience. I just want to take this time to publicly thank you for believing in me early on and supporting this really important educational initiative.

Vaibhav Antil:

I'm glad we could work this out and be sponsoring the podcast; it's been amazing seeing you create such great content. I especially love a couple of episodes, with so many diverse opinions and experiences. I learned a lot from this podcast, and we've been super excited sponsoring it and will continue to do so as well.

Debra J Farber:

Thank you so much. It really means the world to me, and I could go on about how much I love Privado and you and the entire team, but I think people probably want to hear about the topic of the day; and that's several things. We're going to be talking about privacy code scanning; what 'shifting privacy left' means, especially into DevPrivOps; and then, the importance of bridging the privacy engineering gap. So, we'll just kick things off by me asking you a little bit about Privado and how you decided to focus on code scanning for privacy and solving privacy engineering challenges.

Vaibhav Antil:

For the audience who might not be aware of Privado, Privado is a platform where privacy and engineering teams collaborate. They come together to ensure the products the companies are shipping are built with privacy from the get-go. I would imagine the same role in a company is played by a privacy engineer or another privacy professional who's acting like this connective tissue, who's bridging this gap between privacy and engineering teams. I would like to think Privado is the co-pilot for this person, this Privacy Engineer or privacy professional whose core job is to work with engineering teams. Essentially, my journey into privacy has been different. I mean, this is probably the most interesting question I ask everyone: how did you end up in privacy? Everyone has different answers and different paths. My path came in from a product management role. I was a product manager for an online music streaming company, and I had seen the world where the business goals of a product manager - revenue, active users, engagement - were super critical. I had this deep empathy for engineers - developers who are building these products - because they were kind of the centerpiece of shipping these features fast, which could move the revenue numbers, the engagement numbers. They could literally do magic by building these features. While I had this experience, I ended up getting involved in a privacy project with the legal team there and saw the other world - the privacy world. I felt that was pretty exciting as well, because they were trying to protect user data and do right by the user, which, as a product manager, whatever you're building, you want to do. You want to make sure the promises that you are making are kept consistent and the end users' privacy is protected. That's how I got introduced to privacy.

Vaibhav Antil:

When I started Privado. . . actually, the first thing I did was I tried to be like a privacy professional. I said, "The first thing I'm going to do is a data mapping project for an e-commerce company." I still remember that was a sizable company, a large e-commerce company, and I probably interviewed about 80 people. That was a mix of Product Managers, Engineers, Engineering Leaders, Data Managers, and Data Engineers, and the project was this crazy project to map in an Excel sheet the microservices, the data pipelines, all the products they have, and how data comes into the company and where it all goes. I still remember, I did this huge exercise - it probably took like two and a half, three months - and I went to my CTO and said, "Hey, look at this sheet; this is basically what we have to build the product around."

Vaibhav Antil:

At that point in time, I was still thinking more assessments; that was kind of the way privacy teams used to get this data, which I got via interviews. But when my CTO looked at it, he kind of had a different idea, and he basically said to me that there was probably 30%-40% of this information we could get by scanning the code, and the rest we might still have to do manually - but at least it would take you that far. That led me to a realization; that was kind of our spark, our Eureka! moment, where we were like, "Hey, nobody is doing this at the moment."

Vaibhav Antil:

If we attack the problem from code scanning, it's a very good experience for the privacy team because they get to this information super fast. For the engineer, it's a really superior experience because, when you were talking to them, the biggest problem was not understanding what a lawyer is saying. Now they get to see the same things; they get to see data in a format they understand. And we were early to market, so that was exciting to me as well.

Vaibhav Antil:

So that's how privacy code scanning got started, with this huge Excel RoPA data mapping exercise; and now, of course, the vision has evolved. It started with doing RoPA, then data mapping (very, very small); and as we got some customers and started working with them, our vision now is to actually be the single platform where privacy teams and engineering teams collaborate - building a platform that engineers love and not hate, which actually gets adopted, so that privacy risk goes down. You can get more privacy reviews done; you can get privacy functionality onto a Jira board. So, it's about enabling the privacy function while at the same time making sure developers actually like the experience and stay engaged as well. That's a little bit about how we started with code scanning and where we are in our journey.

Debra J Farber:

Thank you for that. That's really helpful. I know that the mantra for Privado is 'bridging the privacy engineering gap.' Can you talk a little bit more about this gap? What do you mean by it?

Vaibhav Antil:

I think this is something that came up as we started, as the initial concept of doing a code scan happened, and we started speaking to privacy professionals in Europe and the U.S. I remember we used to have slides listing some of the challenges they face, and one of the things we had was 'Is engineering a black box?' Engineering is not a black box - we still know things - but there is a huge gap in our understanding. The engineers don't understand what we're trying to ask; sometimes we don't understand what they tell us. That's how the full concept of this privacy engineering gap came in. I also, at that point in time, read a blog post - I think it was from Adobe - titled something like 'Privacy Principles Are from Venus; Engineering Requirements Are from Mars.' Something similar. So, a couple of things came together while we were thinking about this. But broadly, if you look at the reasons why there is a gap, I think there are three big ones. 1) The first big reason I've seen is that there is this difference in the approach of how these two functions work. At the stakeholder level, right at the top, there's a misalignment. Privacy is a very central function. Right? There's a central team trying to make sure the use of data is consistent with the privacy promises made; or that data actually gets deleted; or that you're collecting less data. All the nice, great privacy work - but that's done by a central team, and they have the central view of things. Now look at engineering; it basically evolved from the days when it was central to super decentralized, distributed teams. Right?
You have a small team of a Product Manager, a couple of Developers, and one Designer - sometimes just engineers who are experimenting and pushing features out - which means on one side you want the central view of data and privacy, and on the other hand you have this decentralization, where they are making decisions about data, architecture, and tooling on their own. So that's the one big gap that we see: just the way these two functions work is different.

Vaibhav Antil:

2) I think the second, related gap here is the speed. Engineers are shipping things on a weekly, sometimes daily, basis; but for any privacy team, catching up here becomes a big problem, and the tooling hasn't caught up. So, that's another conversation. So, I would say the first big problem I see is, overall, that how these functions are structured is slightly different. I would imagine it's the same thing with security as well, but they have better tooling at this point to tackle all of these challenges. The second thing that becomes a problem is a lack of shared vocabulary. Part of the problem is that none of these laws is prescriptive; everything is written in a more principle-based way.

Vaibhav Antil:

But when you go to the engineering team and say, "Hey, you have to delete data," what does that mean concretely? Actually, let's take a non-privacy example. As a Product Manager, I would go and say, "Hey, our website has to be fast." That doesn't mean anything by itself, but for an engineer it will mean something when you talk in seconds. Okay, what does it mean? How fast should the page load? What is fast - is it ten seconds, one second, a microsecond? You can say the same thing about a specific API that, as a product manager, I own: the response time has to be X; it has to be available for Y time. You cannot just go and say "our API has to be available"; there has to be some number which makes sense for the engineer.
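The "fast" example can be made concrete with a tiny check. This is a minimal sketch; the function name and the thresholds are made up for illustration, not numbers from the episode.

```python
# Illustrative only: turning the principle "the API has to be fast and
# available" into a prescriptive, checkable requirement. The thresholds
# are hypothetical examples of the kind of numbers an engineer needs.

def meets_slo(p95_latency_ms: float, availability_pct: float) -> bool:
    """Prescriptive version of 'fast and available': concrete numbers."""
    return p95_latency_ms <= 300 and availability_pct >= 99.9

# "Fast" now has a testable meaning an engineer can build and alert against.
```

The same translation step - from a principle ("delete data", "be fast") to a number or rule an engineer can check - is what the privacy side usually lacks.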

Vaibhav Antil:

So now, once you look through this lens and go back to the privacy laws, they are not prescriptive in that sense; and hence it becomes very difficult to operationalize privacy.

Vaibhav Antil:

3) So, you have this gap where they're not speaking the same language, and the law itself is not prescriptive; hence, it leads to friction between the two teams. A combination of all this, I think, finally leads to a point where you might start with a very engaged privacy and engineering team; but as engineering teams look back and realize they did a bunch of assessments and questionnaires and never got any privacy requirement to implement in the product, it starts to look, to engineering teams, like privacy has become a blocker. And engineers are smart, so they then try to find ways around your DPIA or PIA process, which means you now have even more risk. So, I feel like these are three structural reasons why there is this gap between privacy and engineering teams. It's 100% solvable; that's what any privacy person working closely with engineers, or a typical privacy engineer, is trying to do. That's what we, as Privado, are trying to solve as well.

Debra J Farber:

That makes a lot of sense to me. One other area that I think is just very, very different between engineers and the privacy function that traditionally comes out of Legal / GRC is that lawyers don't want risk documented where there are unknowns and you don't know how to solve it. There was a time, for instance, when a major breach had happened at a retail company, and I was one of the subcontractors working on it, and we needed to document the 'as-is' state for privacy and then come up with the 'to-be' state for privacy. That involved lots and lots of interviews with various types of people, including engineers; and the attorneys on that case didn't want anything put in writing that could later surface in discovery. . . they didn't want anything that could say "You did things wrong" and then end up at trial as part of the discovery process. So they did not want us to document any of the risk. As a result, they put all of these attorneys in our meetings to make sure that those meetings were covered by privilege, right?

Debra J Farber:

I mean, how different is that from engineering, where it's like: let's get to the meat of this problem. Let's root cause it. Let's document it and make sure that we address it and fix it. Right? Those two different perspectives have been a real friction point, and I definitely think something like Privado makes it so that the Legal and GRC folks can have insights derived from all the great work that code scanning does - the data mapping and so on - but also gives the engineers the granular data points, so that they can take action to address privacy risk. I just wanted to tell that little story because I definitely think that that's a wide gap between Engineering and Legal.

Vaibhav Antil:

However, what I've been seeing in the industry for the last couple of years is. . . one of the good things that all these regulations have done - and also, I think, general user behavior change and how much users care about privacy - is that companies have realized that this work is not going anywhere. So, while the earlier approach might have been to document less, I think companies are slowly realizing that they have to do the work. And once you have to do the work, everyone reaches the same conclusion: we will need to take these legal policies and translate them into something actionable for the engineers, where things actually happen.

Vaibhav Antil:

That's the reason why more and more companies now have a Privacy Center, and more are being launched. So, they have an open Privacy Center. A very simple example is to say, "Hey, send an email to this person to kickstart your DSR." So I think, as companies have realized that there's no chance DSRs are going down, and no way they can get away with features that have less privacy, they're now making it more formal. A consequence of that is you will see more Privacy Engineers being hired at new companies, and companies with existing Privacy Engineers are staffing up more as well. So, I do think that's a big shift which has happened in the industry: companies have come to that conclusion, and they are now staffing up, building processes, and buying tooling for exactly those reasons.

Debra J Farber:

Awesome. So, how can engineers provide visibility into the personal data that applications are collecting and using?

Vaibhav Antil:

Unfortunately, this is a very tough problem. Again, let me take a step back. If you talk in generalities - nothing related to personal data - one kind of documentation people have been trying to create is network diagrams and architecture diagrams, which a lot of engineers create as part of their requirements. Maybe it's part of their company's policy, or they have to do it because it's a sales-facing document. It's very difficult to even keep those updated, even where engineering teams have very high motivation to maintain an accurate version; all of that documentation gets outdated as well. The same is true for their service names. Right? Nobody has a clear idea of "Okay, I have built this product, which is basically this backend microservice," and then over time a lot of other engineering components start to interact - your website, dashboard, and some other backend systems - and then someone makes a change in this specific component and 10 things break on each side. Essentially, if I take a step back and look at it, documentation of any kind is tough when things are decentralized and changing fast.

Vaibhav Antil:

That's where the engineers operate. So they have a product or an application which was built, if you start to look at it today, by some engineers who are no longer in the company. It probably has hundreds of thousands to millions of lines of code, which means there's no single person who can actually tell you what is in there. What engineers can do is give you a good starting point. The biggest gap I see there is that what a privacy person means by 'personal data' is not the same definition engineers use. They might be thinking PII, like email address; and as you start to have those conversations, they will give you more information, like "Oh yeah, we do have clickstream data that we collect. Oh yeah, we do collect audit data. Is that even personal data?" So I think it is super challenging. But your current engineers can give you at least a starting point on what data they are collecting and using. Directionally, you will get the right answer; but an accurate representation depends on what stage you are at. If you are at a decent stage - which means your company has been around for a while, and multiple people have built the product - it'll be super complex.

Vaibhav Antil:

At this stage, I would recommend more automation. One type of automation is code scanning. Another approach - actually, even if you don't want to automate - is to just look at your product: go to the website, try to use it, and say, "Oh, I see one data element which is missing; I see another." Do it in a session, and you will uncover even more data elements. A session where you can go through the product and then ask questions will give you more data elements than just sending an Excel sheet or an assessment. Ideally, what you want to do is generate this documentation from automated tools. I believe privacy code scanning - of course I'm biased - is one of the better ways to do that, because it's generating the document from the source of truth, which is the code; and it actually enables engineers.

Vaibhav Antil:

In fact, to that point, the Google Play Store came up with this requirement to generate a Play Store Data Safety Report. We actually open sourced, and made available, a free tool - it is still available - called Play Store Data Safety Generator. What it does. . . (and it got downloaded, I think, thousands of times) is Android engineers came in, downloaded the tool, ran a scan on their Android application, and it prefilled nearly 80% of the Play Store Data Safety Report. At the end of the day, they filled out the rest, downloaded it, and put it there. So, that's another example of how a code scanning approach, where you're using some automation to generate these documents, actually helps. Again - it's a long answer, but to summarize: use automated tools. Your engineers will thank you for that. If not, do a session with them. That works as well. I did that when I was trying to do it the first time.

Debra J Farber:

That's definitely easier to do when you just have maybe one product or team. But if you're a privacy person who comes in as a consultant, and it's like, "Okay, we've never done this before. Go and get data maps of all our products. Have all of these interviews" - something I've done before - it is a massive, massive effort. So, I definitely think automation comes in handy that way.

Vaibhav Antil:

Just to add one thing - one of our core motivations is that, at the end of the day, as privacy professionals, as privacy engineers, you really want to go in and build something in the product, or remove certain flows which should not be there in the product. So, you want to reduce privacy risk and you want to ship privacy features; but the majority of the time is spent collecting data, asking people questions, building this data map - which I feel should not be the case.

Vaibhav Antil:

Our goal is that, if we do all of this work well, you start from a place where you already have the data map; and once you have the data map, you can see the risk really quickly and then do the good work, which is: you can take that risk to the Engineering Lead or the Developer and say, "Hey, there is a risk and we want to reduce it. There are four ways we can do it; let's work on that." You save all of the time you are spending in manually building these data maps, and you get all of that time free. You spend that time actually reducing the risk and putting a privacy ticket on the Jira board, which gets shipped. So, the risk is less, and your users actually see it. I would say the strongest argument I have personally seen for automation is that you free up time which can be used to do better things - things which can impact user privacy immediately.

Debra J Farber:

Exactly, I totally agree. In fact, I was going to ask you, why is shifting privacy left into DevOps important to you, but I think you kind of summed up some of the reasons why. I want to give you the opportunity to add anything. . . to answer the question holistically: why is shifting left important? You could shift left into design, shift left into architecture. Privado focuses on DevOps, so I wanted to highlight that and get an understanding from you.

Vaibhav Antil:

Yeah, sure. I think one of the great things that has happened in engineering is that people have moved to agile development. A part of it is also continuous software delivery, which means you are continuously shipping features out; you're fixing bugs super fast. For the end user, it's amazing because, let's say for our own customers, if we find a bug today, we can literally fix the bug and, in the next 10 minutes, it gets deployed. Companies have built this amazing tooling and infrastructure for engineers where they can build features, release them, and have them go live at the customer level super fast. Because of cloud and good DevOps practices, it's possible. A lot of investment has gone into this: how can we actually build fast and ship fast so our end users can see these features fast? Once you wear this hat and say, "Okay, my company is going to change things super fast - they're going to change things every week; things are going to get shipped every second," then you can't have a privacy mindset that thinks in weeks or months, which is typically what a privacy impact assessment or a data protection impact assessment takes. At the design stage, you do want to spend time, because the feature has not started to be developed; you can spend that time and do technical privacy reviews, etc. But once all of that is done, there is huge work which doesn't come into that regular design cycle. It goes through this continuous software delivery model, wherein engineers are writing code and pushing code; things don't remain the same from the design phase to the implementation phase; and that's basically a missing gap in current privacy programs. You don't really have oversight there, and that's what shifting privacy into the DevOps process means.
It means, as a privacy professional, you have a privacy check as part of this DevOps cycle, which ensures that the new code changes, the configuration changes, a new database being created, a new third party being added, a new pixel being added to your website - all of it goes through this privacy check. Then this privacy check looks at each change and says, "Okay, this is a high-risk privacy issue; we have to block it. It needs to be fixed before it goes live. Or it's low risk; we can let it go live, but it creates a ticket for the engineer to fix later."
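The block-or-ticket policy just described can be sketched as a small gate function in a CI/CD pipeline. This is a hypothetical illustration; the finding format, severities, and names are assumptions, not Privado's actual API.

```python
# Hypothetical sketch of a privacy check in a DevOps pipeline: high-risk
# findings block the release; low-risk findings ship but generate a ticket.

def privacy_gate(findings):
    """Return (should_block, tickets) for a list of scan findings.

    Each finding is a dict with a 'severity' ("high" or "low") and a
    human-readable 'summary' used as the ticket text.
    """
    should_block = False
    tickets = []
    for finding in findings:
        if finding["severity"] == "high":
            should_block = True  # must be fixed before going live
        else:
            tickets.append(finding["summary"])  # ship now, fix later
    return should_block, tickets

findings = [
    {"severity": "high", "summary": "new third-party SDK receives email addresses"},
    {"severity": "low", "summary": "new first-party cookie not registered in CMP"},
]
block, tickets = privacy_gate(findings)  # block is True; one ticket is filed
```

In a real pipeline, `should_block` would map to a non-zero exit code so the build fails, and each ticket would be filed against the owning team's backlog.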

Vaibhav Antil:

Shifting privacy left, or this full DevPrivOps, is a methodology where privacy is saying, "Hey, we have shifted extremely left, to the design phase. We're right at the inception of a new feature. We are doing privacy reviews and thinking about new products from a privacy perspective, both on the legal side and on the technical privacy side." So they're going from that to saying, "Okay, but we also need to solve things at scale. We also need to acknowledge the fact that our engineering workflows move at a daily, weekly speed, and we need to have something on the privacy side which matches that as well."

Vaibhav Antil:

Another way I can articulate it is how a Privacy Engineering Leader described it to me: the way they like to think about this DevPrivOps, or shifting left, is lightweight, consistent governance; wherein, instead of looking at 90 things, you look at 6 things and the remaining 84 are automatically checked. So it's lightweight; it's not adding a lot of time or a lot of commitment for the developers. And it's consistent: it checks all the time, so you are assured that whatever is going out to production, to your end user, has less privacy risk. That's the philosophy of DevPrivOps. Shifting left just means: go earlier, find things earlier, so there are fewer things you're finding in products that have already gone live.

Vaibhav Antil:

Ideally. . . the most common one I've seen with our customers - which I never thought would be - is that they find these cookies on their website. You would imagine that would be a solved problem; but if they have a large front-end team, they'll find first-party cookies on their website, and it's not easy to find who actually added them. Right? Because there could be 20-30 front-end engineers working on the same website or dashboard. Third-party cookies are easy to track - it's generally the marketing team or analytics team - but first-party cookies could come from any part of the engineering team. So, shifting left could mean that the moment an engineer decides to add a cookie, it creates an issue and an alert for the engineer saying, "You have to onboard it to our CMP" - whatever consent management platform you use - and then it gets shipped. So you're not chasing the entire 40-member team when you discover it later in your cookie scan, when it actually hits production. That's shifting left in the entire DevOps world.
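The first-party cookie check could be sketched like this: flag cookie names set in front-end code that are not yet onboarded to the consent management platform (CMP). Everything here is a hypothetical illustration - the regex, the allowlist, and the function name; a real scanner parses code rather than pattern-matching strings.

```python
import re

# Hypothetical sketch: detect cookies set in JavaScript source that are
# missing from the CMP allowlist, so the engineer is alerted before merge.

CMP_REGISTERED = {"session_id", "csrf_token"}  # cookies already in the CMP

SET_COOKIE_RE = re.compile(r"""document\.cookie\s*=\s*["'](\w+)=""")

def unregistered_cookies(js_source):
    """Return cookie names set in the source but missing from the CMP."""
    found = set(SET_COOKIE_RE.findall(js_source))
    return sorted(found - CMP_REGISTERED)

js = '''
document.cookie = "session_id=abc";
document.cookie = "ab_test_bucket=B";  // added by a feature team
'''
# unregistered_cookies(js) flags "ab_test_bucket" for CMP onboarding
```

Running a check like this on every pull request is the "lightweight, consistent governance" idea: one small automated rule instead of chasing 40 engineers after a production cookie scan.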

Debra J Farber:

Awesome - before code is ever shipped to production. 'Fix it before it's in production' is a great mantra, in my opinion. Let's talk a little bit about privacy code scanning, since that's the crux of what Privado provides. Let's start with: what is a privacy code scanner, and how does it differ from traditional static code analysis tools in security? And maybe it's also a good time to bring up the difference between static code analysis and dynamic code analysis, to differentiate from what you don't do.

Vaibhav Antil:

Yeah, sure. Think of privacy code scanning like this: the technology itself - scanning code to look for patterns - is super old.

Vaibhav Antil:

It started with checking, "Okay, is our developer writing code in a way that's consistent with our coding practices?" Then it moved to looking for license compliance issues in open source packages, and then to the security scanning tools you see today, which look at, "Hey, do we have code written in a way that a hacker can attack? Can they use this weakness in our application code to attack and get some data out?" The concept is the same: the technology looks at the code, scans it, and then gives you some information back. In that way, the approach - that technology - is old; we just applied it to privacy.

Vaibhav Antil:

What it really means is. . . essentially, a privacy code scanner is going to scan the code of anything your engineer has built and build a data flow graph out of it. It has to give you a couple of data points. It has to tell you what data is coming into this application. What's coming in? What happens with that data once it comes in? Where does it flow? Where is it going out? It could go out to a database, it could go out to a cookie, it could go out to a third party, or it could go to your own internal infrastructure, your own internal backend service - wherever it is going.

Vaibhav Antil:

The crux of a privacy code scanner is to build a data flow graph, or a data flow diagram, of the specific application that you're scanning, with a focus on the whole lifecycle: collection, storage, sharing, logging, transformation, processing, all of it. Once you have this basic graph done, you can do a couple of interesting things on top of it. You can generate a RoPA report. You can build a data flow diagram for your threat modeling. Once you have all these flows, you can leverage something like a policy or rule engine to say, "Show me all applications where sensitive data is being shared with third parties," or "Show me all applications where we have a pixel," or "Show me all applications where there are cookies." Any of these things you can do in the simplest way.
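The idea of a data flow graph queried by a rule engine can be sketched in a few lines. This is a minimal, illustrative sketch - the data elements, categories, sink types, and the example rule are all hypothetical assumptions, not Privado's actual data model:

```python
# Minimal sketch: a data flow graph as a list of edges, queried by simple
# policy rules. All field names and categories here are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    data_element: str   # e.g., "email", "precise_location"
    category: str       # e.g., "contact", "location", "health"
    sink_type: str      # "database", "third_party", "cookie", "log"
    sink_name: str

flows = [
    Flow("email", "contact", "database", "users_db"),
    Flow("precise_location", "location", "third_party", "analytics_vendor"),
    Flow("session_id", "identifier", "cookie", "session_cookie"),
]

def find_flows(flows, *, category=None, sink_type=None):
    """Rule engine in miniature: filter the graph by policy predicates."""
    return [f for f in flows
            if (category is None or f.category == category)
            and (sink_type is None or f.sink_type == sink_type)]

# "Show me all flows where location data is shared with a third party."
violations = find_flows(flows, category="location", sink_type="third_party")
for v in violations:
    print(f"{v.data_element} -> {v.sink_name}")
```

Real scanners derive these edges from static analysis of the application code; the point of the sketch is only that once the graph exists, RoPA reports, threat-model diagrams, and policy checks are all simple queries over it.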

Vaibhav Antil:

Privacy code scanning - think of it as something that will scan the code of the application and build a data flow graph or diagram out of it, focusing on the specific lifecycle of data coming in, which is collection, and data going out, which could be sharing, storage, logging, or processing in general. Once you have that, it should enable you to do all the core privacy use cases: RoPA documentation, privacy by design, finding privacy issues, and so forth. The main difference is that a security code scanner will not care about any of this. It doesn't care if the application is going to process personal data. If it has a security vulnerability, it has a security vulnerability. On the other hand, we do care about that; we only care about that.

Vaibhav Antil:

We don't care if it has a security vulnerability. We only care about whether we accurately build this data flow graph.

Debra J Farber:

That is really helpful. I think it would be even more helpful to the audience if you could differentiate how the privacy code scanning and data mapping capabilities of Privado are completely different (because I know that they are) from the traditional tools out there that do personal data discovery, correlation, and data mapping - companies like BigID, Secuvy.ai, Securiti.ai, companies like that.

Vaibhav Antil:

Yeah. I would say, if you look at data discovery in general, it's basically discovering data at rest. What you're saying is, "Hey, as a company we've grown; we've continuously collected data, and we kept on storing it, and now we don't have a good handle on where all of it is stored." From a security perspective, that's important so you can secure the data. From a privacy perspective, it's important because you need to delete data, do retention, et cetera. So, a data discovery tool focuses on how quickly, at scale, with good performance, it can go in and tell you where personal data is - in which data store, in which file, in which SaaS app. That's looking at data at rest. Now, the reason you need code scanning even if you have a data discovery tool (especially if you have a data discovery tool) is that a data discovery tool only gives you the picture of what data you have at rest. But privacy is all about data use. Did you collect more data than you were supposed to? How did this data get used? A famous example: a company was fined not because phone numbers were breached or leaked, but because it collected phone numbers for authentication purposes and then used them for advertising. That picture of how data gets used is actually in the code. It's in how you build a product, how you build an application. It's as simple as that.

Vaibhav Antil:

When you're doing a PIA, you're not doing it on the database; you're doing it on a processing activity, a product function, a product, or an application. We do it because privacy is about the full use of data - the collection point, the sharing point - and storage is just one part of it. That's why you need to scan the code to get an automated understanding of how data gets used. "Okay, I know it's in this database, but do you know which two products are connected to this one database? Do you know how that data eventually gets shared with a third party?" All of that information is written in code by engineers, version controlled and maintained, and that's why you need a code scanning solution. That's one answer, which is about the lifecycle of data: a data discovery solution only captures storage, whereas a code scanning solution captures the complete lifecycle, especially if you're looking at engineering and products.

Vaibhav Antil:

The second argument for where these tools differ is that data discovery tools are looking at things once they've already entered your system. The data has already been collected; it's already in your system. That's when you're discovering, "Okay, now we have a new database with these data elements, or we have this new file." The code scanning solution sits super early in the lifecycle - we must shift left - at the moment engineers have just written the code to collect phone numbers for the first time, or precise location for the first time. You have not yet pushed this to production, which means it's not live. No customer data has ever flowed through that code.

Vaibhav Antil:

You can actually scan and discover all of these things at that time, which means that if you want to get out of the cycle of technical privacy debt, as you move left, code scanning lets you prevent some of these things.

Vaibhav Antil:

You can ensure that if you're not supposed to collect precise location, you're not collecting it, which means you have to do less scanning on the right side, where data discovery tools sit. You can also enforce better privacy engineering standards. When someone is creating a database and attaching it to your application code, if that database table has sensitive data alongside general identifiers, you can tell them not to do that. If they are going to share data with third parties, that's something you can catch and fix early as well. I would say these are the two main differences. One is coverage - you want a clear line of sight into data sharing and data use, which is only available in the code, not at the data storage layer. The second is moving left to find, fix, and prevent things rather than firefighting on the data side only. That's how the privacy code scanning approach differs from a classic data discovery approach.
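The "catch it before production" idea above can be sketched as a tiny diff check: when an engineer's change introduces a new sensitive data element, flag it before merge. The patterns and the per-app policy below are illustrative assumptions, not any real scanner's rule set:

```python
# Minimal sketch of a shift-left check: scan the added lines of a code diff
# for newly collected sensitive data elements, before anything ships.
# The regex patterns and the DISALLOWED policy are illustrative assumptions.
import re

SENSITIVE_PATTERNS = {
    "precise_location": re.compile(r"\b(latitude|longitude|geo_?point)\b", re.I),
    "health": re.compile(r"\b(diagnosis|symptom|medication)\b", re.I),
}

DISALLOWED = {"precise_location"}  # policy: this app may not collect it

def check_diff(added_lines):
    """Return (category, line) pairs for each newly added sensitive element."""
    findings = []
    for line in added_lines:
        for category, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                findings.append((category, line.strip()))
    return findings

added = ['user.latitude = request.json["lat"]',
         'logger.info("request ok")']
for category, line in check_diff(added):
    verdict = "BLOCK" if category in DISALLOWED else "WARN"
    print(f"{verdict}: {category} collected in: {line}")
```

A real privacy code scanner works on the parsed code and its data flows rather than raw regexes, but the workflow is the same: the finding lands on the pull request, so the engineer fixes it before any customer data ever flows through that code.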

Debra J Farber:

I also want to highlight something. I am an Advisor to both Privado and Secuvy because I definitely think they are separate products that address separate issues. Many companies today are saying, "We need to know where our personal data is." That wasn't something we needed to do before; in the big data era of 'collect everything, figure out a use for it later,' data has sprawled across our environments - structured, unstructured, and semi-structured data stores. Where is it? You can't protect personal data if you don't know where it is. You can't delete it if you don't know where it is. You can't provide it to somebody and show them what you've collected about them if you don't know where it is. So, I think it's absolutely important for companies in that space to use a data discovery, correlation, classification, and mapping tool.

Debra J Farber:

But, as you said, I want to see a world where everyone's using a privacy code scanner like Privado's, where you get your engineering best practices put in place. You embed privacy into the software development lifecycle, which is not what the BigID, Secuvy, and Securiti.ai tools are doing. They're looking at the personal data lifecycle rather than embedding into the software development lifecycle. They are very different tools, and I definitely think companies can use both, depending on their maturity and whether they have the right resources to manage these tools. How does privacy code scanning enable engineering teams to stay compliant with new laws? I know you mentioned that before, but let's take, for instance, my state - Washington State's My Health My Data Act, which grants consumers the right to access, delete, and withdraw consent from the collection, sharing, and processing of their health data.

Vaibhav Antil:

Yeah, absolutely. In fact, the My Health My Data Act is a classic example of where you do need to go to the code to actually comply with the law, because it focuses on a couple of things. One of the main things it focuses on is use of data - the purpose for which you've collected it and how you're using it. Sharing of health data is the most critical part there, and both of these things are in the code. So, take a health app. The first thing the privacy or engineering team will be asked is, "What health data do you collect, who do you share it with, and how do you use it?" With Privado, you can connect in five minutes. We'll run a scan on all your code repositories and give you one single view of all the health data you're collecting, where that health data is flowing, and which products are using it - which means it solves all three use cases of My Health My Data. You can confidently say to your consumers, "This is the health data we collect; this is who we share it with; these are the service providers; if we share it for any advertising reason, we have to get authorization; and these are all the use cases we're using the health data for." So that's number one: you get to compliance faster. As I said, you don't have to spend months figuring out this information, and you can fix things - or rather, remove certain health data flows.

Vaibhav Antil:

I think the second most important thing with the My Health My Data Washington Act is, if I remember correctly, there's a class action lawsuit, which means there is a very high onus on companies to prevent privacy issues from going to production. This means you do need a policy in your DevOps system, in your SDLC, looking for privacy issues.

Vaibhav Antil:

So, something needs to check and say, "Hey, I pushed this code. I was using Google Analytics, but then I accidentally started sending health data to Google Analytics because the page name has it, or the search query had it, or I was intending to send only the user ID but sent the full object, which has the health details as well." That's the second part. So, one is the initial compliance - you get your data maps faster; you get the entire picture of health data collection, sharing, and use faster. The second is guardrails - having these automated privacy checks for My Health My Data on behalf of privacy teams, which are not only looking for issues but also providing guidance to engineers on what to do right when they make a change. This will help you scale your program and not only comply, but also remain in compliance with these laws.
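A guardrail like the Google Analytics example can be sketched as a pre-merge policy gate: classify the fields in an outbound payload and fail the check when a forbidden category reaches a forbidden sink. The field names, category map, and sink policy below are all hypothetical illustrations:

```python
# Minimal sketch of an automated guardrail: before merge, flag any field
# classified as health data that is sent to an analytics third party.
# The category map and policy are illustrative assumptions.
CATEGORY = {
    "user_id": "identifier",
    "page_name": "behavioral",
    "search_query": "health",   # e.g., a symptom search on a health app
}

FORBIDDEN = {("health", "analytics")}  # (data category, sink) pairs to block

def audit_payload(payload_fields, sink):
    """Return the fields whose category is not allowed to reach this sink."""
    return [f for f in payload_fields
            if (CATEGORY.get(f, "unknown"), sink) in FORBIDDEN]

# The engineer intended to send only user_id, but the whole object went out.
blocked = audit_payload(["user_id", "search_query"], sink="analytics")
if blocked:
    print("Privacy check failed; remove fields:", blocked)
```

In a CI pipeline, a non-empty result would fail the build and annotate the pull request, so the accidental health data flow never reaches production.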

Debra J Farber:

Awesome. Thank you for that. That's illustrative. All right, I want to talk a little bit about the future of Privado and privacy engineering education. One of the things that really impresses me about Privado is the founders' focus on educating the next wave of privacy engineers. Besides sponsoring this podcast and Robert Bateman's 'Privacy Corner,' and your own webinar series on privacy engineering, Vee, Privado recently made available a free Technical Privacy Masterclass hosted by Nishant Bhajaria, Privado Advisor and author of the seminal work 'Data Privacy: A Runbook for Engineers.' Can you tell us a little bit about this masterclass? Who was it designed for? What topics does it cover? How long does it take to complete? That kind of thing.

Vaibhav Antil:

That was such a fun project, building this course with Nishant on technical privacy. The motivation for us was. . . it was also an inward-facing motivation. As we've been speaking and working with our customers, we end up working with a lot of technical privacy leaders and privacy engineers - some have been in the industry for a while, some are just entering - and I wondered: is there some space where technical privacy leaders and privacy engineers can come together, learn, talk, and care? That led to multiple initiatives, one by one: starting with sponsoring a podcast; then, as we went through that feedback cycle, working on this course, launching a community, and doing our own webinars as well. It's been a great learning experience because I personally get to learn a lot - not just from the masterclass, but from looking at the questions in each webinar and listening to your podcast as well. Specifically, the masterclass is for anyone working at a technology company. We built it so it talks about why engineering is different. If you are at a tech company with a lot of engineers, it starts by explaining how engineering has evolved and why it is different - kind of the same thing we were talking about initially.

Vaibhav Antil:

Then, it goes into how you can build a privacy program that's more aligned with how your engineering practices actually work.

Vaibhav Antil:

So, it's more proactive; it actually works. Then, we went a little deeper into how you can build privacy tooling: how you can build your own DSAR; how you can buy or build a data discovery solution; and finally, it goes into KPIs. It also has a nice bonus episode on Nishant's journey into privacy engineering. Our core motivation was to help people who are entering the field, or who are already 'Unofficial Privacy Engineers' still doing the work, so they have a resource to begin with and can learn and enter the field. That was our motivation. It's been a great experience launching this because we've gotten such amazing feedback, both positive and critical, on where we can improve. We have a lot of data points on the next modules we want to add and on how version 2 could change as well. But yeah, it was such a fun project to work on.

Debra J Farber:

I've been in privacy for over 18 years, and even I learned new things from Nishant's perspective on how to approach implementing some of these solutions. I absolutely think this is the best technical privacy class I've ever come across. I think it's excellent, and I'm not raving about it simply because I'm a Privado Advisor. Anyone in privacy can take it, though it's certainly aimed at technical folks. It would be great for engineering teams that are upskilling on privacy, and for current Privacy Engineering Managers to take with their teams - maybe something they do as they talk about increasing their skills over the year, with a certificate to show they've completed it. How long is it? I know I took it, but I wasn't sure. Is it around 3 hours long?

Vaibhav Antil:

Yeah, roughly anywhere from 2 to 3 hours of content.

Debra J Farber:

It's somewhere between 2 - 3 hours.

Vaibhav Antil:

It's not a lot. As you said, probably one third of the people who attend our privacy engineering webinars are trying to enter the field. For them, this is a good primer. Some of them are actually software engineers who have literally built a DSAR system at their company from the ground up - the full infrastructure - and now they can learn the other parts of privacy beyond the tooling: the wires in the house, how to think about a privacy program, KPIs, and things like that. So, that's the kind of audience we have. Two to three hours - not a lot of effort, but a lot of learning, for sure.

Debra J Farber:

Oh, so much impact! So much wisdom dispensed by Nishant in that amount of time. So, it really does feel like a masterclass. He's clearly talking in a way where experts can step back and get a more holistic perspective on their approach, but newbies to the field can also pick up those best practices immediately. So, it really is good. I'm going to put a link to it in the show notes so that anyone can check it out; and remember, it's free, so there's really no negative here. If you have the will, then Privado has provided the resource. I do want to know if you have numbers. How long has it been out now - a month or two? How many people have completed the course so far?

Vaibhav Antil:

About that - we are inching toward 1,000 enrollments in the course. It's grown a lot organically. I'm super thankful to the people who took the course, posted on LinkedIn, and gave good feedback, because that encourages more people to take the course and learn about it. Because it's been successful, the next version - at least for the people who have taken it - will go deeper. One topic is Technical Privacy Reviews - something I'm really looking forward to creating with different people, along with Nishant - something very specific that goes into: "How do you do a review? How should you design it? What should the KPIs be?" Once we have this base of people who have taken it, I think the next modules will go more in that direction. AI governance is another one we are thinking about as well.

Debra J Farber:

Oh yeah, that makes sense. There's a lot of demand for knowledge there. So, I know that you've got some really exciting features on Privado's roadmap. Would you mind sharing a few of those with us?

Vaibhav Antil:

For sure. Broadly, as I laid out, our vision is to be the platform where privacy and engineering teams can come together, collaborate, and confidently ship features fast with privacy built in from the get-go. Once you have this vision - "Okay, this is what we want to do: enable speed and a good experience for the developer, less privacy risk, and a better privacy experience for the end user" - it forces us to solve certain things. The first problem is visibility. As a privacy professional working at a technology company, one question you want to answer is, "I have this product, Product A. Show me the data flow diagram of it." It's so hard to answer that question, even with all the work that has been done - even if you bought a privacy program management tool, did the data mapping, and bought a data discovery tool. You're still not able to answer it, because the tooling was not built to answer that question. So, our motivation - our vision, all the features that are coming up - is: we've probably already solved the problem to a good extent, but it has to be super simple, so someone can answer this question really, really fast. The next level is, for example, automating documentation like RoPA - we could automate 40-50% of it, and now we've taken that up to 70+% thanks to our language model. How can you use code - the source of truth - to build documentation that is evergreen, that comes from the code, and that has a lot of context as well? That's the first big puzzle we are solving.

Vaibhav Antil:

The second big puzzle we are trying to solve is making privacy-by-design programmatic. A lot of the effort in the privacy-by-design process goes into privacy reviews, which are super important to do at the design stage for large features. But you can also go down a rabbit hole where you keep doing them - you scale them up so much that you do so many privacy reviews you get zero rest. And then what? That reduces development speed. It literally halts product innovation; and in today's environment, if you are not innovating, your company suffers.

Vaibhav Antil:

So, how can you get to a point where you're reviewing the right stuff and a code scanning solution is taking care of everything else? Part of that is the product, and a larger part is also open sourcing the privacy rules that we have. A lot of work is happening there as well, and I'm super excited about that. Then, I would say the third thing is going deeper into the collaboration space between privacy and engineers and these technical reviews - again, helping them scale up. We have an interesting product there, wherein we look at things like PRDs and ERDs and help people streamline or triage, and say, "Hey, these are the five requests you should definitely review, and these are the ones you can safely ignore and let the privacy code scanner take care of." But the overall vision is to have one platform where these two personas can work together; and for that, we have to solve for visibility, governance, and collaboration.

Debra J Farber:

Awesome. Thank you so much for that. Privacy engineering is a small but growing field, for sure; and you definitely have a front row seat to its development. You're working with customers from a variety of industries and of different sizes. Can you tell us what trends you're seeing in this space and how do you see the field shaping up over the next two years or so?

Vaibhav Antil:

Yeah, sure. I think the basic trend I'm seeing is that it's becoming very popular. Companies have decided, and they're hiring privacy engineers. We run a Privacy Community where we post new jobs on our Job Board, and there are new jobs almost every week. It is getting popular. So, that's a net positive trend, I would say.

Vaibhav Antil:

The second thing, at least from what I've seen, is that Privacy Engineers are getting successful at their jobs - successful at taking the amazing work that privacy professionals, the Chief Privacy Officer, the Chief Legal Officer, have done on the policy side (creating internal policies, best practices, record keeping) and translating it into engineering requirements and into tools they can use to scan, build, or buy - all of that work. One of the simplest trends I'm seeing is that existing Privacy Engineers are succeeding at reducing privacy risk and at building privacy features and tooling; and hence it's a nice cycle: because they're succeeding, they're going to newer companies, and newer companies are hiring more Privacy Engineers. That's an amazing cycle. So one thing I'm seeing is that more Privacy Engineers are coming in. It's still a small community where everyone is super helpful, sharing information and networking to exchange best practices. It is growing. I think that's one trend I'm seeing in the space.

Debra J Farber:

Awesome. How do you think it's going to shape up over the next two years or so?

Vaibhav Antil:

Again, I think it's just mainstreaming. Mainstreaming would mean we get better KPIs for different things. Nishant's book, for example, talks about KPIs for data discovery. I think there will be KPIs for Technical Privacy Reviews or Privacy Engineering Assessments, which will become popular. I think privacy code scanning will become mainstream, and that's what we've seen with our customers - the Privacy Engineers taking the lead, because they're the ones who will say, "Hey, we have this nice automated check as part of the SDLC, which is looking for things, so we can do higher-level, more important work in the company as well."

Vaibhav Antil:

I'm not sure about this, but with the impact of AI governance, Privacy Engineers might also play a big role there. It's so early that I don't really know for sure. They'll definitely have a big role to play, but I don't know how much of that piece they will end up owning. It'll be a shared responsibility, but I don't know the exact answer. At least, these are a couple of trends, or a couple of ways, I can see the field shaping up.

Debra J Farber:

Thank you so much. Do you have any words of wisdom to leave the audience with before we close?

Vaibhav Antil:

No, I mean, I just wanted to come on the podcast and share our journey and what we've been doing with Privado. We as a company are super grateful to the privacy community. I think privacy people, in general, are the most open, most accepting, most diverse people. Everyone is trying to solve the user's problem, which is super exciting because you're looking at things from the user's angle. Obviously, you have the business metrics, but you're not only looking at things from the business angle. So, yeah, we're super appreciative and thankful to the entire privacy community for accepting us, working with us, and giving us nice feedback and critical feedback as well. We really look forward to working with everyone who's listening to the podcast in the future.

Debra J Farber:

Yes, everyone, go check out Privado. I'll put the link in the show notes. Well, Vee, thank you so much for joining us today on The Shifting Privacy Left Podcast to talk about privacy code scanning, Privado, and the rise of privacy engineering. [Vaibhav: Thanks, Debra, for having me here.] Until next Tuesday, everyone, when we'll be back with engaging content and another great guest, or guests. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of the show. To learn more, go to Privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.

How Vee made the move from Product Manager to Co-Founding Privado, and why he chose to focus on Privacy Code Scanning
The meaning behind Privado's mantra: "Bridging the Privacy Engineering Gap" and the 3 reasons why there is a gap
How engineers can provide visibility into the personal data that applications are collecting & using via Privacy Code Scans
Vee explains why 'shifting privacy left' into DevOps is an imperative
Vee describes the purpose of Privacy Code Scanning & how it differs from traditional static code analysis tools in security
Vee & Debra compare how Privado's privacy code scanning & data mapping capabilities (SDLC) differ from personal data discovery, correlation, data mapping tools - platforms like BigID, Secuvy, Securiti.ai, etc. (data lifecycle)
Vee discusses some exciting features on Privado's Product Roadmap
Trends that Vee is seeing as the Privacy Engineering Profession grows
