The Shifting Privacy Left Podcast

S3E1: "Privacy-preserving Machine Learning and NLP" with Patricia Thaine (Private AI)

January 02, 2024 Debra J Farber / Patricia Thaine Season 3 Episode 1

My guest this week is Patricia Thaine, Co-founder and CEO of Private AI, where she leads a team of experts in developing cutting-edge solutions using AI to identify, reduce, and remove Personally Identifiable Information (PII) in 52 languages across text, audio, images, and documents.

In this episode, we hear from Patricia about: her transition from starting a Ph.D. to co-founding an AI company; how Private AI set out to solve fundamental privacy problems to provide control and understanding of data collection; misunderstandings about how best to leverage AI regarding privacy-preserving machine learning; Private AI’s intention when designing their software, plus newly deployed features; and whether global AI regulations can help with current risks around privacy, rogue AI and copyright.

Topics Covered:

  • Patricia’s professional journey from starting a Ph.D. in Acoustic Forensics to co-founding an AI company
  • Why Private AI’s mission is to solve privacy problems and create a platform that developers can integrate modularly and flexibly anywhere in their software pipelines, including at model ingress & egress
  • Patricia’s advice on how companies can avoid mishandling personal information when leveraging AI / machine learning
  • Why ever-changing data collection practices and regulations make it hard to find personal information
  • Private AI's privacy-enabling architectural approach to finding personal data to prevent it from being used by or stored in an AI model
  • The approach that Private AI took to design their software
  • Private AI's extremely high matching rate, and how they aim for 99%+ accuracy
  • Private AI's roadmap & R&D efforts
  • Debra & Patricia discuss AI Regulation and Patricia's insights from her article 'Thoughts on AI Regulation'
  • A foreshadowing of AI’s copyright risk problem and whether regulations or licenses can help
  • ChatGPT’s popularity, copyright, and the need for embedding privacy, security, and safety by design from the beginning (in the MVP)
  • How to reach out to Patricia to connect, collaborate, or access a demo
  • How thinking about the fundamentals gets you a long way toward ensuring privacy & security



Privado.ai
Privacy assurance at the speed of product development. Get instant visibility w/ privacy code scans.

Shifting Privacy Left Media
Where privacy engineers gather, share, & learn

TRU Staffing Partners
Top privacy talent - when you need it, where you need it.



Patricia Thaine:

Here's, I think, where regulation is going to make a huge difference, because what we saw with regards to the GDPR is that it is a forward-thinking, wishful regulation where a lot of the technology required to comply with the GDPR doesn't even exist yet. We're working on it at Private AI, but it still does not exist yet.

Debra J Farber:

Hello, I am Debra J Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans and to prevent dystopia. Each week, we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding edge of privacy research and emerging technologies, standards, business models and ecosystems. Welcome everyone to Shifting Privacy Left. I'm your host and resident privacy guru, Debra J Farber.

Debra J Farber:

Today, I'm delighted to welcome my next guest, Patricia Thaine, Co-founder and CEO of Private AI, where she leads a team of experts in developing cutting-edge solutions for identifying, reducing, and removing personally identifiable information (or PII) across 52 languages in text, audio, images, and documents using AI. Welcome, Patricia. [Patricia: Thank you so much, Debra. It's such a pleasure to be here with you.] Likewise. AI is definitely the topic of the year, but I'm really excited to find out a little bit about your journey bringing Private AI to life, your career journey; and, if you don't mind, starting by telling us a little bit of your backstory and how you then ultimately decided to co-found an AI company.

Patricia Thaine:

Yeah, so I started a PhD in order to start a company, actually. I was looking at acoustic forensics: who's speaking in a recording, and what kind of educational background do they have? If you include that information in automatic speech recognition systems, it can improve the systems quite a bit. While doing this work, it became really obvious: 1) if you get the data, you have massive amounts of privacy problems; and 2) you often can't get the data anyway because of privacy problems. So, two sides of the same coin that really encouraged me to go and try to find solutions to privacy problems. Then, I started looking at homomorphic encryption, did a few publications on that, tried to spin up a company in 2017 that was combining homomorphic encryption with natural language processing, scrapped it because it wasn't going to scale, and then co-founded this one in 2019 with my co-founder and CTO, Peter Luitjens.

Debra J Farber:

It's been a while now that you've been working on that, and I've been watching you get a lot of investment from several VCs and corporate VCs. What problems did you set out to solve with Private AI that made the VCs go, "Oh my God, yeah, this is what we need to invest in; this is what we need to bring to market"?

Patricia Thaine:

Yeah, so in 2019, it was still quite early, I guess, definitely compared to now, with regards to the fundamental problems of privacy that needed to be solved for unstructured data.

Patricia Thaine:

What we set out to solve is really. . . if you're a Natural Language Processing Engineer or if you're an engineer dealing with unstructured data - which ends up being 80% to 90% of the data that is collected by organizations - you're going to need tools in your arsenal that allow you to handle the data in a very flexible way.

Patricia Thaine:

Oftentimes you have to be able to see the data; it has to be really highly accurate; and it has to run in your environment a lot of the time as well, or in your customer's environment.

Patricia Thaine:

What we set out to do is really take privacy problems that had not been solved before, which are very fundamental, and make them really modular, really easy-to-use, and make it so that you can integrate it anywhere you want in your software pipeline (including at data ingress and, ideally, at data egress), so you have absolute control and understanding of what kind of data you're collecting. One of those fundamental problems is identifying what the personal information even is in this really messy data. For the most part, folks have been using regular expressions or open source software that is not well-adapted for this particular task, and it turns out it's quite a gargantuan task if you're going to do this right. So, that is what we are focusing on - bits and pieces of privacy problems - to create a platform for developers to be able to pull what they need and integrate it into their software pipelines.
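To make that ingress/egress idea concrete, here is a minimal sketch of a redaction layer wrapped around both ends of a pipeline. Everything in it - `detect_pii`, `redact`, `guarded_call` - is a hypothetical, toy stand-in for illustration, not Private AI's actual API.

```python
# Hypothetical sketch of redaction at pipeline ingress and egress.
# detect_pii/redact are toy stand-ins, NOT Private AI's actual API.
import re
from typing import Callable

def detect_pii(text: str) -> list[tuple[int, int, str]]:
    """Toy detector returning (start, end, entity_type) spans.
    A real system would use a context-aware model, not one regex."""
    pattern = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")
    return [(m.start(), m.end(), "CREDIT_CARD") for m in pattern.finditer(text)]

def redact(text: str) -> str:
    """Replace each detected span with an entity-type placeholder."""
    out, last = [], 0
    for start, end, label in detect_pii(text):
        out.append(text[last:start])
        out.append(f"[{label}]")
        last = end
    out.append(text[last:])
    return "".join(out)

def guarded_call(model: Callable[[str], str], user_input: str) -> str:
    """Redact at ingress (before the model sees the data) and again at
    egress (before the model's output leaves the pipeline)."""
    return redact(model(redact(user_input)))
```

The point is architectural: the model only ever sees `[CREDIT_CARD]`-style placeholders, so sensitive values are never stored in it or echoed back out.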

Debra J Farber:

Okay, and is this mostly around leveraging AI models? Or, are you also applying this to data discovery for GRC purposes - you know, finding all the data across an organization and then being able to action upon that data or fulfill data subject access requests (DSARs)? Help me put the scope around the problem that you are solving for.

Patricia Thaine:

Yeah, all of the above. There are so many use cases. It really comes down to those first principles: 1) you find the personal information; then 2) you only keep what you need. If you solve those two, those fundamental components are applicable to a number of different tasks, and they should be applicable anywhere in your environment, whether you're running on Azure or AWS or GCP or on-premise - and really, one of my dreams is to run directly on the edge. We created edge deployments of our products, but it was really too, too early for that. But one day that will be the case.

Debra J Farber:

That's pretty cool.

Patricia Thaine:

Yeah, essentially you want to limit the amount of personal information you're gathering as soon as possible so it doesn't get crazy and messy.

Debra J Farber:

Right, right. I do want to ask, because you do refer on your website a lot to PII, which is Personally Identifiable Information. Is it more encompassing - does it include Personal Data more broadly - or is it specifically identifiable data that you can discover?

Patricia Thaine:

It's identifiable or quasi-identifiable. We identify over 50 different entity types across what counts as PII, PCI, or PHI; and, we're also working on confidential company information as well. So, it's anything that you really need to capture in order to make sure that you're complying with a number of regulations or that you are safely using products within your organization.

Debra J Farber:

Got it. Okay, that makes sense. One of the things that I've noticed in my career is that companies very often are mishandling data, or employees within companies are mishandling data because they're just maybe ignorant about the appropriate ways to handle it, or whether or not something is 'confidential,' or 'personal information' or whatnot. So, how can companies avoid mishandling personal information when leveraging AI / machine learning?

Patricia Thaine:

There are a few misunderstandings about how to best leverage AI in a privacy-preserving way. One of those fundamental misunderstandings is actually that if you're deploying a model directly in your environment, you're okay, you're good, that's all you need for privacy. That is very, very untrue, because normally what you're gonna be doing is fine-tuning that model on your data; and then, what you have to be careful of is whether the data that you're fine-tuning or training that model on contains personal information or confidential information. If it does, that model is gonna have to have the same levels of access controls as the original data. The best thing you can do is actually remove the information that you don't need before fine-tuning those models, so you can limit the need for those access controls, which will otherwise end up becoming a complete nightmare within organizations if you have to deal with access control for different models.
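As a rough illustration of that advice - strip what you don't need before fine-tuning - here is a minimal sketch that writes a redacted copy of a training corpus. It reuses the toy `redact` helper sketched earlier; the record format and field names are assumptions for the example, not a real fine-tuning API.

```python
# Hypothetical sketch: fine-tune on a redacted copy of the corpus, so the
# resulting model doesn't inherit the raw data's access-control requirements.
import json

def build_redacted_training_set(records: list[dict], out_path: str) -> None:
    """records: [{"prompt": ..., "completion": ...}, ...] (assumed format).
    Writes a JSONL file that is safe to hand to a fine-tuning job."""
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps({
                "prompt": redact(record["prompt"]),          # toy redactor from the earlier sketch
                "completion": redact(record["completion"]),  # ditto
            }) + "\n")
    # The raw records keep their original access controls; only this
    # redacted copy is ever exposed to the training pipeline.
```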

Debra J Farber:

I could see that being a headache, yeah. It seems like detecting personal information is pretty difficult to do, and there are different purposes for which a company might do this. Maybe data loss prevention purposes, where they want to identify data that might be exfiltrating from the company and prevent personal data from leaving. Or, it could be data discovery for purposes of finding personal data across your organization so you can action upon it. Or, it could be for other reasons, right? So, why is it that it's so hard to find personal information?

Patricia Thaine:

Just think of the disfluency of language. Think about optical character recognition errors when you're trying to analyze PDFs or images. Think about the errors that transcription systems make. Think about all of the different languages in the world and how they're used in combination with one another. Think about all the data formats, all the data structures. Think about even how dates are represented differently across English alone and how spelling mistakes might affect that. So, AI is super necessary in order to do this properly.

Patricia Thaine:

What was being done prior to AI for this problem was: here's a regular expression which basically says "find 16 digits that are one after another and call that a credit card number," for example; but then think of all the different exceptions.

Patricia Thaine:

It becomes impossible to account for all of the different exceptions that happen fairly often.

Patricia Thaine:

AI allows you to look at context and, based on the context, determine what is or is not personal information.
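As a toy demonstration of that point (my example, not Private AI's code): a bare "16 consecutive digits" rule both misses common formats and fires on non-card numbers, which is exactly where context-aware models earn their keep.

```python
# Why "find 16 digits in a row" fails as a credit card detector.
import re

NAIVE_CARD = re.compile(r"\d{16}")

samples = [
    "Card: 4111111111111111",               # caught
    "Card: 4111 1111 1111 1111",            # missed: digits grouped with spaces
    "Card: 4111-1111-1111-1111",            # missed: digits grouped with hyphens
    "Tracking no. 9400111111111111111111",  # false hit: 16-digit run inside a longer number
]

for s in samples:
    print(bool(NAIVE_CARD.search(s)), "|", s)
```

A context-aware model can use the surrounding words ("Card:", "Tracking no.") to make the call instead.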

Patricia Thaine:

However, there is also a misunderstanding of how to use AI models appropriately for different tasks, with people oftentimes thinking that you can just throw a machine learning model, like an LLM, at a task that it wasn't purpose-built for and call it a day. For some very basic examples, that works fine; but, in the real world, you get to corner cases so quickly, and being able to cover all those corner cases takes years and years of corner case data collection.

Patricia Thaine:

That keeps changing because language keeps changing, because data formats keep changing, because the output of transcription systems keeps changing. So, it's constantly keeping track of that, and then keeping track of all of the different regulations where the definition of PII, or what counts as PII, is changing. For example, if you think of the data protection regulation that came out in India, 'caste' is considered PII under that data protection regulation. Right? It's a sensitive piece of information. That is not something that is in the GDPR, and it's something that you then have to go collect data for, make sure you're capturing the corner cases, make sure you're capturing the corner cases in all of the different languages supported. So, you can imagine it's a 50-by-52-or-more scope of entity types that you're capturing, because of all of the combinations of entities and languages.

Debra J Farber:

You definitely, I think, made the case for why it's so hard. So, given that, tell us about Private AI's architectural approach. How did you design it, and with a special emphasis on the privacy pieces, of course. It's a technical audience, so feel free to get technical.

Patricia Thaine:

Yeah, so we have mainly focused on how we can make sure that the data does not leave the environment in which it is being processed. So we deploy generally on-prem or in private cloud. We do have a cloud API as well, often used for testing, but also used by smaller companies or companies who don't have the ability to do the setup; there, what we do is make sure that we delete the data right after it's processed, so we don't store or see any of the information. When we deploy on-prem or in private cloud, the only information that gets transferred to us is usage statistics - that's it. We don't see any of the data.

Patricia Thaine:

What that means is that. . . well, we often get asked, "What does that mean for your models? How are you improving your models? How do you know if they're working?" The way that we know they're working is our customers - and sometimes our customers' customers (which is when our customers are the most stringent) - end up seeing the results of our system. We get feedback saying, "This piece of information was properly de-identified," or "This piece of information was not properly de-identified. Here's a de-identified sample for you to create copies of and then fine-tune your model with." So, over time, we end up creating more and more robust models for everybody, with everybody's feedback, but only when they physically give us the data that they want us to know about.

Debra J Farber:

That's fascinating. Thank you. How does the accuracy of the data matching of personal data stack up? Tell us a little bit about the accuracy of your matching.

Patricia Thaine:

Yeah, so it depends on the type of data, and we do focus more on the highly sensitive pieces of information, like names or credit card numbers or phone numbers, things like that.

Patricia Thaine:

What we do is we aim to get to 99% plus accuracy with our customers, and it's a process.

Patricia Thaine:

Right out of the box, it does work very well; and if it's data that's similar to what we've seen in the past, it will work that well right out of the box. However, as with anything related to unstructured data, it is constantly being improved upon. What I can say is we legitimately have the best system in the world for this task, and I do not say that lightly. Every time we do a POC, which we do many of per year, we come out on top. The continuous feedback loop from our customers means that we're constantly improving faster and faster. In addition to that - we would, of course, have to put in the proper resources to get a research paper out of this - but we have over and over observed that our system is more accurate than humans at identifying personal information. So, you get higher accuracy than a human, at a scale that no human can provide.

Debra J Farber:

Wow, that's definitely quite the go-to-market message. What's on Private AI's roadmap, or what are you currently researching?

Patricia Thaine:

This is quite an endless problem.

Patricia Thaine:

What we are hoping to provide the community is a platform to go to when they're dealing with anything PII, and that's regardless of language, regardless of format, and also the ability to better understand that data.

Patricia Thaine:

The research that we're doing is very much around "How can we allow our users to better understand the data that they're processing?" Also, we've recently deployed a dashboard and a control panel that allows managers to see what kind of information different teams are processing and to control it directly from the edge. The way you can look at it is: say a CISO or a product manager is seeing - not as an afterthought, not after the data is collected, but while the data is being collected - "Hey, there's a credit card number that just flowed through this chat and it's about to be stored in your central database. This is alerting you about that. Do you want it to continue?" So that's something that we've now enabled; and essentially, what we're aiming to do is allow these managers to have a full view of what's going on within their organization from a product-by-product and software pipeline perspective, rather than as an afterthought.
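A minimal sketch of that in-flight alerting idea, reusing the toy detector from the earlier sketches - the function names and the confirmation hook are hypothetical, purely to show where the check sits relative to storage:

```python
# Hypothetical sketch: flag PII while a message is in flight, before it
# ever reaches the central database, rather than auditing after the fact.
from typing import Callable

def store_message(message: str,
                  db_write: Callable[[str], None],
                  confirm: Callable[[str], bool]) -> bool:
    """confirm() stands in for the manager-facing alert: a human or a
    policy engine decides whether the flagged data may continue."""
    findings = detect_pii(message)  # toy detector from the earlier sketch
    if findings:
        labels = ", ".join(sorted({label for _, _, label in findings}))
        # Alert while the data is in flight, not after it has been stored.
        if not confirm(f"{labels} detected - store this message anyway?"):
            db_write(redact(message))  # keep only a redacted copy
            return False
    db_write(message)
    return True
```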

Debra J Farber:

That's pretty cool. Now I want to ask you a little bit about AI regulation. I came across your article, literally titled 'Thoughts on AI Regulation', from about a month ago on LinkedIn, and what struck me is not so much your opinion - because I agree with it and I've been thinking the same - but I just love that, like me, you didn't pull any punches. You write: "I think the call for regulation to prevent rogue AI is so utterly absurd," and then, in bold, "I really can't emphasize enough how absurd either pausing the development of foundation models or regulating them would be for the purpose of preventing existential risk." Totally agree. Would you mind unpacking that a little bit and explaining why you feel so strongly about the existential risk?

Patricia Thaine:

Partially because I do strongly believe that regulations can help with the current risks, and that's definitely one of the main focuses - or the main focus - that regulation should have, which I think is where it's leaning. Then, for the existential risk: if you read the documents talking about concerns about existential risk and rogue players creating a rogue AI, for example - if you think about what that means, it means you could have, for example, an AI that is generating code, doing so in a way that humans cannot keep up with, and then being able to hack devices and cause all sorts of ruckus. But the key thing is humans can't keep up. And then, if you think about what regulations can do, regulations can make sure that the people who care about regulations are following the regulations.

Patricia Thaine:

When it comes to these models, if they ever do get to this point - I'm not commenting on whether or not it's a possibility - but, hypothetically speaking, if they get to this point, any nation state is going to be able to train a model like this. It's the amount of compute that really matters. How to do it also matters, but there's theft, there's research, ideas get propagated. I highly doubt that, for a nation state or a large enough criminal organization, this would not be possible to do if we get to this stage in AI training. So, if you think about it, the only real possible way to counter this is investing in AI that can counter that potential rogue AI. That's the only thing that can surpass human capabilities in this scenario and actually protect the systems in an active way.

Debra J Farber:

So, then let's call that anti-rogue AI. So then you have anti-anti-rogue AI and it just becomes an arms race really.

Patricia Thaine:

Cybersecurity is an arms race. [Debra: Yeah, that's fair.] Yeah, it's an eternal arms race. So, what good has legislation done when it comes to stopping criminal activity, when it comes to creating viruses or fraud and all of that? Very little, if any. So, expecting regulation to do anything on this front - if somebody's willing to create a rogue AI that's going to destroy humanity, I really don't think that they care about regulation.

Debra J Farber:

Right, right. Criminals are gonna criminal, and all that. That makes a lot of sense. So, yeah, it is an arms race. I guess the question. . . everything feels like an arms race already, and we're not even dealing with the actual risks - 'we' being the general discussions you hear out in the media about AI, or the LLM base model companies. They're trying to put all of this attention on the potential future, potentially existential risks that maybe one day they might run into, as opposed to the current risks of what they're doing with data right now - risks to people, risks to intellectual property, risks to. . . you know, just risks generally.

Debra J Farber:

I think that that's the major challenge, right? I mean, you've got this VC-backed company like OpenAI, and then you've got Microsoft's money, and you've got all the big tech companies that have lots of money and can spend a lot on compute and, you know, have lots of lawyers, kind of racing to market right now with stuff that's half-done and not necessarily thoughtful around privacy (that's aimed at OpenAI, not the other companies necessarily). Where do you think this will go? I mean, we just saw the EU almost pass a law - it seems like they're getting close to that. The U.S., in my opinion, is just going to take a lot longer than we want to pass something that's meaningful. I mean, look how long it took for the EU to do it; the U.S. only just started on this track. So, where do you see this going, when we've got the big tech companies in an arms race against each other to grab market share, given the current regulations and what's coming down the road?

Patricia Thaine:

I think if we break it down into the problems - I mean, there's bias; there's explainability; a question mark for whether or not hallucination is actually a problem or a feature; there's privacy - [Debra: Both]. Yeah, both works too. For a lot of these problems, there's just not enough research as to how to even deal with them, explainability especially. Here's, I think, where regulation is going to make a huge difference, because what we saw with regards to the GDPR is that it is a forward-thinking, wishful regulation where a lot of the technology required to comply with the GDPR doesn't even exist yet.

Patricia Thaine:

We're working on it at Private AI, but it still does not exist yet. If you even think about. . . [Debra: data portability] data portability, being able to fulfill access-to-information requests on unstructured data in a way that's not going to bog down an organization tremendously if they have enough customers - there are so many aspects of it that you need better technology for. But what we see is that technology being created as a response to that requirement, because that's when organizations actually started to get their act together around the huge masses of data that they had. All of a sudden, they had to make some sort of sense of it and make it usable from a privacy-understanding perspective. And then, what these AI legislations are going to bring is really:

Patricia Thaine:

what do we want the future to look like? Where are we going to focus our energy? If we have things like "don't even bother to do public surveillance, because that is an unacceptable risk," that's great. That tells us, "Do not focus your energy on that. Let's focus our energy on what the requirements are around explainability. Let's focus our energy on understanding what the risks are around bias and data in insurance, for example, with regards to privacy." A lot of that is already covered by data protection regulations. So, I don't see the AI legislations actually making much of a difference when it comes to what happens with PII, except for unacceptable risks or requiring more oversight around these higher-risk applications.

Debra J Farber:

Yeah, that makes a lot of sense. Well, I do hope that the standards people are working on - which I'm not following as closely in the AI space - and other drivers move things forward, putting the focus back on preventing harms, especially to people, with this new technology. A lot of the people coming to market seem to think that, oh, there are no AI laws, so there are no laws, so I can just do all of these experiments on people in real-time or put something into the stream of commerce without thinking about privacy and security. So, I do hope things like standards and new technologies, or privacy enhancing technologies, enable people to easily do some of the right things - so at least there's an educational piece there, or some driver for them to address that risk before they put a product out in front of other people.

Patricia Thaine:

Yeah, definitely; and you had mentioned copyright risk as well. I think, fundamentally, that's going to be an industry-changing thing, if you think about Napster, or now Spotify, and what they did to CDs and the music industry.

Debra J Farber:

Yeah, I remember.

Patricia Thaine:

Totally changed how things are done. Yeah.

Debra J Farber:

Totally. Part of me is like. . . I still think that there's going to be a reckoning here. I mean, you can't. . . if it's determined that people need to be compensated for the data that they provide for training purposes, or if there's infringement because the output of the AI model was obviously over-trained or overfitted to somebody else's IP, that's going to get really expensive real fast. It's going to be really. . . I'm not even sure it's possible, if you're OpenAI, for instance, to all of a sudden just change your model. You'd have to actually retrain everything. Right?

Patricia Thaine:

Yeah, and you'd need to understand what the sources of the data are that are actually being output. But, suppose that fundamental question of where did this come from gets answered in some way. Then, all of a sudden, that opens up authors and creators to a Spotify-like commercial possibility when it comes to their works.

Debra J Farber:

Absolutely. It's actually "Don't cut human beings out of the loop here"; you know, bring them into the loop. Then, instead of the surveillance capitalism model where companies are extracting stuff from people - extracting data from us - in this case, it's like: enfranchise us. Let us be part of the economic model, and don't take too much of it, company; you know, like the 30% of profits that the app stores take. If you're just taking too much, people are not going to want to use your product. So, I think that makes a lot of sense. [Patricia: That's where regulation can help.]

Debra J Farber:

I agree. Although, it's game-changing regulation, right? I mean, personally, having gone to law school, I believe that copyright, specifically, is there to incentivize humans to put in the hard work of bringing something original into the world and putting it out in public. You know, I believe that that is integral to who we are. So, I actually think that eventually this is going to cause a real uproar in the industry; but I guess we shouldn't spend too much more time on that topic without any sort of regulation that backs me up on it.

Patricia Thaine:

I don't know if no regulation is backing you up on that, because think about what it normally takes to develop software on public repositories of code or on public data: until very recently, the way to do it - and the way we do it at Private AI - is we make sure that any libraries we're using - any GitHub libraries or code repositories we're using - have the appropriate licenses. Any data we're using, we have to make sure we have the appropriate license to use. Unless it's public domain, we need to have the appropriate legal agreements in place. So, it's actually a little bit odd the way that things are functioning more recently.

Debra J Farber:

Yeah, I know; it's really odd, and it's presumptuous, and it just happened where no one was. . . only people who were living completely in an AI world, like yourself, working in this space, were maybe paying attention six months to a year ago, when OpenAI's ChatGPT came out into everybody's consciousness and it's like, "What is this?" Everyone's talking about it.

Patricia Thaine:

Yeah, and the way I was thinking about it was: what's the difference between this and Google, which also makes use of people's data in order to profit? But it's really about providing the source and providing benefit to that source. The benefit they're providing is eyes on pages.

Debra J Farber:

It's a real good point. I think their argument was that they're linking to - they're indexing and providing the ability for somebody else to go to the actual site to see it. Right? Maybe that's not the case with a picture, where it might be like Google Images or whatever it is, and you see it on their site before you click on it. But a thumbnail image, I guess, would be kind of like a snippet, not the actual thing. I don't know.

Debra J Farber:

I actually don't follow copyright law developments like the case law, so I'm really just pulling from my legal training back in 2004.

Debra J Farber:

This was a while ago; but yeah, it does feel like things really came out of nowhere. My true belief is that there was an effort in the industry, led by VCs, to kind of flood the market with this as quickly as possible so that it's everywhere, and so regulators can't regulate, or so that they beat the rest of the market. That's how I feel about how OpenAI came into my consciousness, at least when I first heard about it, because I'm really drinking from all the fire hoses of privacy, data protection, ethical tech - and now I've added AI to that as well. But that's only recently, and it was definitely a lot more noise than anything else; like, it trumped everything, all of the news in the space that I'm following. I'm looking at the market-level and go-to-market messaging, and it really did feel like a play to dominate so quickly - get market share, but also get it to the point where regulators just can't wrap their arms around it in time. But, I don't know, maybe that's just my jaded belief from my experience.

Patricia Thaine:

To a certain extent. But GPT-2 and GPT-3 had been out for a while, and I think ChatGPT was actually a bit of a fluke; it was a bit unexpected how well it did in the public eye. I've experienced this as well in building developer software: you think that, of course, the documentation is super important, how the API works is super important; but a good user interface actually changes people's perspective so quickly and so massively - from "I don't get this" to "I get this" - that I don't think the developers building ChatGPT necessarily knew what they were getting themselves into when they made it public.

Debra J Farber:

How fascinating is that, given that Sam Altman was the head of Y Combinator, right? He helped how many companies in the past come to market - and to not know that you need to address all the areas of ethics, security, privacy, safety, right? That those are kind of essential to have?

Patricia Thaine:

I don't know if they knew what they were getting themselves into in terms of the massive amounts of public attention and the flooding of the market, with regards to privacy and ethics and all that. I'm not sure how much thought was necessarily being put into that, specifically when it comes to ChatGPT. But, I do have a friend who works on ethical applications within OpenAI, and she often tells me how she feels like her voice is heard, how they consider the ethical implications that she brings up quite seriously, and they act on it. But, I think in some cases the problem is so massive, and so much fundamental research has not been done and needs years to complete, that you have to make a decision of whether or not to commercialize - or whether or not to continue to pause commercialization after you've spent this much money creating a system - in order to get to these fundamental questions that may or may not have answers in the next five to ten years.

Debra J Farber:

Yeah, that is definitely a tension. I think it also just underscores the point that you really have to embed privacy and security and safety by design from the beginning in your architecture, or the downstream wicked problems are going to be maybe too hard to surmount. We'll see. We'll find out.

Patricia Thaine:

Yeah, 100%. If you embed privacy in the very beginning, it'll save you a lot of headaches.

Debra J Farber:

Absolutely. Well, what's the best way that people can reach out to you to collaborate or request a demo?

Patricia Thaine:

You can try a demo out on our website at demo.private-ai.com. We also have chat.private-ai.com, which is our PrivateGPT; it provides a privacy layer before you send any data out to your large language model of choice. And I'm happy to connect with people on LinkedIn. So, if you listen to this podcast, please let me know in your message on LinkedIn that this is why you're connecting, because otherwise I will not know to connect with you, because I don't know you.

Debra J Farber:

That makes a lot of sense and I will put all of the links and the article you referenced about AI regulation and all that in the Show Notes so that everyone could access them easily. Do you have any words of wisdom you want to leave the audience with before we close?

Patricia Thaine:

Thinking about the fundamentals gets you a long way toward ensuring privacy and security. There's this Microsoft report that came out recently talking about how, in cybersecurity, if you take care of fundamentals like two-factor authentication, you are already 95% of the way there and can stop so many attacks. I'd say it's very similar with privacy: think about the fundamentals and only keep the data that you really need. That's going to save you a whole lot of headache down the line.

Debra J Farber:

Well, thank you so much, Patricia, for joining us today on The Shifting Privacy Left Podcast. Until next Tuesday, everyone, when we'll be back with engaging content and another great guest. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.

Chapter Markers:

  • Why Patricia chose to co-found Private AI, the company's mission, and some key privacy-enabling features
  • How companies can avoid mishandling personal information when leveraging AI / machine learning
  • Why it is so difficult to discover personal information in the first place
  • Private AI's privacy-enabling architectural approach to finding personal data and preventing it from being used by or stored in an AI model
  • Private AI's extremely high matching rate, and how they aim for 99%+ accuracy
  • Private AI's roadmap & R&D efforts
  • The importance of licensing data sets to respect copyright and enfranchise consumers
  • How listeners can reach out to Patricia, collaborate, or access a demo
  • How thinking about the fundamentals gets you a long way toward ensuring privacy & security
