In this conversation with Markus Lampinen, Co-founder and CEO at Prifina, a personal data platform, we discuss meaty topics like: Prifina’s approach to building privacy-respecting apps for consumer wearable sensors; LLMs (large language models) like ChatGPT; and why we should consider training our own personal AIs.
Markus shares his entrepreneurial journey in the privacy world and how he is “the biggest data nerd you’ll find.” It started with tracking his own data, like his eating habits, activity, sleep, and stress, and then he built his company around that interest. His curiosity about what you can glean from your own data made him wonder how you could also improve your life, or the lives of your customers, with that data.
Thank you to our sponsor, Privado, the developer-friendly privacy platform
We discuss how to approach building a privacy-first platform to unlock the value and use of IoT / sensor data. It began with the concept of individual ownership: who should actually benefit from the data that we generate? Markus says it should be individuals themselves.
Prifina boasts a strong community of 30,000 developers who align around common interests - liberty, equality & data - and who build and test prototypes that gather and utilize data working for individuals, as opposed to corporate entities. The aim is to empower individuals, companies & developers to build apps that re-purpose individuals' own sensor data to gain privacy-enabled insights.
Copyright © 2022 - 2023 Principled LLC. All rights reserved.
Debra Farber 0:00
Hello, I am Debra J. Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans, and to prevent dystopia. Each week we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding edge of privacy research and emerging technologies, standards, business models, and ecosystems.
Debra Farber 0:27
Today, I'm delighted to welcome my next guest, Markus Lampinen, CEO of Prifina, a personal data platform where the data is owned by individuals themselves, and anyone can build privacy-preserving services for people to live happier and healthier lives. Today, we're going to be discussing meaty topics like how to build privacy-respecting apps for consumer wearable sensors, large language models like ChatGPT, and why we should consider training our own personal AIs.
Debra Farber 1:03
Markus Lampinen 1:04
Thank you, Debra. It's awesome to be here. That sounds like a fantastic list we've got for the agenda.
Debra Farber 1:10
Yeah, I think it's pretty topical right now, and I love that you've been working on some of these overlapping challenges that are kind of coming to the fore right around AI, but also privacy. And, before we get into some of those meaty topics, I'd really love to kick off the conversation by first understanding how you got into privacy, and then co-founded a privacy tech company like Prifina. Can you tell us your origin story?
Markus Lampinen 1:35
No, 100%. Maybe first and foremost, for everybody listening, I'm very much a data geek. I mean, I'm the biggest data nerd that you'll find. So for me, it's really been kind of this realization that I've been tracking my own eating, my own activity, my own sleep, stress levels for a long time now. And one of the things that we realized was that with the increasing amounts of data that we have, it's also brought in this liability aspect of it. So, when I started thinking about the future, I thought, "Okay, it feels like privacy is almost...it's increasing the liability of utilizing data. And, as a result, a lot of companies are actually using less data, especially data that they don't understand or control or feel comfortable with." But then at the same time, that feels to be very much in contrast to where we need to go. Like, if we think about what type of services and apps and, you know, things we want to have in 10 years, 20 years, I mean, it seems obvious to me that we're going to need to use orders of magnitude - just ridiculous amounts - of more data than what we do right now. So, that's where essentially we got into looking at, "Okay, how do we navigate this? Like, how can we actually use a lot more data by then while, at the same time, completely preserving privacy?" And I think a lot of what we started doing at Prifina was really just driven by that notion, that we actually utilize data ourselves a lot; but, we've been utilizing it with very poor tools. Like, we've been using, for example, spreadsheets to track different types of activity and correlate those ourselves. And this is where we kind of wanted to tap into that: "Okay, how can we give individuals just a better ability to utilize this data, but then also utilize it in a way that it actually has a meaningful impact?" Because, you know, you can track data and feed your OCD as much as you want.
But if you actually want to do something, then you have to have very, very clear uses of that." And that's where we started kind of building this company around that. What are the very, very clear things that you can do with your data yourself so that you can actually improve something such as, for example, the way that you sleep, the amount of deep sleep that you get?
Markus Lampinen 3:58
So that's very much me as a data geek coming into this, and privacy is absolutely key; but I would frame a lot of what we've been doing this way: the ability to use more data in the future hinges, in a very fundamental way, on respecting privacy and having privacy embedded from the get-go. Otherwise, especially with sensitive data sets like what we're talking about today - personal data, wearable sensor data, and different things like these - I mean, unless you have privacy embedded from the get-go, you're just not going to be able to use that in a meaningful way going forward.
Debra Farber 4:36
So, one of the things that I love about Prifina is that you have a really clear focus on a particular industry set of privacy and development challenges. So in this case, that's helping IoT companies, particularly those who provide wearable sensors - it used to be called kind of the 'quantified self' movement, but I'm not hearing people refer to it as much in that way anymore. But, you're bringing to the market the enablement of privacy-respecting apps. So, would you mind describing the approach you took with building Prifina here and maybe fleshing out some of the use cases to illustrate the problem you're solving for?
Markus Lampinen 5:16
Yeah. So it starts with this very, very basic premise - and you're right, 'quantified self' and biohacking and all that's been around for a long time. But in the last five years or so, we started having all sorts of different wearable sensors, personal sensors, wearable devices, whatnot, that generate a lot of data. And this data, because it's measured from our bodies, is very, very sensitive. And, if you start imagining that companies start collecting this, it also brings about a number of different types of issues. Like, if you have a ride-sharing company tracking, for example, the drivers' sleep, then, yes, hypothetically, you can make safer rides; but, at the same time, why is a ride-share company tracking where and how their drivers sleep?
Markus Lampinen 6:07
So part of what we started solving for was really: how do we get more utility out of this data? And it really started with this concept of individual ownership: who should actually benefit from this data that we generate? And in our view, it should be us. It should be individuals themselves. And then if individuals gain value by sharing some parts of that with companies, then fantastic. You know, 100%, go for it. But if you start thinking about, for example, the type of data that we have...like, I'm wearing an Oura ring, and I'm wearing an Apple Watch, and I also have a smart scale in my bedroom. So, if you start thinking about, okay, what can I do with these right now, all of the data is in different types of silos. It's on different companies' own servers. But what I can do with Prifina is, I can collect this data from the smart scale, from the smart ring, from the smartwatch, and I can bring it into my own Prifina account.
Markus Lampinen 7:06
Now, what that then enables me to do is just run a very simple application on top of this. An example would be that the wearables that I wear, like the Oura ring and the Apple Watch, both measure my deep sleep. And the scale actually measures temperature and humidity and things like these, which wouldn't be completely evident; but, if you look at the data that it actually does capture, then it does have those. So, you can very quickly create a hypothesis: okay, what temperature should your bedroom be so that you can maximize deep sleep? Or maybe your goal is to maximize REM sleep, I don't know. But in a way, by combining those, then you can create a sleep optimizer app. So that's a very, very simple example. But I really like this one because it takes different data points that we already have and it creates a new utility out of them that we didn't have before. And the reality is that for a lot of the data sources and companies that are out there - for example, the smart rings - they typically are designed to help you optimize your sleep. But they are tracking so many different types of things, and those data points can be utilized for other types of applications as well. Like, for example, most of them track your skin temperature. So, if you're able to do something like that, and you could look at, "Okay, what is the temperature change?" then you could, for example, create different types of preventative health applications based even on just that very, very simple alert when something feels off with your temperature.
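As a rough sketch of that sleep-optimizer hypothesis, here is one way the combined data might be crunched. Everything here - the field names, the sample readings, and the one-degree temperature banding - is invented for illustration; it is not Prifina's actual API or data model.

```python
# Toy sketch of the "sleep optimizer" idea: join nightly deep-sleep minutes
# (e.g. from a smart ring) with bedroom temperature (e.g. from a smart scale)
# and find the temperature band with the best average deep sleep.
from collections import defaultdict

def best_temperature_band(nights, band_width=1.0):
    """Group nights into temperature bands and return the band (in the same
    units as the input) with the highest average deep-sleep minutes."""
    totals = defaultdict(lambda: [0.0, 0])  # band -> [sum_minutes, night_count]
    for night in nights:
        band = round(night["bedroom_temp_c"] / band_width) * band_width
        totals[band][0] += night["deep_sleep_min"]
        totals[band][1] += 1
    averages = {band: s / n for band, (s, n) in totals.items()}
    return max(averages, key=averages.get)

# Made-up readings standing in for data pulled into one personal account.
nights = [
    {"bedroom_temp_c": 17.2, "deep_sleep_min": 95},
    {"bedroom_temp_c": 17.8, "deep_sleep_min": 102},
    {"bedroom_temp_c": 20.1, "deep_sleep_min": 61},
    {"bedroom_temp_c": 20.4, "deep_sleep_min": 58},
    {"bedroom_temp_c": 18.9, "deep_sleep_min": 80},
]

print(best_temperature_band(nights))  # → 18.0
```

The point of the sketch is how small the application itself is once the silos are joined: a few dozen lines sitting on top of data the individual already has.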
Markus Lampinen 8:43
So really, what we're trying to do is a two-fold shift. One is that you, as an individual, can collect all your data and use it yourself, to your own benefit - like, for example, optimizing your deep sleep. But the second is that we also want to empower different companies and developers to build those types of applications that take different combinations of data that individuals might have, and essentially repurpose it in a new type of application. This includes, of course, also those that manufacture the personal sensors - the wearables, for example. They can use our framework. Anybody can; it's an open framework. But that being said, a lot of that value prop is really around this benefit: if you can separate the actual application from the data, then you're making the data portable. And then we as individuals can decide, okay, what do we want to power with this data? And in our framework, you can use these types of applications and services that are built without actually sharing any data. So you can think about it - I love this analogy - as "bring your own beer" becoming "bring your own data." So, in a way, what you're doing is you have the data, but the application is coming to you, instead of you actually going, you know, outside of your own domain, so to speak, with your data. So, that's kind of the starting point, but we really do see that, in the longer term, we can create this type of ecosystem around the individuals themselves, where individuals themselves can use the data that they've already got, in a simpler way, to power different things that have positive outcomes for their own benefit. And this is where we also see a lot of - what should I call it - a lot of innovation from the community. Because once you do get the data out and you do get it to flow, then you can do things that just haven't been possible before. And sometimes it's very mundane.
Like, for example, well, optimizing deep sleep is very important. But, you know, for argument's sake, let's call it mundane. But then at the same time, some of those developer-led applications are also fascinating. Like, for example, can you optimize a Spotify playlist based on your stress levels? If you want to create a stress-reducing Spotify playlist, can you actually quantify that? Can you actually, based on data, create one that would essentially achieve that goal? So that's what we're doing: really looking at how do we ourselves gain benefit from our own data?
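To make that "can you actually quantify it?" question concrete, here is one minimal way it could be sketched. The HRV-to-stress mapping, the thresholds, and the track metadata below are all assumptions made up for illustration - a real version would work off actual wearable and Spotify data.

```python
# Rough sketch of quantifying the stress-playlist idea: derive a simple
# 0..1 stress score from heart-rate-variability (HRV) samples (lower HRV
# is commonly read as higher stress), then prefer slower tracks when the
# score is high.

def stress_score(hrv_ms_samples, baseline_ms=60.0):
    """Return a 0..1 stress score: 0 at or above the HRV baseline,
    rising toward 1 as average HRV falls below it."""
    avg = sum(hrv_ms_samples) / len(hrv_ms_samples)
    return max(0.0, min(1.0, (baseline_ms - avg) / baseline_ms))

def calming_playlist(tracks, score, max_bpm_when_stressed=90):
    """When stressed (score > 0.5), keep only slow tracks, sorted slowest
    first; otherwise return the tracks unchanged."""
    if score <= 0.5:
        return tracks
    slow = [t for t in tracks if t["bpm"] <= max_bpm_when_stressed]
    return sorted(slow, key=lambda t: t["bpm"])

# Invented track metadata and HRV readings.
tracks = [
    {"title": "Uptempo", "bpm": 128},
    {"title": "Ambient", "bpm": 70},
    {"title": "Ballad", "bpm": 85},
]
score = stress_score([25, 30, 28])  # low HRV -> high stress
print([t["title"] for t in calming_playlist(tracks, score)])  # → ['Ambient', 'Ballad']
```

The interesting part is exactly what Markus describes: the logic is trivial once the individual's own sensor stream and the music metadata live in the same place.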
Debra Farber 11:20
I think it's so exciting; and, as privacy experts, people like myself get a little siloed into the work we're doing, and maybe that doesn't have to do with developing actual apps or architecting anything. So, from what I've learned from you, I understand that you've got quite a large community of people who are innovating and playing around with what they can do with Prifina. Correct? How large is your community right now?
Markus Lampinen 11:49
Yeah, it's fascinating; and that's probably one of the most exciting things that we're seeing. We've had over 30,000 developers interact with us and build and test and prototype various different types of things. And these are oftentimes, you know, similar types of data geeks to me; once you actually have access to data and you can build different types of things, then, I mean, there's no shortage of different ideas. That's really an exciting thing to see: people are innovating new types of ways of utilizing data in an "individual-first" fashion. And the fact that it's individual-first, I mean, that's embedded into the platform itself. And it is, to be fair, one of the reasons why many developers and companies are drawn to us - this notion of individuals owning, and essentially being the decider of, what happens with their data; that's a huge thing. And then, you know, to a certain extent, there are many companies out there that really like the idea of utilizing data for some type of application or service; but several of them - and, you know, an increasing amount - don't actually want to keep the data. Because if you start collecting, for example, sensitive data like the heart rates of individuals, that can bring a lot more issues that you need to manage down the line.
Markus Lampinen 13:13
But that community, it's a fascinating driver. And what we've been doing recently is getting more into the hardware side of things as well, with sensors, where effectively it's not just the data that we have, but the data that we could have. So then, essentially, you get into a different audience, where it's not just software developers or data scientists, but different types of folks that might have, let's say, an idea to take a certain type of sensor and place it into, I don't know, the heel of their shoe or their, you know, endurance bike or whatnot. And once you're able to do that, then you can actually create something quite novel. And the fun thing is that this is actually all relatively simple to do nowadays; you get the entire data infrastructure from Prifina. Individuals control it. Nobody has access to the data - not Google, not Facebook, not us at Prifina - just the individual. But then after that, for example, taking a sensor off the shelf - I mean, you can get a sensor for, you know, $10 to $20, and there are so many different types of sensors. So, you just place it, connect it, and then you've got a sensorized device.
Markus Lampinen 14:21
And this is something that we're starting to dabble with a lot more: how can we connect more with the different things that we interface with, and then almost build this type of ecosystem of connected devices, as well as different data sources, around ourselves - hopefully in a healthier way than, you know, what's going on, for example, with the Roombas of the world, which we don't necessarily exert the most control over. So, that community is really important to us, and I think that community is also something that we're looking at as the builders of the future: when we get new types of applications, new types of value that individuals benefit from, if companies and developers can build those from the get-go with privacy controls embedded, then that's probably the most exciting thing for us. Because in many, many regards, that community is really leading the way in terms of where that utility comes from and what types of data sets we've looked at, but also continue to support going forward.
Debra Farber 15:29
Yeah, so, you know, it's almost like what you've done is you've built constraints into your platform that, you know, provide guardrails to the developers. Because that way, they don't have to know all the nuances and details of privacy and security. You kind of just build that in so that they don't have to think about those things. You instead provide them with choices - choices for how you want to architect stuff. So, I think it's really meeting developers where they're at, and just removing that heavy lifting, you know, removing that obstacle for them, where they would otherwise look at privacy as a barrier to their innovation simply because they just aren't aware of what it is they need to do; and therefore, trying to understand all the requirements is that barrier. As opposed to, you know, something like Prifina providing a...I don't want to say a safe sandbox, because it's not a sandbox, obviously. But, you know, you've architected it so that they don't have to think about it.
Markus Lampinen 16:27
And that's a really, really fair way of thinking about it, because if you think about a lot of the work that privacy professionals do, a lot of it is after-the-fact. Like, you've built something. You started to get users. You start to get lots of data, and then you're like, "Oh, crap, I'm sitting on this huge hive of valuable, sensitive data. What do I do now?" And a lot of what we wanted to do was really kind of avoid those "uh-oh" moments. If you get data and you can build applications that generate, you know, utility for individuals, and individuals pay for that utility or service or whatever you're doing - once you get to the point that you actually start having a lot of users and a lot of data, if you architected it right from the start, then, I mean, you don't have that uh-oh moment. I mean, building a company, you have many different moments, but at least the data shouldn't become that type of a time bomb or hot potato. And one thing that we've noticed is that, in a way, nobody really thinks about privacy at the start. I mean, it's not front of mind for anybody.
Debra Farber 17:30
It is for me.
Markus Lampinen 17:32
And I think it's something that a lot of people...they really do start with absolutely fantastic intentions. Like, they start thinking, "Okay, I've got individuals that I can help sleep better, and that's it; the driver is I want to help them sleep better," and then after that, they start achieving that. And then, you know, once they've gotten to a certain stage, they get to that uh-oh moment. But if we solve that moment at the get-go, so that they don't actually end up holding the hot potato - that there is no hot potato - then, I mean, they might realize the value only later: "Hey, actually, it's pretty cool that our individual customers own this data that's helping them sleep better." But, that's where I really like that notion that they come for the data. They come for essentially building out those individually-centered outcomes. And then they realize later on, you know, step-by-step, "Hey, actually, the architecture that I utilize is pretty cool." But it is oftentimes that type of process where, depending on the choices that you make on day one, then on day 180, 360, whatever, you end up in this type of position where you're like, "Okay, you know what has to happen now"; and oftentimes, it is an after-the-fact type of realization: "Hey, the choices that I made a year ago, they're now forcing me to do X, Y, and Z."
Markus Lampinen 18:55
And this is...I kind of think about it in that way, that of the companies that do join us, many of them realize that from the start. Many of them realize it on maybe a conceptual level, but they don't really, you know, they don't really learn what it actually means in a technical sense until later on. But, I think it's a very, very empowering thing where you see that they've actually got individuals gaining value from what they're doing - for example, sleeping better. And at the same time, you can kind of tell the story to the individual: that it is really you utilizing your data. It's really your data working for you, as opposed to somebody else, you know, gathering your data and then essentially, you know, utilizing it for you. So, I think it's also an interesting shift, where you're absolutely right that there are some of those defaults that are set into the platform - like, everything is private, and, you know, if you did choose to share something, then you would have to authorize that. And you're never sharing all of your data. You're sharing, you know, only certain snippets, and so on and so forth. But at the same time, that is something that they buy into at the start. And that then avoids that type of a moment down the line, whenever that uh-oh moment would come.
Debra Farber 20:10
That's such a comprehensive answer. And you're really describing things in a way that's really tangible here, I think, for people to learn from. I think each of these could be a separate topic that we could have an entire podcast interview about. I want to unpack, for instance, the data portability aspect, because very few people even talk about enabling portability of data. It's kind of like, "Oh, maybe one day we'll enable you to download all the data we took from you, through DSARs and such, right, when you make an access request - but only under those conditions; but one day we'll allow you to make it portable to anywhere else if there's a standard or something." But what it looks like here is that you are standardizing it. I mean, I don't want to say that in terms of an actual standard, but you are creating a repeatable way in which these companies that leverage sensor data can enable portability of personal data on an individual level. And so I think that's, honestly, kind of a breakthrough.
Markus Lampinen 21:08
And I think it's also one of these things that, you know, if you think about, for example, portability as it exists now, or even data access requests - they're very, very clunky. There is no such thing as pulling your data from one thing and plugging it into another. Like, that doesn't exist; and I don't know if it will exist for some time to come. Because right now, what's happening is that, you know, you want to get your data from something, and then you're typically downloading a CSV or some big archive or something. And who knows what to do with that? I mean, I can open up tables on my desktop, but then, you know, I'm looking at tables of data, and how does that benefit me? It creates a new headache for me. But in a way, in all simplicity - I mean, as a company or as a developer, you don't have to marry our model; but, if you support our model, then that means that you could create your own iOS app, but you could also create an iOS app on Prifina. And what that then means is that your individual customer can choose: do they use the one where all their data is with you, or do they also use the one where all their data is with themselves?
Markus Lampinen 22:14
And this is what I see as - even though we're not a GDPR or CCPA compliance company by any means - in a way, what we're doing is allowing the individual to just access and utilize their data and keep it with them. But portability to us also means that you should be able to leave us, too. And the way that we've solved this is that what we're doing in the background is creating your own personal cloud environment. At the start, when you start using Prifina, you know, we're providing it as a free thing for you, up to certain limits, because if you start bringing in all of your data, then...obviously, we're not a data storage company. But that said, if at some point you said, "Hey, Markus, this was a great idea, but I'd really like, essentially, to sever this and just keep my data myself on my own hard drive" - what happens is, you effectively just put your own credit card details into your own cloud environment, and then after that, you're just managing it yourself.
Markus Lampinen 23:14
Now, this would be the extreme case of data portability; and I would imagine that in our case, it's mainly, for example, software developers that don't like us managing things - or, you know, prefer to manage everything themselves, all the nitty-gritty stuff of their cloud environment - and they can do that. But data portability is fundamental; and I think, if I were to make a guess in terms of how that goes in the industry overall, most likely it will be solved by something similar to what we're building. Hopefully, we are part of solving it as well; but I'm quite skeptical about these solutions where you're downloading data and then plugging it somewhere else, because they're just...I mean, you're putting a lot of burden on the individual. And I think, for individuals to actually be able to take actionable control and generate utility out of their data, it has to be consumer-grade. And this is one of those things that we actually talk about a lot internally at Prifina: most people aren't data geeks like I am, and most people haven't spent a lot of time thinking about all the data that they've got. So, for this to actually become something where they want to use it, and they want to power apps where they keep their data, it has to be consumer-grade. And portability is one part of it. But right now, portability is nowhere near consumer-grade; I would argue it's not even at a level that's usable.
Debra Farber 24:39
I agree. It is definitely not usable, and it's kind of one of the dark secrets of data protection. So, you know, you've been talking about how it's not enough to have just the data compliance solutions that are out there. That's exactly why I invited you onto this show: to talk about shifting left and some ways to architect for developers. And so, I just want to underscore the shifting left trend; I think that is what's going to push this industry forward - just more entrepreneurs creating developer resources that make what they're trying to do easy and privacy-respecting.
Debra Farber 25:18
Okay, so I want to turn to a really hot topic right now: "large language models," or LLMs, like OpenAI's ChatGPT. There are many others: Google's got one called BERT, and there are many others in development. OpenAI published ChatGPT, which is their prototype artificial intelligence chatbot, and it's earned them enormous popularity among the general public. And just for those who don't know, "GPT" stands for "generative pre-trained transformer," and GPT-3, which is different from ChatGPT, was the largest language model known at the time, with 175 billion parameters, trained on 570 gigabytes of text. LLMs like ChatGPT are able to respond to questions in a human-like manner and in great detail. I'm sure many of you are seeing, you know, people posting examples of the queries that they've asked ChatGPT. An LLM can be defined as a deep learning algorithm that can recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive datasets. Not only does ChatGPT provide users with links to browse through, but it also solves complex problems and provides nuanced answers to inquiries. And many believe that LLMs might actually have the potential to completely transform the way in which we use search engines. So, I've even heard people state that they feel, you know, Google might have an existential crisis about whether or not they'll have competition for search. So, now that I've given that background, can you outline for us: what is it about LLMs that has technologists so excited?
Markus Lampinen 27:08
Yeah, that's a fantastic background; and as you were talking, I was thinking about ChatGPT in particular, because that's been...I mean, that's boomed in the last month or so. I would also argue that we're probably at least at the top of the first wave of the hype cycle; we're now essentially shifting a lot of the attention away from just ChatGPT to other generative AI solutions, which is great, because there are a number of other ones, like you also mentioned. Maybe the way that I would think about GPT and this recent wave is that maybe it's not so much a technical breakthrough as it is a commercial breakthrough. Because, in a way, there has been a lot of development in this domain. Like, you had - what was it? - IBM's Watson that won Jeopardy, like, I don't know, nearly a decade ago now. So, there's been a lot of work happening in this domain, but never has it been so visible, and that's really what ChatGPT did. And, you know, OpenAI, they were remarkable in their success at pushing it out and really getting it to the stage that...I mean, it's almost like the second coming of the Twitter Fail Whale, where, you know, it's so popular that everybody's on the site even though it doesn't work. That's something that I think is really, really remarkable.
Markus Lampinen 28:33
Now, why is it so appealing? Like, why have people gotten so excited? I think it's really this first real, practical peek into how generative AI could actually have an impact. Like, most of the stuff that you get from ChatGPT, I mean, it's not groundbreaking yet, to be completely fair. And this is my own personal opinion, and that's something that others could debate; but most of the stuff - I mean, because it's trained on a huge language dataset, it's only as good as the data that it was trained on. A lot of the stuff that you get, it's generic, it's general, and it's not really utilizable from the get-go.
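That "only as good as the data it was trained on" point can be made vivid with a deliberately tiny toy: a bigram model that "predicts" each next word as the one it saw most often after the current word. This is nothing like a real transformer - it's a made-up miniature purely to illustrate that a language model can only emit patterns present in its training text.

```python
# Toy bigram "language model": count which word follows which, then
# generate by repeatedly picking the most frequent next word.
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n=4):
    """Emit up to n words after `start`, always taking the most common
    successor; stop if the current word was never seen mid-text."""
    out = [start]
    for _ in range(n):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

model = train_bigram("deep sleep matters . deep sleep improves recovery")
print(generate(model, "deep"))
```

However large the training corpus gets, the mechanism is the same in spirit: the output is a remix of what went in - which is exactly why the results feel generic when the training data is generic.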
Markus Lampinen 29:13
Now, it is also fascinating in terms of what it can do. Because it's trained on a large textual data set, it means that it can, for example, write sheet music. It can do haikus that rhyme properly. One of my favorites is asking it to write depressing nursery rhymes and just seeing what type of creativity, you know, it comes out with. So, I think it's really kind of, I don't know, it's cemented this idea that, you know, it's not necessarily this version of ChatGPT, but it will be one of the ones that does actually create that existential crisis for Google. Because, I mean, everything that I said about ChatGPT being generic and general, and so on and so forth - I mean, that's true of search as well. Right? So, it's not necessarily the end-all-be-all type of solution, but it certainly is a threat to search, because it's just a different interface for something similar - a little bit more personal. But, I think what we're seeing is really that people are, by and large, maybe for the first time, really getting introduced to what direction we're going, and that's been an incredible thing to see, because you start seeing different types of use cases, different types of applications, different types of potential, and all of those will then spur even more innovation in those domains. And that's going to be a fantastic thing to see, because if we can identify where generative AI can play a huge role, I mean, that's going to drive a lot more investment into that. And then hopefully, we'll also get to the point where we're not getting general comments anymore, but we actually end up seeing what things are, in fact, usable and actionable.
Markus Lampinen 30:59
Search has been around...I mean, just kind of anchoring on search, I mean, search has been around for a long time. So, this is also very interesting to think about, "Okay, what's the next thing," and it may be generative AI and maybe interacting with, you know, "search" through that type of interface? Maybe that is the next wave that we'll see for that type of a mass application?
Debra Farber 31:20
Are you able to describe for us the difference between, like, GPT-3 and ChatGPT? I kind of alluded to the definitions up top, but, you know, practically, how does one talk about GPT-3 in a different context from ChatGPT?
Markus Lampinen 31:37
Yeah, and this is an area where I'm definitely not an expert. But ChatGPT, as far as I understand it, is just a different application layer for essentially the dataset. OpenAI has large language models that it's using; I can't remember exactly which one powers ChatGPT. But the biggest thing that they realized was that by giving it a very, very simple interface for humans to utilize - in this case, a chatbot - then that essentially allowed it to not just get stuck with its limitations, but almost enhance them. And what I mean by that is that it's text-based. So, you know, it's gonna do really, really well with things that are textual. Like, for example, I mentioned sheet music, which is really, really great. It's not going to paint you a picture; it's not going to give you rendered sheet music; but it's going to give you, essentially, the codified, text-based version. And it can do rhythm because it's trained on different types of things like music. It can do melody and all of that. So, ChatGPT is very much the application; and it's a very, very interesting and obviously very successful choice that they made for actually realizing and releasing it in that type of a domain where, essentially, the boundaries that it has don't actually end up being visible, because everybody knows what a chatbot is. So, it's sort of like, you know, the next version of a chatbot. But the actual AI engine that sits beneath it - that's something where, obviously, ChatGPT is a prototype. So, what types of skins they can put on top of their AI engine, I mean, that's gonna be massive. It's not just going to be a chatbot. It can be basically anything going forward.
Debra Farber 33:24
That's important to keep in mind that this is just one step of innovation iteration; that there's more training; there's more testing and research and use cases being developed, as well as just more training. So, there's a lot that we'll need to keep on top of as this technology evolves. And that said, the benefits of LLMs can be enormous, and you know, as you've been saying, if we follow the "happy path" where everything works out as intended and all stakeholders are content; but, we know that that's a concern. So, can you describe for us some of the dangers of using LLMs, and especially harms to privacy, but you could expand beyond that?
Markus Lampinen 34:05
Yeah, and that happy path is a fantastic thought. Like, you know, if only we always stayed on that happy path - but that's rarely what happens. What's happening right now is that people are incredibly excited about ChatGPT and they're using it for everything, which, you know, with a true prototype, you should be doing, and that's really what OpenAI is benefiting from. At the same time, we're having a lot of these types of discussions where, for example, take that sheet music example. It's trained on other people's creations, other people's sheet music. So, if I, for example, create my depressing nursery rhyme sheet music and it ends up becoming a huge success - which I hope that one doesn't, but let's say it does - then who owns that? Who owns the proceeds from that? Who gets royalties? So, there are different things like these that are just not thought through yet, because you have, for example, something like...melodies in the world, there are only so many and they overlap. So what was it trained on, and how do you do attribution across those?
Markus Lampinen 35:12
So, some of these things are, you know, maybe not directly privacy concerns, but, you know, they end up becoming very, very important moral concerns that we need to figure out and structure. And other things, too. Like you've had a lot of universities and colleges that have to deal with the fact that, you know, students are able to use ChatGPT to create MBA-sounding speak, which it seems to be very, very fluent and abundant in. And, that's also something that, you know...where did that come from and how do you deal with the rise of these types of technologies?
Markus Lampinen 35:46
Arguably, you can make the case that it's similar to search engines previously, or Wikipedia - that, you know, people would pull all sorts of different things from them. But those things have evolved into having attribution and having the right footnotes and having the right disclosures in terms of what we're using overall. I think one concern, generally, is that as people get more and more excited about the use of ChatGPT...OpenAI has been very specific that it's a prototype and it's in beta, but I think people are just feeding so much stuff into it that they're almost becoming reliant on it. And the reliance itself isn't necessarily a concern. At least, I mean, everybody's free to do what they want. But, that said, if, essentially, the expectation is that you're going to get something that's actually immediately usable, then I just don't think that we're there yet.
Markus Lampinen 36:42
Just as an example - the example that I would give is that if you take something which is very specific and very important - like, I don't know, somebody asks ChatGPT for weight loss advice or advice on mental health or what have you - if people are under the impression that it's actually an intelligent being of sorts that can answer those questions, and not the equivalent of a search engine, I mean, that's a very dangerous fallacy. So, I think there's a lot of education that also needs to happen in terms of what it is and what it's not. It has a lot of potential - absolutely fantastic potential, as does the entire field of generative AI. But there's a human element here: we humans have to also be very conscious of where we are along that cycle, just so that we don't over-leverage ourselves.
Markus Lampinen 37:35
And there's the other thing, which relates to what we're doing. I mean, it's based on a large language dataset, which is fantastically large; but, that being said, it's a general dataset. So, if you're asking for, for example, advice on, I don't know, how not to feel so tired, I mean, it's based on generic stuff. So, it can come up with 10 different types of things, but at the end of the day, it's your responsibility to make sure that you then take that and critically take it forward - and not just, in the worst case, take something like that, copy/paste it, and send it on to some other person, who then thinks, "Okay, well, this is thought-through, solid advice," which ChatGPT, for all intents and purposes, is just not yet. It's suggestions; it's examples; but it is very reliant on the human actually critically evaluating it and then figuring out, you know, what is the utility and what do I do with this?
Debra Farber 38:36
You make really excellent points. And, I do want to call out that where you were suggesting that humans could be very angry, or could basically be harmed psychologically, by interacting with a chatbot thinking it was a human - that actually is a harm of "decisional interference." If someone is making a decision to, you know, do something based on the conversation they're having - I mean, it could be anything from something as dire as whether or not to commit suicide, to a medical intervention, to anything, right? That's just what immediately came to mind. So, you know, that is definitely a potential privacy harm.
Debra Farber 39:13
And then also data...this has been trained on public data, which includes personal data in the EU's definition. So, the EU, under GDPR, doesn't exclude publicly available data from being personal data. If it's attached to an identity and somebody says, "I'd like to have that data deleted," well, you've already trained the model, you know. What do you do? Is this an unethical use of training algorithms? Is this something where the FTC and regulators might say, you know, as a punishment - since it's almost like "fruit of the poisonous tree": you didn't ask permission and you don't have any of the protections in place for, you know, the inputs and the outputs - from a privacy perspective, should we disgorge any of the profits and make you throw out the model that you trained? It'll be interesting to see what regulators decide to do and how much harm they determine is being caused by the lack of thinking of privacy by design here.
Markus Lampinen 40:07
That's right. And, I think it's also...if we start thinking about the public dataset that it's been built on (and again, I'm not an expert here; I haven't seen the dataset that it's actually been built on); but, if we think about public sources - and I know some of the sources that they've gone through - I mean, let's start from the very basics. What type of public content do we have online? A lot of this data, and a lot of the data that these previous generative AIs have been built on - I mean, it's been pretty toxic, to be completely honest. So, this is also where you're absolutely right, but there are some really tricky decisions: if somebody wants to have something deleted, and it's already ended up training the model, then how do you trace something like that? Does the entire model get thrown out? But then, there are also other things. Most of the Internet...I mean, there's a reason why a lot of regulators and a lot of folks in leadership positions are saying, "Hey, it's not healthy to have our kids on social media or on the public Internet, you know, before they're a certain age and can actually understand them." Well, on the flip side, is it then essentially okay to train a model based on that, if we're not letting our kids onto that thing? So, these are very, very tricky things.
Markus Lampinen 41:25
I'm optimistic, because the OpenAI team and company - I mean, they have a fantastic amount of brainpower and they've spent a lot of time working on these things. So, I'm optimistic that a lot of these are things that they've been going through and that they've thought about; but, there are some very serious things that you do have to think about, because the Internet itself is a big place and there are a lot of datasets. And it's not like...there's a saying that, you know, the data that you put in equals the value that you get out - or, you know, if you have "garbage in," then you get "garbage out" - and it's not as linear in that sense. You can do a lot of things with the actual model itself. But, that said, it is a concern what you're training the model with. General information is great - that's where you can get a broad view - but I think it also raises this question: could we at Prifina, for example, plug personal datasets into this?
Markus Lampinen 42:28
Now, the biggest thing is that, of course, your personal dataset should stay personal to you. It shouldn't be shared across the population. But the idea that all of us could have our own personal AIs, in a way - that's a really, really interesting thing that we see. It's also something that we've been working on for a long time now: when we look at the evolution of the Internet, we think it's obvious that we'll need to have our own personal assistants, so to speak, but they should be trained on our data; they should be trained on it privately; and they should be trained on it in a way that they actually end up representing us. But, that's very hard to do. So, that's also something where I think it'll take a while before we get there, because even if you can complement the data with your own personal data, and you can come up with some type of a private personal AI, then getting to a general language model - I mean, that's a long way. Mine, personally, might be able to answer, you know, "What type of shoe should I buy next?" And, that would already be fantastic; but for it to actually give me life advice, I think we're a little bit away from that still today.
Debra Farber 43:40
I love the personal AI concept, especially within the context of a platform that already has guardrails built in. So, it seems like you could do a lot less damage because you're, you know, putting those guardrails in. Do you have any advice to data scientists and researchers and developers regarding, you know, how to architect for ethical uses of LLMs? You know, can we ever make technology like ChatGPT mostly safe? And, I'm not including the personally-trained AIs here - more of these publicly-trained, large...I guess an LLM can't be trained on just personal data; it has to be a large dataset by definition.
Markus Lampinen 44:21
Yeah. And I think it's like your addition of "mostly safe." I think that's really where we need to get to. Nothing is going to be perfectly safe, unfortunately. So, the realization is, I think, education in general - and this is not necessarily just for the data scientists and the technical community, but just everybody, by and large, who's using ChatGPT, which is basically everybody now. So, it's also an understanding of what it is. And then, for a lot of people that are working with it, I really like these vertical use cases that are really focusing on very specific things. Like, for example, I mentioned an AI-based shoe assistant. So, I'm a runner, and for me, I switch shoes every three to four months. And, if essentially I switch a shoe, then, I mean, obviously, the personal assistant can recommend that; you know, it knows what my shoe size is and what type of shoes I've gone through, and so on and so forth. But it can even monitor, you know, how my stride and my walking are doing. So this is - like you said, that was the exclusion that you gave me, and I broke that one - but on the personal AI side, I think it's really, really interesting to find those.
Markus Lampinen 45:34
But even on the large language models, then, testing them for vertical examples. Like, for example, looking at that: if somebody is treating the personal assistant as, you know, a life coach, or, you know, a dietician or whatever, then, I mean, we need to do a lot of research in terms of what it is actually doing. And they've built a lot of restrictions into the model itself, where it actually will tell you that this is, you know, off limits, and "I'm not going here." But at the same time, I mean, there are so many of them. So, that's essentially one of those areas where we need to have different vertical focus.
Markus Lampinen 46:08
And because many folks come into looking at large language models from a specific field or a specific use case or a specific interest, I think we just need to have more of that exploration. This is completely natural, by the way - when you have a big boom of especially big attention coming into something like this, it's completely natural that there's an education gap that we need to essentially catch up on. And, we also need to experiment. So, I think this is one of those areas where I'm quite optimistic that, with the amount of attention that has gone into this, there will be a lot more attention, and critical attention, both from individuals, but also then from, like you mentioned, regulators and different agencies. But, I think especially for individuals, just reading up on it yourself and just understanding what it is - I think it's really, really fascinating, but I think it's also really important.
Debra Farber 47:02
You know, I totally agree with you. I guess the challenge that I have is I'm less optimistic that we'll have ethical guardrails from, you know, regulators and laws in place - and, you know, rules and all of that, even standards - you know, for a long time. It takes time. This is so new; it felt like it just exploded onto the scene, and it's going to take time to get general consensus from a cross-section of experts to get that stuff agreed upon. And so, my concern is: who educates the general public? We have not seen a great history of education across platforms. If anything, I've seen platforms...like, I'm just using, I don't know, cloud as an example. Where's the cloud privacy stuff? You know, I don't know - who trains on "cloud privacy?" It's just all security.
Debra Farber 47:50
What about, like, mobile apps and the App Store and the different ways of doing that? I mean, it took many, many years before there were even, like, some of the rules around how to do that safely, and different methods that were taken by...as an example, Android has a completely different app store philosophy - even forget about, you know, how it's architected - the Play Store versus Apple's App Store. And so, when it came to ads, nobody wanted to decide who had liability for stuff around the ads and what people see.
Debra Farber 48:21
And so, what I'm concerned about is that no one's going to take ownership of who needs to integrate the ethics. Is it OpenAI that, you know, has the moral responsibility to train people and educate those using their model? Is it the companies using that model whose job it is to educate downstream? I am not optimistic. And the reason I do this work in trying to shift left is that we haven't had people investing their money in privacy innovation; it was all focused on just compliance - governance, risk, compliance, legal - instead of architecting privacy by design and default into platforms and services to begin with. So, just given that, like, rant I just went on, do you have a perspective?
Markus Lampinen 49:07
I do. I mean, I'm with you. Nothing happens just by wishing for it to happen. I think one of the opportunities that I see is...we've been really great at passing the buck down to the individual and just burdening the individual with a bunch of this stuff. Even with the GDPR, it's like all the popups and so on and so forth - disclaimers and terms of service and whatnot. I think we have to be very real and just realize that people don't read any of that. They just click them away the quickest they can, and they don't even know which button they're clicking. I don't like that we're essentially passing all this to the individual; but, one thing that I see as an opportunity is if you start, for example, from the very basics: how many apps do all of us have on our phones? How many user accounts? How many sets of preferences? How many different datasets do we have across all the different things?
Markus Lampinen 50:01
One of the things that we like about Prifina is that we can actually give you almost like your own super app for all your data - just one that you have, which can then essentially manage all of these so that they don't get outdated. And then, guess what? Like you said, it comes down to the architectural level: it's just under your control and your ownership. I really like this opportunity - that maybe we can create a simplified choice for the individual: either you use it this way, where you're disclosing God knows what to God knows whom; or you use it this way, where you essentially keep it, and you get to choose. I think that's one of the things that I'm quite optimistic about in terms of the opportunity.
Markus Lampinen 50:48
Like, I don't know what probability I would put on it. But, in Silicon Valley, and just across the world, we've had a lot of focus on enterprise over the last 20 years. But, I'm quite excited about the question of whether data could be one where we can actually really create a consumer market. And it seems like a completely outlandish thing to propose, because data has been so enterprise-driven, and it's been enterprises using this data. But, if you look at, for example, all the attention that is going into "personal data" - like how much people are talking about it; regulators and policymakers, for example in the EU, are talking about how much data we have - I mean, they're already educating a lot of populations about the data that they've actually got. So, if we can really take that and turn it into a choice - how can you use that? How can you actually make something beneficial to you? - that's the analogy from, like, the personal data side.
Markus Lampinen 51:44
I think, on large language models and generative AI in general, if we can bring those guardrails into some of these things at the ground floor, I mean, I think it has to be the developers and the companies that essentially introduce that for their individual customers, individual users. I'm like you - I'm very skeptical, you know, of the top-down approach. I mean, if that does happen, that's gonna take forever, and it's just not going to be quick enough. But I think, especially with generative AI - I mean, it's such a hot area that we really do need something to happen there before it can go mainstream. Like, OpenAI's ChatGPT is still in beta. So, one open question is: what does it take for it to actually go from beta to, essentially, mass market? Like, you know, at what stage do they stop calling it beta?
Debra Farber 52:41
Yeah, I agree. And, I want to say that normally, I would continue to be pessimistic after this conversation that the ethical obligations and training would be taken on, on purpose, by OpenAI or other organizations that are in this space, even downstream from them - that they don't want to take on the burden. Right? Even if they have that sweet, sweet VC money, and lots and lots of it. But, I guess I am re-evaluating that, because Microsoft has put so much money behind ChatGPT, and Microsoft has had one of the largest...one of the best reputations for staffing privacy experts, because they had some early snafus maybe 20 years ago, back when privacy law was really starting to be enforced. As a result, they added a lot of privacy expertise and have continued to do so over the years. So, I'm hoping that there are people there that kind of help the organization move that needle, because we do pass the buck on to individuals. We have been doing that for years. This is the entire reason that consumers generally feel powerless. They just don't have the ability individually to make change, to feel like they can trust where their data is going, that they truly have choices about what's collected about them and how it's used. And that's just going to happen all over again, but to a deeper, embedded degree that we can't undo. So, with that, I want to ask you one last question before we conclude, and that's, you know: what are you working on right now? What are you trying to, like, you know, solve, and do you have any sort of calls to action for the community?
Markus Lampinen 54:17
So, if we have anybody listening that has, you know, a fantastic thing that they'd like to put a sensor on and geek out over, then I'm always very, very happy to chat. But then also, equally, a lot of what we're doing is taking the sensors and reimagining that data as an experience. So, that's something where, you know, if we've got something that we could reimagine in terms of what it means to essentially buy a pair of shoes - like, where do they come from - or, also, this example of the motorsport sensor in a helmet, for example, then those are conversations that I absolutely love. And those are also ones that we're going to be taking to market here in 2023, step by step, releasing more of them. So, we have an open community that anybody can, you know, join. They can hack around with different types of datasets and create really cool stuff, and that's fantastic. That's something that we really love to see. But then also, you can expect to see some really cool collaborations that we have coming up here, where we're hopefully pushing the boundaries a little bit with new types of datasets, and also that personal value from different types of sensors that we just haven't had before.
Debra Farber 55:31
That is just wonderful. So, I encourage everyone to, you know, join the community for Prifina. Is that in a Slack, or...?
Markus Lampinen 55:39
Yeah, we have this umbrella term for the community - we call it "the community" - which is "Liberty. Equality. Data.," and it's one that we really like for the tone that it sets: that it's something where, you know, we're all working towards something greater. But you can find all of this at Prifina.com. And then, you know, you can find me on our Slack as well. So, I'm always, always happy to kind of chat and geek out, especially when it comes to questions around data.
Debra Farber 56:07
Well, I love that you really make your values front and center, even in the name of the community group. Well, Markus, thank you for joining us today on Shifting Privacy Left to discuss Prifina's approach to building privacy-respecting apps for consumer wearable sensors; LLMs like ChatGPT; and why we should consider training personal AIs.
Markus Lampinen 56:30
Thank you, Debra.
Debra Farber 56:31
Sure thing, thanks for joining us today, everyone. Until next Tuesday, when we'll be back with engaging content and another great guest.
Debra Farber 56:41
Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website shiftingprivacyleft.com where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.