S2E24: "Cloud-Native Privacy Engineering via DevPrivOps" with Elias Grünewald (TU Berlin)

The Shifting Privacy Left Podcast

Shifting Privacy Left features lively discussions on the need for organizations to embed privacy by design into the UX/UI, architecture, engineering / DevOps and the overall product development processes BEFORE code or products are ever shipped. Each Tuesday, we publish a new episode that features interviews with privacy engineers, technologists, researchers, ethicists, innovators, market makers, and industry thought leaders. We dive deeply into this subject and unpack the exciting elements of emerging technologies and tech stacks that are driving privacy innovation; strategies and tactics that win trust; privacy pitfalls to avoid; privacy tech issues ripped from the headlines; and other juicy topics of interest.

August 22, 2023 • Debra J Farber / Elias Grünewald • Season 2 • Episode 24

Episode length: 1:04:19

This week’s guest is Elias Grünewald, Privacy Engineering Research Associate at Technical University, Berlin, where he focuses on cloud-native privacy engineering, transparency, accountability, distributed systems, & privacy regulation.

In this conversation, we discuss the challenge of designing privacy into modern cloud architectures; how shifting left into DevPrivOps can embed privacy within agile development methods; how to blend privacy engineering & cloud engineering; the Hawk DevOps Framework; and what the Shared Responsibilities Model for cloud lacks.

Topics Covered:

Elias's courses at TU Berlin: "Programming Practical Privacy: Web-based Application Engineering & Data Management" & "Advanced Distributed Systems Prototyping: Cloud-native Privacy Engineering"
Elias' 2022 paper, "Cloud Native Privacy Engineering through DevPrivOps" - his approach, findings, and framework
The Shared Responsibilities Model for cloud and how to improve it to account for privacy goals
Defining DevPrivOps & how it works with agile development
How DevPrivOps can enable formal privacy-by-design (PbD) & default strategies
Elias' June 2023 paper, "Hawk: DevOps-Driven Transparency & Accountability in Cloud Native Systems," which helps data controllers align cloud-native DevOps with regulatory requirements for transparency & accountability
Engineering challenges when trying to determine the details of personal data processing when responding to access & deletion requests
A deep-dive into the Hawk 3-phase approach for implementing privacy into each DevOps phase: Hawk Release; Hawk Operate; & Hawk Monitor
How the open-source project TOUCAN is documenting conceptual best practices for corresponding phases in the SDLC, and a call for collaboration
How privacy engineers can convince their management to adopt a DevPrivOps approach

Read Elias' papers, talks, & projects:

Guest Info:

Connect with Elias on LinkedIn
Contact Elias at TU Berlin

Elias Grünewald 0:49

Many times we have additional auxiliary services that process some fraction of the traffic to do analytic stuff or to just ensure the correct functionality of our system, and if personal data also gets into these services, then this is relevant from a privacy and regulatory perspective. And this is, of course, why we have to implement some good communication and inventory measures to have an overview about that all the time, also in light of the quick changes that could happen to a system.

Debra J Farber 1:27

Welcome everyone to Shifting Privacy Left. I'm your host and resident privacy guru, Debra J Farber. Today, I'm delighted to welcome my next guest, Elias Grünewald, Privacy Engineering Research Associate at Technical University Berlin, where he focuses on cloud-native privacy engineering, transparency, and accountability in distributed systems and technology regulation. He also teaches undergraduate and graduate courses focused on privacy and cloud engineering. I love that he's leveraging technical expertise to teach practical privacy engineering concepts rather than only staying in the theoretical. Welcome, Elias. I can't wait to dive into the conversation today.

Elias Grünewald 2:13

Hi, Debra! And hi to everyone who's listening. I'm very happy to be here. I think this is my first actual podcast session, so I'm very excited about it.

Debra J Farber 2:22

Well, we got lucky then, because I think we're going to have fun today with some of the topics that we're going to dive into. If you wouldn't mind, just tell us a little bit about your research journey. How did you get interested in privacy engineering and DevPrivOps, especially with a focus on the cloud?

Elias Grünewald 2:42

Yeah, no problem. I started studying computer science as an undergraduate student, a bachelor's student, in Berlin. Then I continued my studies in Spain and Italy, and then I came back to Berlin. We had this one course right before our Bachelor's thesis, which was called 'Informatics and Society.' This was the first time my eyes basically opened once more during my studies, where all the things came together somehow. We learned about programming, about what a database is, what performance optimizations you can do in code (which are all great things and lovely to learn about), but in this course, it was the first time I realized we can apply these technologies to do something better. There are so many societal problems or challenges and also regulatory frameworks that frame how we as computer scientists do our work.

Elias Grünewald 3:36

Then, this was the first time I actually got in contact with the real research perspective on privacy and privacy engineering. Of course, I knew about privacy before that, but then my former lecturer, now one of my colleagues and my mentor, introduced me to that field and it was a great pleasure to have that. Frank, if you hear this, thanks for that again. Then, I wrote my Bachelor's thesis, and also my Master's thesis, with a specialization in cloud computing. Then, the point came where I said, "Yeah, I'm now finished with my studies." I also worked part-time as a student research assistant at that time. My now supervisor, my PhD supervisor, asked me, "Do you have any topic or a direction where you want to go for your PhD?" Then it was quite a natural choice to pick privacy and the cloud because I'm working in a distributed systems research group. Now, I'm one of the guys researching privacy engineering, and I'm deep-diving into that topic and I still love it, and I try to bring as many people as possible onto this nice track that we have now established at TU Berlin.

Debra J Farber 4:45

Amazing. There are so many opportunities, I think, to bring people in because, as you said, it addresses societal challenges. I could see that luring someone like myself in if I were an applied technologist. You mentioned some of the coursework. You're teaching some really interesting courses in the Information Systems Engineering Department. Please tell us a little bit about them. The first one is "Programming Practical Privacy: Web-based Application Engineering & Data Management." Then the second one is "Advanced Distributed Systems Prototyping: Cloud-native Privacy Engineering." Tell us a little bit about both of those courses.

Elias Grünewald 5:26

Yeah, both of these courses are examples of courses that I taught over the last semesters. The first one, the programming practical, is an undergraduate course where students come together in teams of usually four or five people, and where they are first confronted with a real problem. Let's put it like this: over their first four or five semesters they learn all about programming and, in a sense, about distributed systems. Some of them also start to take project management courses or something like that. The programming practical is a course where, for the first time, you do not get graded. So, there's no intrinsic motivation for the students to actually study for a final exam or something like that, but they are motivated because you, of course, introduce these topics to the students for solving real-world problems. This is a nice challenge, also for us as instructors, because every semester we try to find topics that are related to something that happened recently or to something that is, of course, relevant for our research. We did that in one iteration with a definite focus on privacy engineering. In that iteration we focused on two very relevant privacy principles. The first one was transparency and the other one was data access. For example, for data access or data subject access requests, the students designed some nice dashboards to load the data that you can get from different services - real-world services like Spotify, Facebook, Google, you name it. Usually, from these data takeouts you just get a large zip file or a folder, which contains thousands of files, maybe JSON files or Excel sheets or, I don't know, some type of data that my grandma for sure wouldn't be able to analyze or to read, but for her, too, it's important to get to know what's in that data. So, my students then developed some nice approaches to actually visualize that data and to compare different data takeouts.
And then we could really see what a service knows about you; and the students learned something about data privacy, but also about project management and how to create such a real-world project, because we were working with real data in this context. I think that's a nice way to teach the students: by letting them shape the whole project with this scope on privacy, but without the formal coursework of homework and a final exam and so on, and rather by them being more and more motivated over the weeks because they want to do something real, really cool in the end.
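
As a rough illustration of the first step such a dashboard might take, the sketch below walks an unpacked data takeout folder and counts files by type and records per JSON file. It is not the students' implementation; the function name `summarize_takeout` and the returned keys are hypothetical, and it assumes the takeout has already been unzipped.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_takeout(root: Path) -> dict:
    """Count files by extension and top-level records per JSON file.

    Hypothetical helper for illustration; real takeouts differ per service.
    """
    files_by_type = Counter()
    records_per_file = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower() or "(none)"
        files_by_type[suffix] += 1
        if suffix == ".json":
            try:
                data = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, UnicodeDecodeError):
                continue  # skip unreadable files instead of failing the summary
            # A JSON list counts its items; any other JSON value counts as one record.
            count = len(data) if isinstance(data, list) else 1
            records_per_file[str(path.relative_to(root))] = count
    return {
        "files_by_type": dict(files_by_type),
        "records_per_file": records_per_file,
    }
```

Running the same summary over takeouts from two services gives a first, coarse comparison of how much each one holds, before any visualization is built on top.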

Elias Grünewald 8:06

The second one is a master's course - Advanced Distributed Systems Prototyping.

Elias Grünewald 8:11

This is one where we expect students to have a strong interest in our topic.

Elias Grünewald 8:16

As a Cloud and Distributed Systems Engineering Research Group, we of course also teach courses specialized on that. In this Distributed Systems Prototyping course, students come together in teams of usually seven and get a real-world project, usually together with a business partner, an NGO, or another institution that we collaborate with. They then solve the tasks on their own and usually report to the external project partner and to us as the instructors once a week, or with some presentations.

Elias Grünewald 8:52

The iteration that I then called "Cloud Native Privacy Engineering" was one where we worked together with an NGO which deals with mental illnesses, or the prevention of mental illnesses, of young students and pupils in school. This was a young NGO developing a mobile app, which wanted to collect, or had the plan to collect, some mood diaries or information about mental illnesses and contact information for people that can provide support in case of mental illnesses. And of course, these are very sensitive data that have to be stored securely. The data has to be processed in a transparent way. We have to think about data minimization aspects and so on, and that was once again a real-world project, which then also led to some very nice research ideas for us, which we are now continuing. So we are really living that teaching and research paradigm together, so that both can gain something from each other.

Debra J Farber 9:50

I love that. I love that they're gaining practical experience because, again, I think this way your students can hit the ground running and deliver value if they end up in a privacy engineering role right out of university, especially when it comes to metrics. You know, having worked in this space for about 18 years, metrics have been one of the hardest things about privacy - to demonstrate that you have the right levers in place and you're doing all the right expected things and you're finding relevant data and surfacing it. So, what a great course, the Advanced Distributed Systems Prototyping.

Discussion of Elias' 2022 paper, "Cloud Native Privacy Engineering through DevPrivOps" - his approach, findings, and framework

Elias Grünewald 10:31

So, after these courses they ask us about writing their final thesis with us, or they even want to work in that domain, and then of course, we're very happy to advise them in that direction. And, it's right before they jump into their jobs. So, of course it's something that can shape their lives for a very long time. So, it's very important to also guide them in that period.

Debra J Farber 10:54

It is. What a great opportunity! I think that's awesome. Most of the conversation we're going to have today, I think, is going to be around multiple papers that you've written or that you've published. Some of them have been co-authored, so I'll call that out as we're going through them, but the first one I want to talk about is about transparency in cloud native architectures.

Debra J Farber 11:16

This paper came out in March 2022 and it's called "Cloud Native Privacy Engineering through DevPrivOps," where you argue that the different dimensions of privacy engineering, like data minimization, transparency, security, etc., must be factored into the development and operation cycles. Woo! Of course, I agree! It's totally that "shift privacy left" mindset, right? So obviously I was excited to see that you were publishing content like that. In the paper, you identify conceptual dimensions of cloud native privacy engineering by bringing together cloud computing fundamentals and privacy regulation; and you propose an integrative approach that overcomes the shortcomings of existing privacy enhancing technologies in practice, and then you evaluate existing system designs. Could you please tell us a little bit about your approach and the findings? I don't know if you're able to articulate this beautiful visual from the paper in here, but I will do my best to at least put it in the show notes or link to it.

Elias Grünewald 12:28

Yeah, of course. So, the paper was one of my first papers, where I basically set out the plan and also my whole research direction when I started my PhD, and so it's a very broad position paper on what I had, at that point, already realized during my research. There are so many privacy enhancing technologies out there and they are great, and I learned a lot about designing different types of them. But, at the same time, as I had specialized in that cloud direction during my studies, I was also many times wondering, "How are they then in practice aligned with the real-world system designs that we have out there?" Because many times we see privacy enhancing technologies that could be applied to a very stand-alone, monolithic system that is placed somewhere and operated by a single person or even a team, that has some dedicated inputs and outputs, and that can all be handled well. There, you can apply a nice data minimization technique, for example, or a transparency measure, but this does not play well with the real-world infrastructures and development life cycles of modern cloud architectures. That was somehow the pain that I realized that developers have in that field, and so I decided to think about what the first perspectives and goals of privacy engineering in general are, and that's fairly easy

Elias Grünewald 13:59

to at least list these basic principles, such as fairness, transparency, accountability, minimization, and so on. You can find them in the GDPR or other privacy regulations and scientific groundwork. But we also have to factor in the cloud basics. This concerns, of course, infrastructure-, platform-, or application-level challenges or specifics of these different dimensions, but also the whole organizational structure of the institution or the company developing software and its internal processes, as well as external legislation and frameworks that apply to every software development project.

Elias Grünewald 14:41

And then, if you factor all of these dimensions together, you can come up with a matrix or some other visualization, just so as not to forget about any of these dimensions, because it's surely not enough to implement a simple data minimization measure, for example, at the application level, when the infrastructure still leaks some information or has open ends to the sensitive data that you try to protect. And so, this was very much about combining the fundamentals of both privacy engineering and cloud engineering, and I think this is what guided me right after that. We also use it many times as a checklist or some guidance to check whether we have thought about all the possible problems that could arise across the different dimensions in a cloud setting.
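
One way to picture using such a matrix as a checklist is the minimal sketch below. The principle and dimension names are taken from the conversation; the function `uncovered_cells` and the example measures are hypothetical and only illustrate the idea, not the paper's actual framework.

```python
from itertools import product

# Names taken from the conversation; the real figure may list more principles.
PRINCIPLES = ["lawfulness", "fairness", "transparency", "accountability", "data minimization"]
DIMENSIONS = ["legislation", "organization", "process", "infrastructure", "platform", "application"]


def uncovered_cells(measures: dict[tuple[str, str], list[str]]) -> list[tuple[str, str]]:
    """Return every (principle, dimension) pair that has no recorded measure."""
    return [cell for cell in product(PRINCIPLES, DIMENSIONS) if not measures.get(cell)]


# Hypothetical entries, for illustration only.
measures = {
    ("data minimization", "application"): ["drop unused form fields"],
    ("transparency", "platform"): ["tag services with processing purposes"],
}

gaps = uncovered_cells(measures)
print(f"{len(gaps)} of {len(PRINCIPLES) * len(DIMENSIONS)} cells have no measure yet")
```

In this toy example, data minimization is covered at the application level but still shows up as a gap at the infrastructure level, which is exactly the kind of blind spot described above.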

Debra J Farber 15:30

Looking at this visual: on the left-hand side it says Legislation, Organization, and Process for the regulatory aspects, and then Infrastructure, Platform, and Application, and then you have all of the dimensions that we would look for in privacy. Right? Like, as you mentioned, Lawfulness, Fairness, Transparency, and so forth and so on. Have you come across a company that even thought about these overlapping dimensions, or does this appear to be really novel when your students have been working with organizations, or when you have been, in order to write this paper?

Elias Grünewald 16:06

Yeah, of course, when we work together with business partners at our research institute or with partners in our projects, and also when we talk to our students or when we talk to companies, then of course there are single experts on all of these dimensions - usually security specialists, security being a prime example of a well-researched domain.

Elias Grünewald 16:26

Of course, many open challenges still, but people know about the toolbox for solving application-level security problems or platform-level security problems.

Elias Grünewald 16:37

But finding the people that actually know about the challenges of realizing a data subject access request (DSAR), or about how to ensure purpose limitation in a distributed system, is harder. Usually, if I ask about that in one of our larger courses, I don't know, maybe zero hands, maybe a few hands are raised by people who have even heard about the problem. And, I think that already illustrates that many people are not aware of the whole design space that we have to deal with as computer scientists, as legal specialists and so on and, of course, as everyone who wants to build their own business, because all of these combinations of challenges can, of course, induce business threats and so on.

Elias Grünewald 17:23

And, yeah, many people are not aware of the intersection of many of these principles, and so we try to teach that and also come up, of course, with smart solutions for them; but everyone is invited to do that and also to complete the picture with more examples like that. This contextualization also helps to better describe different privacy enhancing technologies, because we can then say, "If you use this or that tool or technique and combine it with something else, then we cover several of these dimensions", while if we call something a 'transparency-enhancing technology,' we still don't know whether it affects the platform or the application or something else. I think that's helpful to dissect what we are talking about.

Debra J Farber 18:10

Absolutely! So, where my brain went - because you're more applied technical than I am.

Debra J Farber 18:15

I was the Head Security, Privacy Assurance at AWS, but embedded within the Security Assurance team, so my effect was limited.

Debra J Farber 18:25

What I really learned and took away from my role there was how much the markets rely on the Shared Responsibilities Model for cloud, right? The contracts are written with pretty much all the cloud companies now (all the major ones, like Google and Amazon and the others) where you're basically saying that the cloud provider is responsible for security of the cloud and privacy of the cloud, but whoever's putting personal data in it - the organization that signs up - is responsible, based on how they're using it, for the privacy and security in the cloud. Right? And so, this delineates responsibilities from a contractual standpoint, which makes a lot of sense.

Debra J Farber 19:09

You need to delineate those, but the challenge is that this has been really about security. There's very little of the privacy part baked in, and what I would love to see is something like what you put together here baked into the Shared Responsibilities Model and really address privacy, because how can you address each of these elements if you don't spell them out in the contracting process, that somebody's responsible for all or part of these? What are your thoughts on that, if any before we dive into it?

Elias Grünewald 19:42

I agree, and it also inspires me to actually go in that direction or to talk about that more, because occasionally I also read some of these contractual or processing agreements where people say, "Yeah, there are end-to-end security measures or encryption or something like that," and usually, in the list of technical and organizational measures, only a few are listed.

Debra J Farber 20:07

It's always security for privacy.

Defining DevPrivOps and how it works with agile development

Elias Grünewald 20:10

It is, of course, a problem that we have in that domain, and with that illustration, that figure, we can also help people to understand that it's more than security and data minimization. It's not about blaming anyone, but rather about establishing trust and agreeing that this is necessary and relevant, because if we don't have any purpose limitation measure within our system, we still have the problem, or at least a chance, that we are not compliant with regulatory frameworks from different countries or member states or something like that, and then the security measures alone don't help us. So, I think that's a great foundation for talking about the whole thing, if you want to be honest about your system and the partners you're working with.

Debra J Farber 21:02

Excellent. Well, I'm glad I inspired you. I would love to see any fruit born from that idea, so feel free to share future papers with me. I know that you proposed a reference software development lifecycle called DevPrivOps to enhance established agile development methods with respect to privacy, so I'm definitely eager to hear more about this from you. How do you define DevPrivOps here and how does it work well with agile development?

Elias Grünewald 21:31

Yeah, of course. If we look at that from a scientific perspective, and I think all the industry people who are listening right now would agree, we have had cloud engineering for 20 years already. Agile development practices also evolved over that time because the technology stacks became more and more complex and the system architectures are inherently complex. That's why people started to think about more structured ways to develop software in different teams that are then responsible for different components of the system. This is especially true for cloud-native solutions, the most recent kind of software architectures, where different microservices talk to each other just over APIs or through a message bus or something like that. There are many dozens or even hundreds of different components in a system, and dozens or even hundreds of different teams - we are talking about the very large online companies - work on it together. They do not want to follow, of course, the old approach of one team developing a piece of software and a whole other team just operating it, which we often call 'throwing a piece of software over the wall': a developer or a development team says, "Yeah, it's ops' problem now to operate that system." And that, of course, all comes together if we think about the privacy domain or the whole compliance domain.

Elias Grünewald 23:10

It's not limited to privacy, but here we take the example that we have privacy measures and we have regulatory frameworks with very extensive descriptions of what a system has to ensure, and the larger the system gets, the more things you have to ensure. Especially in Europe, where we currently regulate very large online platforms, there are, of course, many things you have to think about. It is important and right that we have to do it, but the development teams and the individual developers are many times not able to actually break that down to what they do in their daily business, because what you learn, or what you're trained in, in a usual computer science curriculum or the training phase of a company is, of course, how you develop software and how you do that fast, with fast-paced development life cycles, but not about privacy. And so, what I put together is two things that basically existed before, but whose combination I think makes sense - at least I hope so. One of them is the basic DevOps cycle. I think many people have seen this figure of a number eight basically lying on its side, with eight different phases in this DevOps life cycle: code, build, test, release, deploy, operate, monitor, and then plan your software development project again. This is a continuous cycle that runs the whole time a team is working on a software project. It introduces some separation of concerns in that process and also enables quick development, because we don't want to go back to the time of waterfall software development, where you basically compile a big book of requirements describing how the software should look in the end, and then hope that one year or three years later the software is ready to use and also factors in all the regulatory requirements.

Elias Grünewald 25:10

We know that this doesn't work, and especially doesn't work with the fast-paced development environments and technological advances that we have today. So we have to factor these privacy problems into how software is actually built in distributed teams, and we know that it's built more and more frequently using DevOps practices and the DevOps culture in general. And so we try to explain how you can do different privacy tasks within such a DevOps cycle. This includes, of course, strategy discussions, task distribution, and technology selection at the beginning of such a life cycle, where you also talk about which privacy enhancing technologies you want to employ in your software development process. It also introduces new challenges and deals with the technological givens that we have in the monitoring phase.

Elias Grünewald 26:05

We take this as an example where usually software developers are already doing tasks like logging, tracing, monitoring for solving reliability or fault tolerance tasks, and do that already because there are business needs for that or they are interested in the general performance of their system, where both of these aspects play together and we say that, also from a privacy standpoint,

Elias Grünewald 26:30

we can do and should do logging, tracing, monitoring, for example, because there are established tool chains that work well together with existing technological stacks and the cloud providers and the platform givens and so on. And, instead of coming with a privacy checklist mechanism or manual email communication with the data protection officer within your company, or something else that is very far from how the software is actually developed, we should focus more and more on privacy enhancing technologies that are built, established, and used in many of the different DevOps phases, such as the monitoring phase I just explained, because then we can keep the fast-paced development cycle as is and also get solid privacy solutions. I think it just makes sense not to introduce overly complex processes next to the development lifecycle that developers already follow.
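
To make the idea of reusing existing logging tool chains for privacy a bit more concrete, here is a minimal sketch using only Python's standard `logging` module. It is not the Hawk tooling; the field names `data_categories` and `purpose`, the service name, and the helper `build_privacy_logger` are assumptions made for illustration. The events record which categories of personal data were touched and why, never the data itself.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line with optional privacy fields."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "service": record.name,
            "message": record.getMessage(),
            # Hypothetical field names, supplied by callers through `extra=`.
            "data_categories": getattr(record, "data_categories", []),
            "purpose": getattr(record, "purpose", None),
        }
        return json.dumps(event)


def build_privacy_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON privacy events to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep events out of the root logger's output
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


# Example: a hypothetical service logs categories and purpose, not the data.
buffer = io.StringIO()
log = build_privacy_logger("mood-diary-api", buffer)
log.info(
    "stored diary entry",
    extra={"data_categories": ["health", "contact"], "purpose": "support-referral"},
)
print(buffer.getvalue().strip())
```

Because the output is ordinary structured log lines, it could in principle flow through whatever log collection a team already operates, which is the point being made here about not adding a separate process.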

Debra J Farber 27:29

Yeah, I think that makes a lot of sense. Go with the processes that developers already know and understand, and so it doesn't feel like you're adding so much pressure on them to go and learn something new or do something out of their normal processes. So, it makes a lot of sense, and you might have already started to answer this, but can you tell us how DevPrivOps can enable formal privacy-by-design (PbD) and default strategies?

Elias Grünewald 27:54

So, privacy-by-design and by-default is, of course, about ensuring all the privacy principles that we mentioned before, and the processes in development life cycles are so fast-paced.

Elias Grünewald 28:07

So, you probably know that from Amazon, but we also know that a lot of larger players that they ship components of their software multiple times a day.

Elias Grünewald 28:17

This is not how privacy processes usually work.

Elias Grünewald 28:20

So, at least what I got as a response from the many different companies, or their representatives, to whom I talked, is that there is a Legal department and maybe even a Privacy department that has technological experts, but they work in a very different mode of operation, if you wanna say that. Usually there are things like security audits, but also privacy audits, that happen maybe every half a year or something like that and come with a large paper trail. They are interview-based or checklist-based or even Excel-sheet-based, where you have to fill in, as a developer, what you did over the last period of time, be it half a year or even a week.

Discussion of Elias' June 2023 paper, "Hawk: DevOps-Driven Transparency & Accountability in Cloud Native Systems," which helps data controllers align cloud-native DevOps with regulatory requirements for transparency & accountability

Elias Grünewald 29:04

This already is so burdensome and so far from the practice that developers want to work with, or how they actually work; and so, I think it's better to align that with how the developers work, because they can make so many errors but, of course, can also shape how the whole thing is working and make it better. So, we have to align with how the software is actually developed and also with how fast it is developed. For solving other quality-related problems or challenges in distributed systems, we already have established mechanisms, like metrics, for example, that report back the current performance of our system and can do that on a millisecond basis or something. But, we don't have these established tool chains and reporting structures ready yet for privacy; instead we usually have a manual process, and that inherently cannot lead to a good overview of a large system. So we have to do better and think about better solutions for that.

Debra J Farber 30:13

Thank you so much for that. That's really helpful. And now I want to turn our attention to the newly published paper that you co-authored. It's called "Hawk (H-A-W-K, like the bird): DevOps-Driven Transparency and Accountability in Cloud Native Systems," which was published this past June, so just about two months ago, where you outline some of the challenges between regulatory requirements and DevOps-focused system engineering, and you propose a set of novel approaches that you refer to as the Hawk Framework that's explicitly tailored to the specific phases of the DevOps lifecycle that are most relevant in matters of privacy-related transparency and accountability at runtime: so, release, operation, and monitoring. So, let's unpack this. First, can you shed light on why there is so much tension between GDPR requirements to maintain records of processing activities (otherwise known as RoPAs) and DevOps-focused system engineering? How does this impact transparency and accountability?

Elias Grünewald 32:02

Yeah, sure. It's great that you mentioned that it came out just very recently. I presented it in Chicago at the IEEE International Conference on Cloud Engineering, and what I showed the audience there was basically how you do such a RoPA for a real-world system. And, the real-world system is actually the one that I mentioned earlier in our coursework, which was the app for dealing with personal data about mental illnesses of young students and pupils; and, what I then showed them is how a records of processing activities sheet is usually done. I took the example from the French data protection authority, the CNIL, and what this is is a large Excel sheet where you put in all the categories of personal data that are being processed, their purposes, storage limitations, access and deletion rights, and so on. So, all the different dimensions of transparency that you could have and, of course, also the accountability information that is relevant for demonstrating compliance, both within a company or an institution that develops a piece of software or a system, and, of course, also when they report their information to data protection authorities (DPAs).
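
To make the contrast with a spreadsheet concrete, here is a minimal sketch of what one RoPA row could look like as a machine-readable record. The field names and the example service are assumptions for illustration; they are not the CNIL template's columns and not the TILT schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ProcessingActivity:
    """One row of a records-of-processing-activities sheet (hypothetical fields)."""
    name: str
    purpose: str
    data_categories: list[str]
    storage_limit_days: int
    recipients: list[str] = field(default_factory=list)
    third_country_transfers: list[str] = field(default_factory=list)


def ropa_to_json(activities: list[ProcessingActivity]) -> str:
    """Serialize the records so that tools, not only people, can read them."""
    return json.dumps([asdict(a) for a in activities], indent=2, sort_keys=True)


# Hypothetical example entry, loosely modeled on the coursework app mentioned above.
ropa = [
    ProcessingActivity(
        name="diary-service",
        purpose="Store diary entries written by students",
        data_categories=["email", "health_data"],
        storage_limit_days=365,
    )
]

print(ropa_to_json(ropa))
```

Once the record exists in this form, it can be versioned next to the service code and compared automatically against what the running system actually does.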

Elias Grünewald 33:18

And, if we compare the system architecture of just the system that my students built with the Excel sheet and the requirements from the GDPR (because we are here in Europe, of course, and dealing with European regulation), then this largely impacts how transparent and how accountable we can design this whole architecture, because there's so much information that is flowing through that system, and the different data flows and different settings in which such a system could be deployed largely influence what such a RoPA could look like.

The challenges that engineers run into when they try to determine the details of personal data processing, as they respond to access requests or deletion requests

Elias Grünewald 33:54

Much of the information also depends on the runtime, because if we scale such a system in a data center or across availability zones, which then affects multiple countries and introduces third-country transfers automatically without any specific decision that we put in there before, then many privacy-related activities happen at runtime that we couldn't have seen before, or could only have guessed would happen under a large load on the system. That's why it's very important to come up with new transparency- and accountability-focused tools and general approaches for how we can solve these problems in a DevOps or a DevPrivOps fashion.

Elias Grünewald 34:39

So, what we wanted to do there is to dive deeper into the transparency and accountability principles of the GDPR and privacy engineering in general. These are only two of the dimensions that I mentioned earlier, and it's still so complex to actually do that, if you're willing to do that in a real-world systems engineering context, and that's why we urgently have to come up with some best practices on that. And, that's what we are working on currently in that Hawk Framework, but of course, also beyond that.

Debra J Farber 35:11

Amazing. So, what are some of the specific challenges that engineers are running into when they try to determine the details of personal data processing, as they're responding to access requests or deletion requests?

Elias Grünewald 35:25

Yeah, so responding to access requests or deletion requests is only possible if we have transparency, full transparency about the whole system in place.

Elias Grünewald 35:35

And, if we're talking about cloud-native solutions, we then see multiple microservices, for example, interacting with each other.

Elias Grünewald 35:43

Every one of these microservices could have its individual database, which can follow different paradigms, so we can have an SQL store here, a key-value store somewhere else, and also these services could process personal data, get it from the client or get it from third parties and share it with these third parties. And, at the same time, next to that inherent system complexity, we have distributed responsibilities. So, different engineers are responsible for development and operation of different services; and, if we don't have a clear, transparent view of what is stored where (and I'm referring to storing here because data that is being stored for a longer term is, of course, most important), but also of data in transit or data that just gets shared occasionally with third parties, if we don't have that transparency, we basically cannot solve that access request or deletion request. And, that is why, before we can even think about access or deletion, we have to have an up-to-date inventory of all the personal data that we have in our system or that is processed through the system, and that also includes all the different paths some data could go through.
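
The inventory idea described above can be sketched as a simple lookup. The service names, store types, and categories below are hypothetical; a real inventory would be derived from the running system rather than written by hand.

```python
# Hypothetical inventory: service -> store type and personal data categories held.
INVENTORY = {
    "user-service": {"store": "sql", "categories": {"email", "first_name"}},
    "shipping-service": {"store": "key-value", "categories": {"first_name", "address"}},
    "analytics-service": {"store": "object-store", "categories": {"email"}},
    "catalog-service": {"store": "sql", "categories": set()},
}


def services_to_query(inventory, requested=None):
    """Return, per service, the personal data categories it holds.

    If `requested` is given, only those categories are considered.
    Services that hold none of the relevant categories are left out.
    """
    result = {}
    for service, info in inventory.items():
        held = info["categories"]
        if requested is not None:
            held = held & requested
        if held:
            result[service] = sorted(held)
    return result


# Which services must be visited to answer a request concerning email addresses?
print(services_to_query(INVENTORY, {"email"}))
```

Without such a map, an access or deletion request has no reliable starting point; with it, the request becomes a list of services to visit.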

Elias Grünewald 36:57

Of course, everyone thinks about the core functionalities of their system and about the user database or the shipping details database, if we are talking about an e-commerce scenario or something like that. But, many times we have additional auxiliary services that process some fraction of the traffic to do analytics or to just ensure the correct functionality of our system; and, if personal data also gets into these services, then this is relevant from a privacy and regulatory perspective. This is, of course, why we have to implement some good communication and inventory measures to have an overview of that all the time, also in light of the quick changes that could happen to a system.

Elias describes his approach to integrating privacy into 3 phases of DevOps: 1) Hawk Release; 2) Hawk Operate; & 3) Hawk Monitor

Debra J Farber 37:45

Yeah, that makes a lot of sense. I mean, I've seen examples in my career where some data has been checked into you know a particular repository or uploaded to a system that didn't account for personal data, so you didn't necessarily know how to find it again if you needed to present it for, you know, an access request or to delete it.

Elias Grünewald 38:09

To add to that, there are new technologies developed over time that the law couldn't have known of some years ago. Take unstructured data stores, for example, where different kinds of data land (so both personal data and very sensitive data of other kinds, but also data that is unimportant from a privacy perspective): to dissect what is actually stored there is a hard problem from a computer science perspective, of course, and that's also why we have to do that. Just to add on that.

Debra J Farber 38:42

Absolutely, absolutely. There's definitely discovery tools out there. I even sit on advisory boards for some companies that work on that. But, it's definitely from what you said, from a computer scientist perspective, you kind of want to plan for that, not just be like let's try to find discovery tools later to figure out what's personal data. Right? It just makes it more difficult, but it is great to know that there's new technology out there. So, okay, let's get into the meat of this paper. Tell us about HAWK. This is your approach to help data controllers align cloud-native DevOps with regulatory requirements for transparency and accountability; and, walk us through your distinct approaches for implementing privacy into each DevOps phase. I'm going to state the phase (there's three of them) and give like a brief overview, and you can then just dive a little deeper and tell us more about it. The first one is the release phase, which you call HAWK Release. Actually, I'm not going to describe it, I'm going to let you describe it. HAWK Release.

Elias Grünewald 39:38

Yeah, quickly commenting on the name HAWK, it's also a metaphor for having a bird's-eye view of your whole system. That's why we call it HAWK. And then, the individual components relate to the DevOps phases, or DevPrivOps phases if you want to call it like that. For HAWK Release, the basic challenge that we observe is the one that I mentioned already - it's fast deployments. So, we have many deployments every day, and with every new deployment of a new component we can potentially have new personal data processing activities. Or, of course, personal data processing activities could vanish in case we delete the service or something.

Elias Grünewald 40:20

The usual approach for deploying a new service to production is using different deployment strategies. Approaches like A/B testing or blue-green deployments and so on have been out there for many years already, and what we look at from a privacy perspective is 'canary releases.' So, the general idea of a canary release is that you put a piece of software into the system, the new version of a service for example, and only send it a fraction of the current traffic that goes through the system. So, users make requests to the system and you say, "Let 5% of the users now use that new version of the service and the other 95% still use the old version of the service," and with these 5% of users we can already try out the new functionality to see if it works correctly. That's the usual approach. But, what we can also do is check, of course, privacy-relevant information in that deployment step or in that release step. If we observe that there is a new personal data processing activity happening, because people send personal data to the system and you have established some transparency measures that observe that, then we can check in such a release process whether that is a processing activity that should be there, that it is already secured, and that it is written down in your records of processing activities and maybe even shared in your privacy policy and so on. You can then very easily check if that is in there and if everything works correctly without affecting all the users of your current system, potentially opening some vulnerabilities, or even harming your users directly because some data flows around services that shouldn't process that kind of data.

Elias Grünewald 42:13

And for the concrete implementation, then we use some state-of-the-art technologies.

Elias Grünewald 42:18

Just to name them quickly, it's Flux's Flagger and Kustomize for custom Kubernetes templates. So, we are working with very recent technology to, again, align with the current stack of technologies that we have there. The basic idea is to have this deployment of a new service version with just a fraction of the users, and for that time where we deploy that new version for only a fraction of the traffic, we collect metrics. And, these metrics then are, of course, privacy-related. So, we use another transparency tool that we developed, which is the Transparency Information Language and Toolkit (TILT), which allows us to document which kind of data is processed in a service; if we see that there are changes to our system that affect the overall privacy situation, then we can roll back that new version of the service or we can finish the release process and say, "Yes, this new version is also compliant with what we want to have in the system." And, this is basically what Hawk Release is about. It's using the conventional release strategies, but applying them to the privacy domain, basically.
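
The release check described above boils down to comparing what the canary actually processes with what has been documented. The following is a minimal sketch under that assumption; `canary_decision` and the category names are hypothetical, and this is neither the Hawk implementation nor Flagger's API.

```python
def canary_decision(documented: set[str], observed: set[str]) -> tuple[str, set[str]]:
    """Decide whether a canary may be promoted.

    `documented` holds the personal data categories already recorded for the
    service (for example in a RoPA); `observed` holds the categories seen in
    the canary's share of the traffic. Any observed but undocumented category
    blocks the release.
    """
    undocumented = observed - documented
    if undocumented:
        return "rollback", undocumented
    return "promote", set()


# Hypothetical example: the new version starts receiving location data.
documented = {"email", "first_name"}
print(canary_decision(documented, {"email"}))
print(canary_decision(documented, {"email", "location"}))
```

In a real pipeline the release tooling would act on this decision; the sketch only shows the comparison that turns a privacy question into a pass/fail signal.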

Debra J Farber 43:33

That's awesome. I love to see it. And the second DevOps phase is what you call Hawk Operate. Tell us about that.

Elias Grünewald 43:42

Yeah, Hawk Operate is basically about labeling and tracking personal data flows between running microservices or also traffic that comes from the outside or goes to the outside of the service with some dedicated transparency measures. So, what we first have to know is that usually in such microservice environments we have polyglot services. So, one service is written in Java, another one is written in Python or JavaScript, and all of them use different kinds of libraries to make HTTP requests or even remote procedure calls, or different kinds of communication mechanisms to talk to each other. And then, traditional or former privacy enhancing technologies tried to develop specific libraries or tools that were able to label some kinds of operations or methods within your programming language, with a dedicated library, for example. But, using that approach, we need libraries for many different languages because in such a complex architecture there could be many languages used, and then we have to develop some libraries that are compatible with each other and so on, and that's very tedious and so hard to learn for the engineers.

Elias Grünewald 44:57

So, what we thought about is using the service mesh paradigm, which is basically an architectural idea to let developers write their services as they usually do and make requests to other services; but, what the service mesh is then doing is hooking right into the communication between services and putting in another level of abstraction to ensure new functionality just for the communication between different services. Usually, it is used for security measures, for example encrypting the traffic between different services without employing an encryption and decryption library at both ends of the communication. But we can, of course, also use this for transparency and accountability tasks, because such a service mesh is then capable of easily tracking all the traffic that is flowing around in your system. And, we hooked into that: we wrote an extension to one of the most famous service mesh implementations, which is Istio. This extension is basically a plug-in that you install and then it's ready to use, and it is able to label personal data that is exchanged between services. This service mesh extension then notices if that personal data flows around in your system, and it of course recognizes a structure of personal data that you labeled once if it occurs multiple times in your system. Doing that, you label your communication once and can then observe directly at runtime where personal data are actually exchanged, and you can also collect metrics about that, so that you know the email address of Debra is used in that system in 25% of the current cases and the first name of all the people using that system in a different percentage. And that's very helpful because then you can of course have a clearer view of what is happening in that system, with a one-time labeling effort. Also, to just not present it as if it would be perfect -

Elias Grünewald 47:05

that, of course, has some limitations currently, one of which is the one-time labeling effort. I think one can live with that, of course, but as the technical people will know, service meshes also have some performance impact. That is clear. But, when we talk about transparency and accountability, we can also think about not having that approach on all the time, but only at the times when we deploy new pieces of software, or running it regularly at times when the current load on the system is not so high, so that we don't run into overall performance problems, for example. But, we quantify that, and if people want to use it or want to further optimize it, then of course they are invited. And, we are more talking about the general approach of going away from individual libraries and augmentations to individual services, and rather looking at the interfaces of different services with that service mesh extension in Hawk Operate.
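
As a rough illustration of labeling once and measuring at runtime, the sketch below scans JSON messages for labeled field names and reports the share of messages containing each category. It runs on plain strings rather than inside a service mesh, and the `LABELS` mapping and function names are assumptions, not the actual Istio extension.

```python
import json
from collections import Counter

# Hypothetical one-time labeling: JSON field name -> personal data category.
LABELS = {"email": "email_address", "first_name": "first_name"}


def find_labeled_fields(payload, labels):
    """Recursively collect the categories of labeled fields in decoded JSON."""
    found = set()
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key in labels:
                found.add(labels[key])
            found |= find_labeled_fields(value, labels)
    elif isinstance(payload, list):
        for item in payload:
            found |= find_labeled_fields(item, labels)
    return found


def category_shares(messages, labels):
    """Return, per category, the fraction of messages in which it occurs."""
    counts = Counter()
    for raw in messages:
        counts.update(find_labeled_fields(json.loads(raw), labels))
    total = len(messages)
    return {category: n / total for category, n in counts.items()} if total else {}


# Hypothetical traffic sample between two services.
MESSAGES = [
    '{"user": {"email": "a@example.org", "first_name": "A"}}',
    '{"order_id": 1}',
    '{"items": [{"first_name": "B"}]}',
    '{"order_id": 2}',
]

print(category_shares(MESSAGES, LABELS))
```

Because the check looks only at the messages, it works the same way whether the sending service is written in Java, Python, or JavaScript, which is the point of moving the logic out of per-language libraries.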

Debra J Farber 48:06

That's so helpful. That is really, really cool! And then, the last DevOps phase is monitoring. So, tell us about Hawk Monitor.

Elias Grünewald 48:17

Yeah. So, Hawk Monitor is basically connected to what I just said, because with the Hawk Operate component we basically collect the information about what is flowing around and where personal data can be stored, and with Hawk Monitor we are actually trying to visualize and aggregate that information, and also to answer the questions that you have as a Data Protection Authority (DPA) or a Data Protection Officer (DPO) or just a technical or legal expert that is dealing with the privacy challenges within your company. And, what we are doing there is basically aggregating all the information that we get from the Hawk Operate component, but we also have an API to input other kinds of data - think of Data Loss Prevention (DLP) systems or other transparency measures or security measures that are already out there - to store that transparency and accountability information in one place; and then, on top of that, we offer some query language and visualizations, which basically allows us to have a dashboard with all the relevant transparency information. And also, you could generate out of that at least part of a RoPA document or other legal documents that you have to provide. And, what is also interesting is that if we have these metrics from the operate component collected and then stored within the monitoring component, we can of course feed this kind of information back into the release component that we talked about earlier, because if we have historic information about what happened in the system, we can compare that to new versions of services that come into a system's architecture, and can not only draw service graphs or something like that that have changed, but also actually quantify privacy-related information.

Elias describes how the Hawk framework can benefit regulators as well as data controllers

Elias Grünewald 50:07

I just mentioned personal data categories that might have changed or that are used for a certain fraction of the traffic that comes into the system, but of course this could also be applied to storage periods of personal data that is subject to certain regulations that limit the overall storage of personal data. And, with that component, HAWK Monitor, you can actually prove and demonstrate that you have established measures that frequently delete personal data from your system. For example, you can then show with the historic information that last week the information about Elias was in the system, or some records of a certain type were in the system, and now they are not anymore. And, I think that's a useful tool. We propose this general approach, again, with some frequently used technical tools like Prometheus metrics and Grafana dashboards and parameterized queries and so on. But, it's more about the general idea that we have to collect runtime information and display it in a way that is queryable and machine-readable and can then be transformed into the documents that we need in the end.
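
One way to picture the deletion check: if monitoring records when each record was first and last observed, overdue records can be flagged with a simple comparison. This is a sketch with hypothetical names and data, not Hawk Monitor's query language.

```python
from datetime import date, timedelta


def overdue_records(observations, storage_limit_days):
    """Return the IDs of records that stayed in the system past the limit.

    `observations` maps a record ID to a (first_seen, last_seen) pair of
    dates taken from historic monitoring data.
    """
    limit = timedelta(days=storage_limit_days)
    return sorted(
        record_id
        for record_id, (first_seen, last_seen) in observations.items()
        if last_seen - first_seen > limit
    )


# Hypothetical history: r1 disappeared after 19 days, r2 was still seen after 59.
OBSERVATIONS = {
    "r1": (date(2023, 1, 1), date(2023, 1, 20)),
    "r2": (date(2023, 1, 1), date(2023, 3, 1)),
}

print(overdue_records(OBSERVATIONS, storage_limit_days=30))
```

An empty result over a long observation window is the kind of evidence that could back a claim that deletion measures are in place and working.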

Debra J Farber 51:15

Yeah, that makes a lot of sense. It's almost going to look like magic when you do it right. I'm wondering have you shared this approach with regulators? Because obviously, your DPOs, your regulators - they're going to want accountability metrics; they're going to want to see proof of these things and with Hawk Monitor and with all the different DevOps phases put together, it sounds like you could really demonstrate that you're doing data protection compliantly and to the benefit of individuals. Right? You'll be able to assure that you're deleting data when you say you have it and you're going to delete it, right? Just curious if any regulators have weighed in on this approach.

Elias Grünewald 51:56

Yes, of course, we are also in close contact with data protection authorities and experts in that field.

Elias Grünewald 52:01

I presented some work earlier this year at the Privacy Symposium in Venice, which was both a very political and a scientific event.

Elias Grünewald 52:09

But maybe, even more importantly, we presented some of our new DevOps-driven approaches at the CPDP Conference in Brussels in May.

Elias Grünewald 52:19

We organized a panel there, which you can check out on YouTube as well, where we also showed many of our new approaches. We don't want to force anyone to use this approach now, and we don't think that these approaches that we, as a very small research group, developed in the end are the silver bullet now; they are also far from production-ready components, or something like that. But, we want to establish this new mode and this new way of doing it, and I think many of the regulators that we talked to are very interested in that because they are also overwhelmed by the different structures in which the relevant information reaches them, or by the variety of systems that they cannot audit because they are so difficult to understand. I think we can solve this then from two ends.

Elias Grünewald 53:12

First, of course, bottom-up: we, as companies or people working in companies, try to establish these tools for our own comfort and our safety and for establishing trust and so on; but also from the regulators' side, to guide users or developers that are new in that sphere to certain well-suited tools or approaches in general that they can then implement in their systems. This includes, for example, the machine-readable representation of such information, because that is one of the keys that allows for scalable privacy in such a setting.

Debra J Farber 53:53

Gosh, that makes sense and that's really exciting. So, for this HAWK paper, can you sum up the general findings of the research?

Elias Grünewald 54:01

Yeah, sure. So, the general approach seems viable (and I think that's, first of all, a good message for everyone interested in DevOps and privacy tools), and we also think that the phases that we selected - so release, operate, and monitor - are well-suited for doing these tasks. What we have not yet checked is, for example, what we could do in the testing phase, so even before deployment, but I think there are many startups that are already working on that, regarding continuous integration as well, and so on. So, I think these approaches could play well together, and we just opened up this space for new tools that we will hopefully see in the near future for these three specific phases. And, what we summarize for each of these phases is the general challenges, our proposed approach and, of course, the limitations that come with our current implementation.

Elias discusses open source project: TOUCAN (which is funded by the German Federal Ministry of Education & Research). TOUCAN is creating conceptual best practices for corresponding phases in the SDLC

Elias Grünewald 55:01

So, I already mentioned some performance impact that, of course, differs from system to system and depends on the kinds of data that you have in there, and also, of course, on the complexity of the overall system. We were limiting our prototype to some very widely used technologies such as REST APIs, with JSON messages that are exchanged over REST, but of course, there are many more technologies in use in real-world systems that we currently have not covered with our framework and that also have to be covered in future iterations. And, I still think the approach is viable and should be compatible with even more transparency-enhancing tools - for example, all the logging, tracing, and monitoring tools that I mentioned earlier - and with that, we could really come up with something powerful and also open. And, maybe also to mention that all the developments that we did here are open source. You can check them out on our GitHub. I think that's a very good way of doing transparency and accountability in the new way and bringing that forward.

Debra J Farber 56:15

That's awesome! I'll put a link to the CPDP talk, the YouTube link, as well as your GitHub link, if you go ahead and send me that right now. We're getting close to the end of the conversation, but I definitely wanted to bring up this dedicated project on transparency in cloud-native architectures that you're working on, called TOUCAN (like the bird). Does this build on the work you've done with HAWK? I know you have an open call for collaboration, so just tell us about TOUCAN and what you're looking for.

Elias Grünewald 56:46

Yeah, as you already realized, I have a bit of a passion for birds. And yes, TOUCAN is a project funded by the Federal Ministry of Research and Education here in Germany. I had the chance to hire, currently, four developers who build upon the HAWK project and also other projects that we have around here at our department. It is a very dedicated research project regarding transparency and accountability, to bring that forward. What we want to do there is to, of course, research more phases of the DevOps lifecycle, but also to enable more and more interoperability between different existing transparency-enhancing technologies and cloud platforms in general. The open call for collaboration, which I can only repeat here (and I would be very happy if anyone, in response to what I just explained, would be interested in setting up a call or a collaboration), is that we at TU Berlin have several research projects about privacy engineering in general, also about anonymization in streaming environments, for example, as a completely different topic; but, for transparency and accountability, we have this dedicated project and are always looking for use cases and application areas, so different companies that we could involve or, of course, data protection authorities that are interested in our approaches. If we should summarize that or apply it to a certain use case, you are very welcome. I would like to give just one example.

How privacy engineers can convince their Head of Engineering and management to adopt a DevPrivOps approach

Elias Grünewald 58:24

We have developed the Transparency Information Language and Toolkit (TILT), which I mentioned earlier, which is a machine-readable representation of transparency information - everything that has to be in a privacy policy, basically. We already have an open repository for many different companies, and if you want to be in that repository, then feel free to contact me or just open up a pull request on GitHub again. This is a great way to collaborate with us, and we are very open to follow-up projects, panel discussions, and so on. So, what we as researchers, of course, like to do is to talk about our new ideas and to validate them in a system context. So that's, of course, also something that we are looking into. I, as a Ph.D. student, of course, do not have access to multimillion dollar companies or even companies from many different domains, and if you're working in one that is dealing with these topics, then reach out and tell me why our approaches couldn't work in your settings, and then we can figure out if we can do it better. That's, of course, the goal of the whole thing.

Debra J Farber 59:33

Awesome. That's great! What's the best way for people to reach out to you? Is it via LinkedIn or through your research email?

Elias Grünewald 59:41

Yes, via email works. LinkedIn works. GitHub works. X/Twitter, of course, also works. You can reach out.

Debra J Farber 59:47

It's not a bird anymore, though. It's X now instead of Twitter. All right, well, I will put all of that in the show notes - ways to contact you. So, what advice do you have for privacy engineers who want to shift left in their organizations via a DevPrivOps approach? Basically, how can they best convince their Head of Engineering and management to adopt a DevPrivOps approach?

Elias Grünewald 1:00:14

Yeah, I think if you want to convince the developers that you're working with, you tell them, "We can keep up the speed of development and operations," because everyone is proud of deploying a new piece of software or a new version of a service to the production system, and developers who built something like a new feature or implemented some optimizations don't want to get annoyed with burdensome paperwork or manual processes that just impact the speed of development. With DevPrivOps-driven or DevOps-driven transparency-enhancing technologies and accountability measures, we can keep up that speed and not annoy developers, and at the same time, establish more and more trust within the organization that developers who want to work on their own and who want to be responsible for a certain piece of software can actually do their thing and report through structured interfaces like ours. If we have that, everything works well, and this is something promising, I think.

Debra J Farber 1:01:23

I agree. I'm really excited about it. I'm excited about your approach. The DevSecOps approach worked really well for security, so there's good reason to believe that if you invest in a DevPrivOps approach, then, you know, it can really work well for privacy within orgs. Do you have any other pearls of wisdom that you'd like to share with the audience today before we close our conversation?

Elias Grünewald 1:01:46

I think the hint is for everyone who is interested in that sphere - and I hope many of the people in the audience are now hooked on the topic of transparency and accountability as well, because we have so many solutions for security and data minimization related tools and problems, of course, but not for transparency and accountability - that this is something that I would like to talk about with you, and that is what the call for collaboration stands for. Also, as I saw in the list of episodes in this podcast, there is much information about that. So, that's great. If we create a community - a global community - on these topics, and if you all can engage in the conversation openly, then this is great for everyone.

Debra J Farber 1:02:33

I agree, more information-sharing is definitely going to help advance the ball faster when it comes to DevPrivOps, and so I think that's really great advice. This has been a wonderful conversation. I'm really excited by the work you've done, and I hope you get so many people reaching out to you to collaborate. I hope to have you on in the future as you do more research. You're spot on; you're working directly in the area where there's a lot of desire to learn more, I think, from privacy engineers and those engineers who want to get more privacy knowledgeable. So, you know, I look forward to having you on in the future.

Elias Grünewald 1:03:09

Thank you so much also for inviting me and to everyone who listened to that episode. Thank you very much.

Debra J Farber 1:03:14

Absolutely. Thank you so much for joining us today on Shifting Privacy Left to discuss your work with DevPrivOps. Until next Tuesday, everyone, when we'll be back with engaging content and another great guest. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend; and, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.

Debra J Farber

Host
