AWS Clean Rooms Differential Privacy

A pioneering privacy-enhancing feature leveraging differential privacy, protecting individual user privacy while enabling data analysis in a collaborative environment. Offers a three-way trade-off mechanism allowing customers to balance privacy protection levels, data accuracy, and the number of permitted queries based on their specific needs.

Role: UX Designer II

Location: Amazon Web Services

Duration: September 2022 to May 2024

Tools: Figma, UserTesting, Quip

Background

AWS Clean Rooms is a clean room environment for collaborators to come together and share their data in a secure way. The two most important parties in a clean room are an advertisement publisher, who has data about their users, and an advertiser who pays the publisher to display their ads to their users. Unsurprisingly, the advertiser would want to analyze the publisher’s data for use cases like determining the reach, measuring effectiveness of their ads, etc. The environment serves as a neutral ground where advertisers and publishers share data for insights without compromising individual user confidentiality.

Our task was to introduce a robust privacy-enhancing feature that would uphold this balance, particularly focusing on differential privacy as the chosen solution.

 

What is Differential Privacy?

Differential privacy is a mathematically backed framework that helps strike a balance between data utility and individual privacy. It works by adding controlled noise to query results, ensuring that the output remains statistically almost the same regardless of whether any individual's information is included in or excluded from the dataset.
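To make the idea of controlled noise concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook way to achieve differential privacy for a count query. It is an illustration only: the mechanism AWS Clean Rooms actually uses is not described here, and the function name, parameters and sample counts are all hypothetical.

```python
import random


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return true_count plus Laplace noise of scale sensitivity / epsilon.

    Hypothetical helper for illustration; not an AWS Clean Rooms API.
    A smaller epsilon means stronger privacy and therefore more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / sensitivity
    # The difference of two i.i.d. exponential samples with this rate
    # follows a Laplace distribution with scale 1 / rate.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(0)
    with_user = 1000     # assumed count when one particular user is included
    without_user = 999   # assumed count when that user is excluded
    for epsilon in (0.1, 1.0):
        print(
            f"epsilon={epsilon}: "
            f"with user -> {noisy_count(with_user, epsilon, rng=rng):.1f}, "
            f"without user -> {noisy_count(without_user, epsilon, rng=rng):.1f}"
        )
```

Because the noise is usually larger than the difference one person makes to the count, a single noisy answer reveals very little about whether that person is in the dataset.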

The key challenge with differential privacy lies in its technical complexity and the trade-offs involved. There is a three-way trade-off between privacy protection, data accuracy, and the number of queries that can be executed on the data. Achieving a higher level of privacy protection typically comes at the cost of reduced accuracy or a smaller number of permitted queries. The level of privacy protection is generally expressed in the industry as a ‘privacy budget’, wherein each query on the data consumes some of the budget depending on various parameters.
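The three-way trade-off can be sketched with a little arithmetic. The example below assumes the simplest accounting model: basic sequential composition with the budget split evenly across queries, and Laplace noise on each result. How Clean Rooms actually accounts for budget is not covered here and may differ; the function name and the sample values are hypothetical.

```python
def expected_noise(total_budget, num_queries, sensitivity=1.0):
    """Expected absolute Laplace noise per query under an even budget split.

    Hypothetical helper for illustration. Assumes each query receives
    total_budget / num_queries of the budget (basic sequential composition).
    """
    if total_budget <= 0 or num_queries <= 0:
        raise ValueError("total_budget and num_queries must be positive")
    epsilon_per_query = total_budget / num_queries
    # For Laplace noise, the expected absolute error equals the scale,
    # which is sensitivity / epsilon.
    return sensitivity / epsilon_per_query


if __name__ == "__main__":
    # A smaller total budget means stronger privacy protection.
    for total_budget in (1.0, 4.0):
        for num_queries in (10, 100):
            noise = expected_noise(total_budget, num_queries)
            print(
                f"budget={total_budget}, queries={num_queries}: "
                f"expected error per query ~ {noise:.1f}"
            )
```

Under these assumptions, fixing any two of the three quantities determines the third: a tighter budget or more queries both mean noisier answers.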


Design Process Overview

As the sole UX designer on the project, my role was to translate this intricate concept into an intuitive and accessible user experience. The initial hypothesis was that customers would want to protect their users' privacy in Clean Rooms with the latest mathematically backed solution. The design process is always messy. In a nutshell, it involved extensive user research, persona mapping, and iterative prototyping, while incorporating feedback from internal experts and external customers. This took us through the launch milestones, from beta to preview to general availability.


Challenges

In the early stages, I ideated through about 25 iterations of an end-to-end flow that could work. The science team’s approaches were ever-evolving, and so was our knowledge of what the right approach would look like.
One of the primary challenges throughout this project was the inherent complexity of differential privacy as a concept. We had to strike the right balance between technical accuracy and user-friendliness for the feature to be adoptable by all types of users.

Navigating the trade-offs involved in differential privacy places a significant cognitive load on users. Balancing privacy protection, data accuracy, and the number of permitted queries was a delicate process, requiring careful consideration of their needs and industry best practices.


Defining the product and the user type

In the early stages, the product vision for the differential privacy feature in AWS Clean Rooms was to create an accessible and easy-to-use solution, targeting a broad range of customers, including those with limited expertise in privacy and differential privacy. The goal was to make this mathematically backed privacy solution approachable for novice users.

The target user type during this initial phase was primarily novice data engineers and professionals with minimal knowledge of differential privacy concepts. The strategy aimed to simplify the technical complexities, making it easier for novice users to adopt and benefit from this privacy-enhancing feature.

By catering to a wide audience, including novice users, the product vision sought to democratize the benefits of differential privacy and make it accessible to a broader range of organizations and individuals, regardless of their technical expertise.


Ideation and iteration

After 10 months of iteration and user testing, the team settled on exploring an approach where the user would not have to understand the technical details of differential privacy parameters. I pushed the team to attempt to wrap the complex differential privacy parameters into a more user-friendly concept, collaborating with writers and the science team to ensure scientific accuracy while maintaining comprehensibility for novice users. In this attempt to make the setup process more intuitive, the actual differential privacy parameters were translated into a user-friendly "Privacy Protection Factor" (PPF), akin to SPF in sunscreen - the higher the factor, the higher the protection.

Sample wireframes of initial set up and main configuration pages where ‘Additional privacy factor’ gets tuned.


I prototyped this iteration and conducted more user studies with internal engineers from various teams to gauge their understanding of the differential privacy setup pages from a novice's perspective. I reached out to internal privacy engineers through dedicated Slack channels to gather feedback from experts in the field.

All the feedback informed what we launched during the open beta.

In all, we tested the concepts with 13 internal people.


Beta launch

The beta launch of the differential privacy feature marked a significant milestone in our journey. In close collaboration with the Clean Rooms team, we successfully integrated the feature into their workflows. Working alongside the writers, I ensured that comprehensive help guides and informative panels were readily available for users.

As we launched the beta, our focus shifted towards collecting valuable feedback through regular customer calls. This allowed us to gain insights into what aspects of the feature were effective and what areas needed further improvement.

During this phase, it became evident that novice users found it challenging to grasp the nuances of the new "privacy protection factor" parameter and verify its scientific accuracy. Simultaneously, even expert users had to invest significant effort in understanding the intricacies of this simplified representation. Notably, our product stood out in comparison to competitors, who were not offering a similarly straightforward customer experience. We also discovered an intriguing trend in the domain of differential privacy: users preferred sticking to default values rather than making extensive tweaks. Their primary concern was having the stamp of differential privacy on their data protection layer, with the assurance of Amazon backing the default settings.

It underscored the need for continued refinement and education to ensure a seamless user experience, regardless of the user's expertise level.


Persona Matrix

Through this beta feedback, it became evident that the essence of differential privacy revolves around the three-way trade-off between privacy, accuracy, and the number of permitted queries. I iterated on the designs and worked closely with stakeholders to refine the product direction. Therein lay the crux of our issue - we wanted to cater to ‘everyone’ whilst being scientifically accurate so that the privacy experts could easily recommend our product. We were realizing that we might be trying to teach calculus to middle school kids, and that might not be the best strategy.

To help the product team think about it more holistically and align on a solution that targeted the right audience, I used a persona matrix framework. This matrix helped us shift our positioning from catering to novice users towards serving users with at least a basic grasp of differential privacy.


The persona matrix identified four key user segments:

Each segment had distinct goals, pain points, desired features, and a distinct share of the user base, both in the beta customer pool and among potential new customers. By mapping these characteristics, we could tailor the product experience to meet the specific needs of our target audience more effectively.

The user percentages made it particularly clear: novice users did make up 25% of the base, but mid-level privacy engineers made up about 35%. This gave us the assurance to pivot to a strategy where we could trust users to have at least a basic grasp of what differential privacy is.


Pivoted Iterations

Based on the persona matrix insights, we were able to get a very opinionated direction for our next iteration. We pivoted our approach from the beta iteration. Instead of oversimplifying the differential privacy parameters, we reverted to a more authentic representation.

The "Privacy Protection Factor" concept was replaced with the actual privacy budget parameter, a core element of differential privacy. The idea was that we appeal to the knowledge level of privacy engineers, and trust the novice users to grasp the concepts at a high level. Additionally, we introduced a three-way trade-off mechanism that allowed users to make decisions regarding the desired level of privacy protection, data accuracy, and the number of queries they could run on their data.


We iterated through several versions of the pivot, trying different ways for the user to tinker with the ‘utility’ and figure out which values suited their use cases.



We also briefly considered renaming the ‘Privacy factor’ to ‘Differencing risk’, framing it as a risk parameter to help users make a cautious choice. We also explored adding in-context tutorials to help users understand the concepts.


After another few months of iteration on the design and the backend science, we settled on a hybrid of what we launched in beta and some of the improvements from the new ideas. Below is what we launched in preview and, later, for general availability.


Final Design

Incorporating feedback from customer calls during the beta phase, we further refined the design for the public preview launch at re:Invent 2023. To aid user understanding, I advocated for the inclusion of an interactive example panel, enabling users to visualize in real-time how their settings affected query results.

The example panel remained a key component throughout all user tests. The live interactivity of the panel let users play with a sample query and see firsthand how their settings affected the results. It helped them grasp the impact of their settings on data accuracy and privacy. We kept a way for users to return to the default settings after receiving reassuring feedback on it during our public preview phase.

I collaborated with different internal teams to set up a survey mechanism that captured user feedback along with specific settings the users utilized. This data helped us validate our default values and ensure alignment with different industry segment practices.

After close to two years of heavy collaboration and rigorous testing, we launched for general availability in May 2024 🎉

Post-launch highlights

The successful launch of differential privacy in AWS Clean Rooms has yielded significant impact and recognition:

  1. AWS Clean Rooms was named a Leader in data clean room technology by the IDC MarketScape's Worldwide Data Clean Room Technology 2024 Vendor Assessment, specifically highlighting its differential privacy features.

  2. Organically acquired customers reported being able to conduct end-to-end testing of the functionality without requiring support from AWS, and a CSAT score of 92% confirmed the ease of use.

  3. Competitors like Google Cloud and Snowflake have also introduced differential privacy into their data clean room offerings. However, their solutions require deep expertise in differential privacy and place a greater burden on customers to understand the accuracy of query results. In contrast, AWS Clean Rooms' fully managed offering positions it as a more accessible and preferred choice for customers.