Origin Tracing and Data Attribution in Machine Learning

Thu 20 Jun 2024

From 11:00 until 12:30

At CAB H 52 (Seminar) + CNB/F/110 (Lunch), ETH Zurich

This talk will discuss the challenges of tracing data and attributing it to the input or output of a model. Tracing data to the input, referred to as membership inference, means determining whether a specific data point was part of the model's training set. Successfully attributing a data point to a model's training set indicates information leakage, which poses significant privacy and security risks. Attributing data to the output, commonly achieved through watermarking, means identifying whether certain outputs were generated by a particular model. This form of attribution supports model ownership verification and intellectual property protection. I will present novel statistical tests designed to perform these attributions and evaluate their effectiveness. Additionally, I will introduce methods to test the robustness of these statistical tests under varying scenarios, including shifts in the data distribution and adversarial conditions.

Join us in CAB H 52 (Seminar) + CNB/F/110 (Lunch).