-
On Stealing Graph Neural Network Models
Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pregowska, Tomasz Paweł Michalak
The Fortieth AAAI Conference on Artificial Intelligence
Current graph neural network (GNN) model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits. However, in reality, the number of allowed queries can be severely limited. In this paper, we demonstrate how an adversary can extract the GNN with very limited interactions with the model. Our approach first enables the adversary to obtain the model backbone without making direct queries to the victim model and then to strategically utilize a fixed query limit to extract the most informative data.
@inproceedings{podhajski2026Stealing,
title={On Stealing Graph Neural Network Models},
author={Marcin Podhajski and Jan Dubiński and Franziska Boenisch and Adam Dziedzic and Agnieszka Pregowska and Tomasz Paweł Michalak},
booktitle={The Fortieth AAAI Conference on Artificial Intelligence},
year={2026}
}
-
Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images
Aditya Kumar, Tom Blanchard, Adam Dziedzic, Franziska Boenisch
The Fortieth AAAI Conference on Artificial Intelligence AI Alignment Track
State-of-the-art Diffusion Models (DMs) produce highly realistic images. While prior work has successfully mitigated Not Safe For Work (NSFW) content in the visual domain, we identify a novel threat: the generation of NSFW text embedded within images. This includes offensive language, such as insults, racial slurs, and sexually explicit terms, posing significant risks to users. We show that all state-of-the-art DMs (e.g., SD3, SDXL, Flux, DeepFloyd IF) are vulnerable to this issue. Through extensive experiments, we demonstrate that existing mitigation techniques, effective for visual content, fail to prevent harmful text generation while substantially degrading benign text generation. As an initial step toward addressing this threat, we introduce a novel fine-tuning strategy that targets only the text-generation layers in DMs. To this end, we construct a safety fine-tuning dataset by pairing each NSFW prompt with two images: one with the NSFW term, and another where that term is replaced with a carefully crafted benign alternative while leaving the image unchanged otherwise. By training on this dataset, the model learns to avoid generating harmful text while preserving benign content and overall image quality. Finally, to advance research in the area, we release ToxicBench, an open-source benchmark for evaluating NSFW text generation in images. It includes our curated fine-tuning dataset, a set of harmful prompts, new evaluation metrics, and a pipeline that assesses both NSFW-ness and text and image quality. Our benchmark aims to guide future efforts in mitigating NSFW text generation in text-to-image models, thereby contributing to their safe deployment.
@inproceedings{kumar2026beautiful,
title={Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images},
author={Aditya Kumar and Tom Blanchard and Adam Dziedzic and Franziska Boenisch},
booktitle={The Fortieth AAAI Conference on Artificial Intelligence AI Alignment Track},
year={2026}
}
-
Demystifying Foreground-Background Memorization in Diffusion Models
Jimmy Z. Di, Yiwei Lu, Yaoliang Yu, Gautam Kamath, Adam Dziedzic, Franziska Boenisch
The Fortieth AAAI Conference on Artificial Intelligence
Diffusion models (DMs) memorize training images and can reproduce near-duplicates during generation. Current detection methods identify verbatim memorization but fail to capture two critical aspects: quantifying partial memorization occurring in small image regions, and memorization patterns beyond specific prompt-image pairs. To address these limitations, we propose Foreground Background Memorization (FB-Mem), a novel segmentation-based metric that classifies and quantifies memorized regions within generated images. Our method reveals that memorization is more pervasive than previously understood: (1) individual generations from single prompts may be linked to clusters of similar training images, revealing complex memorization patterns that extend beyond one-to-one correspondences; and (2) existing model-level mitigation methods, such as neuron deactivation and pruning, fail to eliminate local memorization, which persists particularly in foreground regions. Our work establishes an effective framework for measuring memorization in diffusion models, demonstrates the inadequacy of current mitigation approaches, and proposes a stronger mitigation method using a clustering approach.
@inproceedings{di2026demystifying,
title={Demystifying Foreground-Background Memorization in Diffusion Models},
author={Jimmy Z. Di and Yiwei Lu and Yaoliang Yu and Gautam Kamath and Adam Dziedzic and Franziska Boenisch},
booktitle={The Fortieth AAAI Conference on Artificial Intelligence},
year={2026}
}
-
Data Provenance for Image Auto-Regressive Generation
Bihe Zhao, Louis Kerner, Michel Meintz, Tameem Bakr, Franziska Boenisch, Adam Dziedzic
The Fourteenth International Conference on Learning Representations
Image autoregressive models (IARs) have recently demonstrated remarkable capabilities in visual content generation, achieving photorealistic quality and rapid synthesis through the next-token prediction paradigm adapted from large language models. As these models become widely accessible, robust data provenance is required to reliably trace IAR-generated images to the source model that synthesized them. This is critical to prevent the spread of misinformation, detect fraud, and attribute harmful content. We find that although IAR-generated images often appear visually identical to real images, their generation process introduces characteristic patterns in their outputs, which serve as a reliable provenance signal for the generated images. Leveraging this, we present a post-hoc framework that enables the robust detection of such patterns for provenance tracing. Notably, our framework does not require modifications of the generative process or outputs. Thereby, it is applicable in contexts where prior watermarking methods cannot be used, such as for generated content that is already published without additional marks and for models that do not integrate watermarking. We demonstrate the effectiveness of our approach across a wide range of IARs, highlighting its high potential for robust data provenance tracing in autoregressive image generation.
@inproceedings{zhao2026data,
title={Data Provenance for Image Auto-Regressive Generation},
author={Bihe Zhao and Louis Kerner and Michel Meintz and Tameem Bakr and Franziska Boenisch and Adam Dziedzic},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
-
Natural Identifiers for Privacy and Data Audits in Large Language Models
Lorenzo Rossi, Bart{\l}omiej Marek, Franziska Boenisch, Adam Dziedzic
The Fourteenth International Conference on Learning Representations
Assessing the privacy of large language models (LLMs) presents significant challenges. In particular, most existing methods for auditing differential privacy require the insertion of specially crafted canary data during training, making them impractical for auditing already-trained models without costly retraining. Additionally, dataset inference, which audits whether a suspect dataset was used to train a model, is infeasible without access to a private non-member held-out dataset. Yet, such held-out datasets are often unavailable or difficult to construct for real-world cases since they have to be from the same distribution (IID) as the suspect data. These limitations severely hinder the ability to conduct scalable, post-hoc audits. To enable such audits, this work introduces natural identifiers (NIDs) as a novel solution to the above-mentioned challenges. NIDs are structured random strings, such as cryptographic hashes and shortened URLs, naturally occurring in common LLM training datasets. Their format enables the generation of unlimited additional random strings from the same distribution, which can act as alternative canaries for audits and as same-distribution held-out data for dataset inference. Our evaluation highlights that indeed, using NIDs, we can facilitate post-hoc differential privacy auditing without any retraining and enable dataset inference for any suspect dataset containing NIDs without the need for a private non-member held-out dataset.
@inproceedings{rossi2026natural,
title={Natural Identifiers for Privacy and Data Audits in Large Language Models},
author={Lorenzo Rossi and Bart{\l}omiej Marek and Franziska Boenisch and Adam Dziedzic},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
-
{SERUM}: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
Jan Kociszewski, Hubert Jastrz{\k{e}}bski, Tymoteusz St{\k{e}}pkowski, Filip Manijak, Krzysztof Rojek, Franziska Boenisch, Adam Dziedzic
The Fourteenth International Conference on Learning Representations
We propose SERUM: an intriguingly simple yet highly effective method for marking images generated by diffusion models (DMs). We only add a unique watermark noise to the initial diffusion generation noise and train a lightweight detector to identify watermarked images, simplifying and unifying the strengths of prior approaches. SERUM provides robustness against any image augmentations or watermark removal attacks and is extremely efficient, all while maintaining negligible impact on image quality. In contrast to prior approaches, which are often only resilient to limited perturbations and incur significant training, injection, and detection costs, our SERUM achieves remarkable performance, with the highest true positive rate (TPR) at a 1% false positive rate (FPR) in most scenarios, along with fast injection and detection and low detector training overhead. Its decoupled architecture also seamlessly supports multiple users by embedding individualized watermarks with little interference between the marks. Overall, our method provides a practical solution to mark outputs from DMs and to reliably distinguish generated from natural images.
@inproceedings{kociszewski2026serum,
title={{SERUM}: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation},
author={Jan Kociszewski and Hubert Jastrz{\k{e}}bski and Tymoteusz St{\k{e}}pkowski and Filip Manijak and Krzysztof Rojek and Franziska Boenisch and Adam Dziedzic},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
-
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Bart{\l}omiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic
The Fourteenth International Conference on Learning Representations
@inproceedings{marek2026benchmarking,
title={Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models},
author={Bart{\l}omiej Marek and Lorenzo Rossi and Vincent Hanke and Xun Wang and Michael Backes and Franziska Boenisch and Adam Dziedzic},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
-
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
Dariush Wahdany, Matthew Jagielski, Adam Dziedzic, Franziska Boenisch
The Fourteenth International Conference on Learning Representations
In machine learning, curation is used to select the most valuable data for improving both model accuracy and computational efficiency. Recently, curation has also been explored as a solution for private machine learning: rather than training directly on sensitive data, which is known to leak information through model predictions, the private data is used only to guide the selection of useful public data. The resulting model is then trained solely on curated public data. It is tempting to assume that such a model is privacy-preserving because it has never seen the private data. Yet, we show that without further protection, curation pipelines can still leak private information. Specifically, we introduce novel attacks against popular curation methods, targeting every major step: the computation of curation scores, the selection of the curated subset, and the final trained model. We demonstrate that each stage reveals information about the private dataset and that even models trained exclusively on curated public data leak membership information about the private data that guided curation. These findings highlight the previously overlooked inherent privacy risks of data curation and show that privacy assessment must extend beyond the training procedure to include the data selection process. Our differentially private adaptations of curation methods effectively mitigate leakage, indicating that formal privacy guarantees for curation are a promising direction.
@inproceedings{wahdany2026curation,
title={Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning},
author={Dariush Wahdany and Matthew Jagielski and Adam Dziedzic and Franziska Boenisch},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
-
Finding Do{RI}: Discovery of Retained Images in Diffusion Models
Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
Forty-third International Conference on Machine Learning
Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering verbatim training data replication, based on the assumption that memorization can be localized. We challenge this assumption and demonstrate that, even after such pruning, small perturbations to the text embeddings of previously mitigated prompts can re-trigger data replication, revealing the fragility of such methods. Our further analysis then provides multiple indications that memorization is indeed not inherently local: (1) replication triggers for memorized images are distributed throughout text embedding space; (2) embeddings yielding the same replicated image produce divergent model activations; and (3) different pruning methods identify inconsistent sets of memorization-related weights for the same image. Finally, we show that bypassing the locality assumption enables more robust mitigation through adversarial fine-tuning. These findings provide new insights into the fundamental nature of memorization in text-to-image DMs and inform the future development of more reliable mitigation methods against DM memorization.
@inproceedings{kowalczuk2026finding,
title={Finding Do{RI}: Discovery of Retained Images in Diffusion Models},
author={Antoni Kowalczuk and Dominik Hintersdorf and Lukas Struppek and Kristian Kersting and Adam Dziedzic and Franziska Boenisch},
booktitle={Forty-third International Conference on Machine Learning},
year={2026}
}
-
Concept Removal for Frontier Image Generative Models
Aditya Kumar, Pierre Joly, Adam Dziedzic, Franziska Boenisch
Forty-third International Conference on Machine Learning
Image generative models are trained on massive, largely uncurated internet-scale datasets that contain undesirable visual concepts. Efficiently removing such concepts from the model generations without degrading the quality of output images remains challenging. We introduce a novel concept removal method for frontier diffusion and image autoregressive models, such as SD3.5, Flux, and Infinity. Our intervention replaces the internal bottleneck layer present in all these modern models with a transcoder that is trained to replicate the original layer while structuring it into distinct activation features. This in-place substitution creates an integrated filter through which concept-specific signals can be selectively disabled while preserving the rest of the model's behavior. Since the intervention modifies the model backbone rather than attaching an external component, it remains persistent under white-box access. Empirically, the approach achieves state-of-the-art concept removal performance across modern diffusion and autoregressive models, maintains visual generation quality, provides robustness against adversarial prompts, and supports sequential removal of diverse concepts. This positions our method as a practical approach for concept removal in frontier image generative models.
@inproceedings{kumar2026concept,
title={Concept Removal for Frontier Image Generative Models},
author={Aditya Kumar and Pierre Joly and Adam Dziedzic and Franziska Boenisch},
booktitle={Forty-third International Conference on Machine Learning},
year={2026}
}
-
{MUC}: Machine Unlearning for Contrastive Learning with Black-box Evaluation
Yihan Wang, Yiwei Lu, Guojun Zhang, Franziska Boenisch, Adam Dziedzic, Yaoliang Yu, Xiao-Shan Gao
Machine unlearning offers effective solutions for revoking the influence of specific training data on pre-trained model parameters. While existing approaches address unlearning for classification and generative models, they overlook an important category of machine learning models: contrastive learning (CL) methods. This paper addresses this gap by introducing the Machine Unlearning for Contrastive Learning (MUC) framework and adapting existing methods. We identify limitations in current approaches, noting that several methods perform inadequately as unlearners and that existing evaluation tools insufficiently validate unlearning effects in contrastive learning. To address these issues, we propose Alignment Calibration (AC), a novel method that explicitly considers contrastive learning properties and optimizes towards new auditing metrics for easy verification of unlearning. Through empirical comparisons with baseline methods on SimCLR, MoCo, and CLIP, we demonstrate that AC: (1) achieves state-of-the-art performance, approximating exact unlearning (retraining); (2) enables data owners to clearly visualize unlearning effects through black-box evaluation. The code is available at https://github.com/EhanW/Alignment-Calibration.
@inproceedings{wang2025muc,
title={{MUC}: Machine Unlearning for Contrastive Learning with Black-box Evaluation},
author={Yihan Wang and Yiwei Lu and Guojun Zhang and Franziska Boenisch and Adam Dziedzic and Yaoliang Yu and Xiao-Shan Gao},
booktitle={},
year={2025}
}
-
Differentially Private Prototypes for Private Transfer Learning
Dariush Wahdany, Matthew Jagielski, Adam Dziedzic, Franziska Boenisch
The 39th Annual AAAI Conference on Artificial Intelligence
Machine learning (ML) models have been shown to leak private information from their training datasets. Differential Privacy (DP), typically implemented through the differentially private stochastic gradient descent algorithm (DP-SGD), has become the standard solution to bound leakage from the models. Despite recent improvements, DP-SGD-based approaches for private learning still usually struggle in the high privacy (epsilon<1) and low data regimes, and when the private training datasets are imbalanced. To overcome these limitations, we propose Differentially Private Prototype Learning (DPPL) as a new paradigm for private transfer learning. DPPL leverages publicly pre-trained encoders to extract features from private data and generates DP prototypes that represent each private class in the embedding space and can be publicly released for inference. Since our DP prototypes can be obtained from only a few private training data points and without iterative noise addition, they offer high-utility predictions and strong privacy guarantees even under the notion of pure DP. We additionally show that privacy-utility trade-offs can be further improved when leveraging the public data beyond pre-training of the encoder: in particular, we can privately sample our DP prototypes from the publicly available data points used to train the encoder. Our experimental evaluation with four state-of-the-art encoders, four vision datasets, and under different data and imbalancedness regimes demonstrates DPPL's high performance under strong privacy guarantees in challenging private learning setups.
@inproceedings{wahdany2024dppl,
title={Differentially Private Prototypes for Private Transfer Learning},
author={Dariush Wahdany and Matthew Jagielski and Adam Dziedzic and Franziska Boenisch},
booktitle={The 39th Annual AAAI Conference on Artificial Intelligence},
year={2025}
}
-
Privacy Auditing for Large Language Models with Natural Identifiers
Lorenzo Rossi, Bart{\l}omiej Marek, Franziska Boenisch, Adam Dziedzic
ICLR 2025 Workshop on Navigating and Addressing Data Problems for Foundation Models
Privacy auditing for large language models (LLMs) faces significant challenges. Membership inference attacks, once considered a practical privacy auditing tool, are unreliable for pretrained LLMs due to the lack of non-member data from the same distribution as the member data. Exacerbating the situation further, dataset inference cannot be performed without such a non-member set. Finally, we lack formal post hoc auditing of training privacy guarantees. Previous differential privacy auditing methods are impractical since they rely on inserting specially crafted canary data during training, making audits on already pre-trained LLMs impossible without expensive retraining. This work introduces natural identifiers (NIDs) as a novel solution to these challenges. NIDs are structured random strings, such as SSH keys, cryptographic hashes, and shortened URLs, which naturally occur in common LLM training datasets. Their format enables the generation of unlimited additional random strings from the same distribution, which can act as non-members or alternative canaries for audit. Leveraging this property, we show how NIDs support robust evaluation of membership inference attacks, enable dataset inference for any suspect set containing NIDs, and facilitate post hoc privacy auditing without retraining.
@inproceedings{rossi2025privacy,
title={Privacy Auditing for Large Language Models with Natural Identifiers},
author={Lorenzo Rossi and Bart{\l}omiej Marek and Franziska Boenisch and Adam Dziedzic},
booktitle={ICLR 2025 Workshop on Navigating and Addressing Data Problems for Foundation Models},
year={2025}
}
-
Are Watermarks For Diffusion Models Radioactive?
Jan Dubi{\'n}ski, Michel Meintz, Franziska Boenisch, Adam Dziedzic
ICLR Workshop on GenAI Watermarking
As generative artificial intelligence (AI) models become increasingly widespread, ensuring transparency and provenance in AI-generated content has become a critical challenge. Watermarking techniques have been proposed to embed imperceptible yet detectable signals in AI-generated images, enabling provenance tracking and copyright enforcement. However, a second party can repurpose images generated by an existing model to train their own diffusion model, potentially disregarding the ownership rights of the original model creator. Recent research in language models has explored the concept of watermark \textit{radioactivity}, where embedded signals persist when training or fine-tuning a new model, enabling the detection of models trained on watermarked data. In this work, we investigate whether similar persistence occurs in diffusion models. Our findings reveal that none of the tested watermarking methods transfer their signal when used for fine-tuning a second model. This means that images generated by this new model exhibit detection results for the watermarks of the original model indistinguishable from random guessing. These results indicate that existing techniques are insufficient for ensuring watermark propagation through the model derivation chain and that novel approaches are needed to achieve effective and resilient watermark transfer in diffusion models.
@inproceedings{dubinski2025are,
title={Are Watermarks For Diffusion Models Radioactive?},
author={Jan Dubi{\'n}ski and Michel Meintz and Franziska Boenisch and Adam Dziedzic},
booktitle={ICLR Workshop on GenAI Watermarking},
year={2025}
}
-
Watermarking Image Autoregressive Models
Michel Meintz, Jan Dubi{\'n}ski, Franziska Boenisch, Adam Dziedzic
ICML Workshop on Data in Generative Models - The Bad, the Ugly, and the Greats
Image generative models have become increasingly popular, but training them requires large datasets that are costly to collect and curate. To circumvent these costs, some parties may exploit existing models by using the generated images as training data for their own models. In general, watermarking is a valuable tool for detecting unauthorized use of generated images. However, when these images are used to train a new model, watermarking can only enable detection if the watermark persists through training and remains identifiable in the outputs of the newly trained model - a property known as radioactivity. In this work, we are the first to propose a radioactive watermarking method tailored for image autoregressive models (IARs) - drawing inspiration from techniques in large language models (LLMs), which share IARs' autoregressive paradigm. Our extensive experimental evaluation highlights our method's effectiveness in preserving radioactivity within IARs, enabling robust provenance tracking, and preventing unauthorized use of their generated images.
@inproceedings{meintz2025watermarking,
title={Watermarking Image Autoregressive Models},
author={Michel Meintz and Jan Dubi{\'n}ski and Franziska Boenisch and Adam Dziedzic},
booktitle={ICML Workshop on Data in Generative Models - The Bad, the Ugly, and the Greats},
year={2025}
}
-
Precise Parameter Localization for Textual Generation in Diffusion Models
Łukasz Staniszewski, Bartosz Cywi{\'n}ski, Franziska Boenisch, Kamil Deja, Adam Dziedzic
The Thirteenth International Conference on Learning Representations
Novel diffusion models (DMs) can synthesize photo-realistic images with integrated high-quality text. Surprisingly, we demonstrate through attention activation patching that less than 1% of DMs' parameters, all contained in attention layers, influence the generation of textual content within the images. Building on this observation, by precisely targeting cross and joint attention layers of DMs, we improve the efficiency and performance of textual generation. We introduce several applications that benefit from localizing the layers responsible for textual content generation. We first show that LoRA-based fine-tuning of only the localized layers further enhances the general text-generation capabilities of large DMs while preserving the quality and diversity of the DMs' generations. Then, we demonstrate how we can use the localized layers to edit textual content in generated images. Finally, we extend this idea to the practical use case of preventing the generation of toxic text in a cost-free manner. In contrast to prior work, our localization approach is broadly applicable across various diffusion model architectures, including U-Net (e.g., LDM and SDXL) and transformer-based (e.g., DeepFloyd IF and Stable Diffusion 3), utilizing diverse text encoders (e.g., from CLIP to large language models like T5).
@inproceedings{staniszewski2025precise,
title={Precise Parameter Localization for Textual Generation in Diffusion Models},
author={Łukasz Staniszewski and Bartosz Cywi{\'n}ski and Franziska Boenisch and Kamil Deja and Adam Dziedzic},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
-
Captured by Captions: On Memorization and its Mitigation in {CLIP} Models
Wenhao Wang, Adam Dziedzic, Grace C. Kim, Michael Backes, Franziska Boenisch
The Thirteenth International Conference on Learning Representations
Multi-modal models, such as CLIP, have demonstrated strong performance in aligning visual and textual representations, excelling in tasks like image retrieval and zero-shot classification. Despite this success, the mechanisms by which these models utilize training data, particularly the role of memorization, remain unclear. In uni-modal models, both supervised and self-supervised, memorization has been shown to be essential for generalization. However, it is not well understood how these findings would apply to CLIP, which incorporates elements from both supervised learning via captions that provide a supervisory signal similar to labels, and from self-supervised learning via the contrastive objective. To bridge this gap in understanding, we propose a formal definition of memorization in CLIP (CLIPMem) and use it to quantify memorization in CLIP models. Our results indicate that CLIP's memorization behavior falls between the supervised and self-supervised paradigms, with "mis-captioned" samples exhibiting the highest levels of memorization. Additionally, we find that the text encoder contributes more to memorization than the image encoder, suggesting that mitigation strategies should focus on the text domain. Building on these insights, we propose multiple strategies to reduce memorization while at the same time improving utility, something that had not been shown before for traditional learning paradigms where reducing memorization typically results in utility decrease.
@inproceedings{wang2025captured,
title={Captured by Captions: On Memorization and its Mitigation in {CLIP} Models},
author={Wenhao Wang and Adam Dziedzic and Grace C. Kim and Michael Backes and Franziska Boenisch},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
-
Differentially Private Federated Learning with Time-Adaptive Privacy Spending
Shahrzad Kiani, Nupur Kulkarni, Adam Dziedzic, Stark Draper, Franziska Boenisch
The Thirteenth International Conference on Learning Representations (ICLR)
Federated learning (FL) with differential privacy (DP) provides a framework for collaborative machine learning, enabling clients to train a shared model while adhering to strict privacy constraints. The framework allows each client to have an individual privacy guarantee, e.g., by adding different amounts of noise to each client's model updates. One underlying assumption is that all clients spend their privacy budgets uniformly over time (learning rounds). However, it has been shown in the literature that learning in early rounds typically focuses on more coarse-grained features that can be learned at lower signal-to-noise ratios while later rounds learn fine-grained features that benefit from higher signal-to-noise ratios. Building on this intuition, we propose a time-adaptive DP-FL framework that expends the privacy budget non-uniformly across both time and clients. Our framework enables each client to save privacy budget in early rounds so as to be able to spend more in later rounds when additional accuracy is beneficial in learning more fine-grained features. We theoretically prove utility improvements in the case that clients with stricter privacy budgets spend budgets unevenly across rounds, compared to clients with more relaxed budgets, who have sufficient budgets to distribute their spend more evenly. Our practical experiments on standard benchmark datasets support our theoretical results and show that, in practice, our algorithms improve the privacy-utility trade-offs compared to baseline schemes.
@inproceedings{kiani2025differentially,
title={Differentially Private Federated Learning with Time-Adaptive Privacy Spending},
author={Shahrzad Kiani and Nupur Kulkarni and Adam Dziedzic and Stark Draper and Franziska Boenisch},
booktitle={The Thirteenth International Conference on Learning Representations (ICLR)},
year={2025}
}
-
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)
Diffusion Models (DMs) benefit from large and diverse datasets for their training. Since this data is often scraped from the Internet without permission from the data owners, this raises concerns about copyright and intellectual property protections. While (illicit) use of data is easily detected for training samples perfectly re-created by a DM at inference time, it is much harder for data owners to verify if their data was used for training when the outputs from the suspect DM are not close replicas. Conceptually, membership inference attacks (MIAs), which detect if a given data point was used during training, present themselves as a suitable tool to address this challenge. However, we demonstrate that existing MIAs are not strong enough to reliably determine the membership of individual images in large, state-of-the-art DMs. To overcome this limitation, we propose CDI, a framework for data owners to identify whether their dataset was used to train a given DM. CDI relies on dataset inference techniques, i.e., instead of using the membership signal from a single data point, CDI leverages the fact that most data owners, such as providers of stock photography, visual media companies, or even individual artists, own datasets with multiple publicly exposed data points which might all be included in the training of a given DM. By selectively aggregating signals from existing MIAs and using new handcrafted methods to extract features for these datasets, feeding them to a scoring model, and applying rigorous statistical testing, CDI allows data owners with as little as 70 data points to identify with a confidence of more than 99% whether their data was used to train a given DM. Thereby, CDI represents a valuable tool for data owners to claim illegitimate use of their copyrighted data.
@inproceedings{dubinski2024cdi,
title={CDI: Copyrighted Data Identification in Diffusion Models},
author={Jan Dubiński and Antoni Kowalczuk and Franziska Boenisch and Adam Dziedzic},
booktitle={The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
year={2025}
}
-
Privacy Attacks on Image AutoRegressive Models
Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch, Adam Dziedzic
Forty-Second International Conference on Machine Learning (ICML)
Image AutoRegressive generation has emerged as a new powerful paradigm with image autoregressive models (IARs) matching state-of-the-art diffusion models (DMs) in image quality (FID: 1.48 vs. 1.58) while allowing for a higher generation speed. However, the privacy risks associated with IARs remain unexplored, raising concerns regarding their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to the ones of DMs as reference points. Concretely, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images (with a True Positive Rate at False Positive Rate = 1% of 86.38% vs. 6.38% for DMs with comparable attacks). We leverage our novel MIA to provide dataset inference (DI) for IARs, and show that it requires as few as 6 samples to detect dataset membership (compared to 200 for DI in DMs), confirming a higher information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30). Our results suggest a fundamental privacy-utility trade-off: while IARs excel in image generation quality and speed, they are empirically significantly more vulnerable to privacy attacks compared to DMs that achieve similar performance.
@inproceedings{kowalczuk2025privacyIARs,
title={Privacy Attacks on Image AutoRegressive Models},
author={Antoni Kowalczuk and Jan Dubiński and Franziska Boenisch and Adam Dziedzic},
booktitle={Forty-Second International Conference on Machine Learning (ICML)},
year={2025}
}
-
Unlocking Post-hoc Dataset Inference with Synthetic Data
Bihe Zhao, Pratyush Maini, Franziska Boenisch, Adam Dziedzic
Forty-Second International Conference on Machine Learning (ICML)
The remarkable capabilities of Large Language Models (LLMs) can be mainly attributed to their massive training datasets, which are often scraped from the internet without respecting data owners' intellectual property rights. Dataset Inference (DI) offers a potential remedy by identifying whether a suspect dataset was used in training, thereby enabling data owners to verify unauthorized use. However, existing DI methods require a private set—known to be absent from training—that closely matches the compromised dataset's distribution. Such in-distribution, held-out data is rarely available in practice, severely limiting the applicability of DI. In this work, we address this challenge by synthetically generating the required held-out set. Our approach tackles two key obstacles: (1) creating high-quality synthetic data that accurately reflects the original distribution, which we achieve via a data generator trained on a carefully designed suffix completion task, and (2) bridging likelihood gaps between real and synthetic data, which is realized through post-hoc calibration. Extensive experiments on diverse text datasets show that our synthetic held-out set enables DI to detect the training sets with high confidence, while maintaining a low false positive rate. This result empowers copyright owners to make legitimate claims on data usage and demonstrates our method's reliability for real-world litigations.
@inproceedings{zhao2025posthocDI,
title={Unlocking Post-hoc Dataset Inference with Synthetic Data},
author={Bihe Zhao and Pratyush Maini and Franziska Boenisch and Adam Dziedzic},
booktitle={Forty-Second International Conference on Machine Learning (ICML)},
year={2025}
}
-
Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs
Xun Wang, Jing Xu, Franziska Boenisch, Michael Backes, Christopher A. Choquette-Choo, Adam Dziedzic
Forty-Second International Conference on Machine Learning (ICML)
Prompting has become a dominant paradigm for adapting large language models (LLMs). While discrete (textual) prompts are widely used for their interpretability, soft (parameter) prompts have recently gained traction in APIs. This is because they can encode information from more training samples while minimizing the user's token usage, leaving more space in the context window for task-specific input. However, soft prompts are tightly coupled to the LLM they are tuned on, limiting their generalization to other LLMs. This constraint is particularly problematic for efficiency and privacy: (1) tuning prompts on each LLM incurs high computational costs, especially as LLMs continue to grow in size. Additionally, (2) when the LLM is hosted externally, soft prompt tuning often requires sharing private data with the LLM provider. For instance, this is the case with the NVIDIA NeMo API. To address these issues, we propose POST (Privacy Of Soft prompt Transfer), a framework that enables private tuning of soft prompts on a small model and subsequently transfers these prompts to a larger LLM. POST uses knowledge distillation to derive a small model directly from the large LLM to improve prompt transferability, tunes the soft prompt locally, optionally with differential privacy guarantees, and transfers it back to the larger LLM using a small public dataset. Our experiments show that POST reduces computational costs, preserves privacy, and effectively transfers high-utility soft prompts.
@inproceedings{wang2025post,
title={Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs},
author={Xun Wang and Jing Xu and Franziska Boenisch and Michael Backes and Christopher A. Choquette-Choo and Adam Dziedzic},
booktitle={Forty-Second International Conference on Machine Learning (ICML)},
year={2025}
}
-
Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
Jamie Hayes, Adam Dziedzic, A. Feder Cooper, Christopher A. Choquette-Choo, Franziska Boenisch, Georgios Kaissis, Igor Shilov, Ilia Shumailov, Katherine Lee, Matthew Jagielski, Matthieu Meeus, Meenatchi Sundaram Muthu Selva Annamalai, Niloofar Mireshghallah, Yves-Alexandre de Montjoye, Milad Nasr
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle - achieving close-to-arbitrary success - and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC<0.7) in practical settings; and, (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
@inproceedings{hayes2025strongMIALLMs,
title={Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models},
author={Jamie Hayes and Adam Dziedzic and A. Feder Cooper and Christopher A. Choquette-Choo and Franziska Boenisch and Georgios Kaissis and Igor Shilov and Ilia Shumailov and Katherine Lee and Matthew Jagielski and Matthieu Meeus and Meenatchi Sundaram Muthu Selva Annamalai and Niloofar Mireshghallah and Yves-Alexandre de Montjoye and Milad Nasr},
booktitle={The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
year={2025}
}
-
Memorization in Graph Neural Networks
Adarsh Jamadandi, Jing Xu, Adam Dziedzic, Franziska Boenisch
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)
Deep neural networks (DNNs) have been shown to memorize their training data, yet similar analyses for graph neural networks (GNNs) remain largely under-explored. We introduce NCMemo (Node Classification Memorization), the first framework to quantify label memorization in semi-supervised node classification. We first establish an inverse relationship between memorization and graph homophily, i.e., the property that connected nodes share similar labels/features. We find that lower homophily significantly increases memorization, indicating that GNNs rely on memorization to learn less homophilic graphs. Secondly, we analyze GNN training dynamics. We find that the increased memorization in low homophily graphs is tightly coupled to the GNNs' implicit bias on using graph structure during learning. In low homophily regimes, this structure is less informative, hence inducing memorization of the node labels to minimize training loss. Finally, we show that nodes with higher label inconsistency in their feature-space neighborhood are significantly more prone to memorization. Building on our insights into the link between graph homophily and memorization, we investigate graph rewiring as a means to mitigate memorization. Our results demonstrate that this approach effectively reduces memorization without compromising model performance. Moreover, we show that it lowers the privacy risk for previously memorized data points in practice. Thus, our work not only advances understanding of GNN learning but also supports more privacy-preserving GNN deployment.
@inproceedings{Jamadandi2025memorizationGNNs,
title={Memorization in Graph Neural Networks},
author={Adarsh Jamadandi and Jing Xu and Adam Dziedzic and Franziska Boenisch},
booktitle={The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
year={2025}
}
-
BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models
Louis Kerner, Michel Meintz, Bihe Zhao, Franziska Boenisch, Adam Dziedzic
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)
State-of-the-art text-to-image models like Infinity generate photorealistic images at an unprecedented speed. These models operate in a bitwise autoregressive manner over a discrete set of tokens that is practically infinite in size. However, their impressive generative power comes with a growing risk: as their outputs increasingly populate the Internet, they are likely to be scraped and reused as training data, potentially by the very same models. This phenomenon has been shown to lead to model collapse, where repeated training on generated content, especially from the models' own previous versions, causes a gradual degradation in performance. A promising mitigation strategy is watermarking, which embeds human-imperceptible yet detectable signals into generated images, enabling the identification of generated content. In this work, we introduce BitMark, a robust bitwise watermarking framework for Infinity. Our method embeds a watermark directly at the bit level of the token stream across multiple scales (also referred to as resolutions) during Infinity's image generation process. Our bitwise watermark subtly influences the bits to preserve visual fidelity and generation speed while remaining robust against a spectrum of removal techniques. Furthermore, it exhibits high radioactivity, i.e., when watermarked generated images are used to train another image generative model, this second model's outputs will also carry the watermark. The radioactive traces remain detectable even when only fine-tuning diffusion or image autoregressive models on images watermarked with our BitMark. Overall, our approach provides a principled step toward preventing model collapse in image generative models by enabling reliable detection of generated outputs.
@inproceedings{kerner2025BitMark,
title={BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models},
author={Louis Kerner and Michel Meintz and Bihe Zhao and Franziska Boenisch and Adam Dziedzic},
booktitle={The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
year={2025}
}
-
Memorization in Self-Supervised Learning Improves Downstream Generalization
Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch
The Twelfth International Conference on Learning Representations (ICLR)
Self-supervised learning (SSL) has recently received significant attention due to its ability to train high-performance encoders purely on unlabeled data, often scraped from the internet. This data can still be sensitive, and empirical evidence suggests that SSL encoders memorize private information of their training data and can disclose it at inference time. Since existing theoretical definitions of memorization from supervised learning rely on labels, they do not transfer to SSL. To address this gap, we propose SSLMem, a framework for defining memorization within SSL. Our definition compares the difference in alignment of representations for data points and their augmented views returned by both encoders that were trained on these data points and encoders that were not. Through comprehensive empirical analysis on diverse encoder architectures and datasets, we highlight that even though SSL relies on large datasets and strong augmentations (both known in supervised learning as regularization techniques that reduce overfitting), significant fractions of training data points still experience high memorization. Through our empirical results, we show that this memorization is essential for encoders to achieve higher generalization performance on different downstream tasks.
@inproceedings{wang2024memorization,
title={Memorization in Self-Supervised Learning Improves Downstream Generalization},
author={Wenhao Wang and Muhammad Ahmad Kaleem and Adam Dziedzic and Michael Backes and Nicolas Papernot and Franziska Boenisch},
booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
year={2024}
}
-
Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data
Congyu Fang, Adam Dziedzic, Lin Zhang, Laura Oliva, Amol Verma, Fahad Razak, Nicolas Papernot, Bo Wang
eBioMedicine
Machine Learning (ML) has demonstrated great potential in medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model leveraging the private datasets available at each party without the need for direct sharing of those datasets or compromising the privacy of the datasets through collaboration. In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). It offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets; (2) it safeguards patient privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates the ML model training without relying on a centralized server. We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. We demonstrate that the ML models trained with the DeCaPH framework have an improved utility-privacy trade-off, showing that it enables the models to have good performance while preserving the privacy of the training data points. In addition, the ML models trained with the DeCaPH framework in general outperform those trained solely with the private datasets from individual parties, showing that DeCaPH enhances the model generalizability.
@article{fang2024collaborative,
title={Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data},
author={Congyu Fang and Adam Dziedzic and Lin Zhang and Laura Oliva and Amol Verma and Fahad Razak and Nicolas Papernot and Bo Wang},
journal={eBioMedicine},
year={2024}
}
-
Alignment Calibration: Machine Unlearning for Contrastive Learning under Auditing
Yihan Wang, Yiwei Lu, Guojun Zhang, Franziska Boenisch, Adam Dziedzic, Yaoliang Yu, Xiao-Shan Gao
ICML 2024 Next Generation of AI Safety Workshop
Machine unlearning provides viable solutions to revoke the effect of certain training data on pre-trained model parameters. Existing approaches provide unlearning recipes for classification and generative models. However, a category of important machine learning models, i.e., contrastive learning (CL) methods, is overlooked. In this paper, we fill this gap by first proposing the framework of Machine Unlearning for Contrastive learning (MUC) and adapting existing methods. Furthermore, we observe that several methods are mediocre unlearners and existing auditing tools may not be sufficient for data owners to validate the unlearning effects in contrastive learning. We thus propose a novel method called Alignment Calibration (AC) by explicitly considering the properties of contrastive learning and optimizing towards novel auditing metrics to easily verify unlearning. We empirically compare AC with baseline methods on SimCLR, MoCo and CLIP. We observe that AC addresses drawbacks of existing methods: (1) achieving state-of-the-art performance and approximating exact unlearning (retraining); (2) allowing data owners to clearly visualize the effect caused by unlearning through black-box auditing.
@inproceedings{wang2024alignment,
title={Alignment Calibration: Machine Unlearning for Contrastive Learning under Auditing},
author={Yihan Wang and Yiwei Lu and Guojun Zhang and Franziska Boenisch and Adam Dziedzic and Yaoliang Yu and Xiao-Shan Gao},
booktitle={ICML 2024 Next Generation of AI Safety Workshop},
year={2024}
}
-
Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives
Vincent Hanke, Tom Blanchard, Franziska Boenisch, Iyiola Emmanuel Olatunji, Michael Backes, Adam Dziedzic
Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)
While open Large Language Models (LLMs) have made significant progress, they still fall short of matching the performance of their closed, proprietary counterparts, making the latter attractive even for use on highly private data. Recently, various new methods have been proposed to adapt closed LLMs to private data without leaking private information to third parties and/or the LLM provider. In this work, we analyze the privacy protection and performance of the four most recent methods for private adaptation of closed LLMs. By examining their threat models and thoroughly comparing their performance under different privacy levels according to differential privacy (DP), various LLM architectures, and multiple datasets for classification and generation tasks, we find that: (1) all the methods leak query data, i.e., the (potentially sensitive) user data that is queried at inference time, to the LLM provider, (2) three out of four methods also leak large fractions of private training data to the LLM provider while the method that protects private data requires a local open LLM, (3) all the methods exhibit lower performance compared to three private gradient-based adaptation methods for local open LLMs, and (4) the private adaptation methods for closed LLMs incur higher monetary training and query costs than running the alternative methods on local open LLMs. This yields the conclusion that, to achieve truly privacy-preserving LLM adaptations that yield high performance and more privacy at lower costs, taking into account current methods and models, one should use open LLMs.
@inproceedings{hanke2024openLLMs,
title={Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives},
author={Vincent Hanke and Tom Blanchard and Franziska Boenisch and Iyiola Emmanuel Olatunji and Michael Backes and Adam Dziedzic},
booktitle={Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}
-
LLM Dataset Inference: Did you train on my dataset?
Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic
Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)
The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on unlicensed data from the internet. Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time). Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions. Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM is trained over multiple documents (such as a book) written by them, rather than one particular paragraph. While dataset inference shares many of the challenges of membership inference, we solve it by selectively combining the MIAs that provide positive signal for a given distribution, and aggregating them to perform a statistical test on a given dataset. Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.
@inproceedings{maini2024LLMDatasetInference,
title={LLM Dataset Inference: Did you train on my dataset?},
author={Pratyush Maini and Hengrui Jia and Nicolas Papernot and Adam Dziedzic},
booktitle={Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}
-
Localizing Memorization in SSL Vision Encoders
Wenhao Wang, Adam Dziedzic, Michael Backes, Franziska Boenisch
Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)
Recent work on studying memorization in self-supervised learning (SSL) suggests that even though SSL encoders are trained on millions of images, they still memorize individual data points. While effort has been put into characterizing the memorized data and linking encoder memorization to downstream utility, little is known about where the memorization happens inside SSL encoders. To close this gap, we propose two metrics for localizing memorization in SSL encoders on a per-layer (layermem) and per-unit basis (unitmem). Our localization methods are independent of the downstream task, do not require any label information, and can be performed in a forward pass. By localizing memorization in various encoder architectures (convolutional and transformer-based) trained on diverse datasets with contrastive and non-contrastive SSL frameworks, we find that (1) while SSL memorization increases with layer depth, highly memorizing units are distributed across the entire encoder, (2) a significant fraction of units in SSL encoders experiences surprisingly high memorization of individual data points, which is in contrast to models trained under supervision, (3) atypical (or outlier) data points cause much higher layer and unit memorization than standard data points, and (4) in vision transformers, most memorization happens in the fully-connected layers. Finally, we show that localizing memorization in SSL has the potential to improve fine-tuning and to inform pruning strategies.
@inproceedings{wang2024LocalizeMemorizationSSL,
title={Localizing Memorization in SSL Vision Encoders},
author={Wenhao Wang and Adam Dziedzic and Michael Backes and Franziska Boenisch},
booktitle={Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}
-
Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models
Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)
Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts prevent this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can avoid the replication of training data at inference time, increase the diversity in the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, our NeMo contributes to a more responsible deployment of DMs.
@inproceedings{hintersdorf2024MemorizationDiffusionModels,
title={Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models},
author={Dominik Hintersdorf and Lukas Struppek and Kristian Kersting and Adam Dziedzic and Franziska Boenisch},
booktitle={Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}
-
Individualized PATE: Differentially Private Machine Learning with Individual Privacy Guarantees
Franziska Boenisch, Christopher Mühl, Roy Rinberg, Jannis Ihrig, Adam Dziedzic
Privacy Enhancing Technologies Symposium (PETS)
Applying machine learning (ML) to sensitive domains requires privacy protection of the underlying training data through formal privacy frameworks, such as differential privacy (DP). Yet, usually, the privacy of the training data comes at the cost of the resulting ML models' utility. One reason for this is that DP uses one uniform privacy budget epsilon for all training data points, which has to align with the strictest privacy requirement encountered among all data holders. In practice, different data holders have different privacy requirements and data points of data holders with lower requirements can contribute more information to the training process of the ML models. To account for this need, we propose two novel methods based on the Private Aggregation of Teacher Ensembles (PATE) framework to support the training of ML models with individualized privacy guarantees. We formally describe the methods, provide a theoretical analysis of their privacy bounds, and experimentally evaluate their effect on the final model's utility using the MNIST, SVHN, and Adult income datasets. Our empirical results show that the individualized privacy methods yield ML models of higher accuracy than the non-individualized baseline. Thereby, we improve the privacy-utility trade-off in scenarios in which different data holders consent to contribute their sensitive data at different individual privacy levels.
@inproceedings{pate2023pets,
title={Individualized PATE: Differentially Private Machine Learning with Individual Privacy Guarantees},
author={Franziska Boenisch and Christopher Mühl and Roy Rinberg and Jannis Ihrig and Adam Dziedzic},
booktitle={Privacy Enhancing Technologies Symposium (PETS)},
year={2023}
}
-
Private Multi-Winner Voting for Machine Learning
Adam Dziedzic, Christopher A Choquette-Choo, Natalie Dullerud, Vinith Menon Suriyakumar, Ali Shahin Shamsabadi, Muhammad Ahmad Kaleem, Somesh Jha, Nicolas Papernot, Xiao Wang
Privacy Enhancing Technologies Symposium (PETS)
Private multi-winner voting is the task of revealing k-hot binary vectors satisfying a bounded differential privacy (DP) guarantee. This task has been understudied in machine learning literature despite its prevalence in many domains such as healthcare. We propose three new DP multi-winner mechanisms: Binary, τ, and Powerset voting. Binary voting operates independently per label through composition. τ voting bounds votes optimally in their ℓ2 norm for tight data-independent guarantees. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. Our theoretical and empirical analysis shows that Binary voting can be a competitive mechanism on many tasks unless there are strong correlations between labels, in which case Powerset voting outperforms it. We use our mechanisms to enable privacy-preserving multi-label learning in the central setting by extending the canonical single-label technique: PATE. We find that our techniques outperform current state-of-the-art approaches on large, real-world healthcare data and standard multi-label benchmarks. We further enable multi-label confidential and private collaborative (CaPC) learning and show that model performance can be significantly improved in the multi-site setting.
@inproceedings{multilabel2023pets,
title={Private Multi-Winner Voting for Machine Learning},
author={Adam Dziedzic and Christopher A Choquette-Choo and Natalie Dullerud and Vinith Menon Suriyakumar and Ali Shahin Shamsabadi and Muhammad Ahmad Kaleem and Somesh Jha and Nicolas Papernot and Xiao Wang},
booktitle={Privacy Enhancing Technologies Symposium (PETS)},
year={2023}
}
-
Bucks for Buckets (B4B): Active Defenses Against Stealing Encoders
Dubiński, Jan, Pawlak, Stanisław, Boenisch, Franziska, Trzcinski, Tomasz, Dziedzic, Adam
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)
Machine Learning as a Service (MLaaS) APIs provide ready-to-use and high-utility encoders that generate vector representations for given inputs. Since these encoders are very costly to train, they become lucrative targets for model stealing attacks during which an adversary leverages query access to the API to replicate the encoder locally at a fraction of the original training costs. We propose Bucks for Buckets (B4B), the first active defense that prevents stealing while the attack is happening without degrading representation quality for legitimate API users. Our defense relies on the observation that the representations returned to adversaries who try to steal the encoder's functionality cover a significantly larger fraction of the embedding space than representations of legitimate users who utilize the encoder to solve a particular downstream task. B4B leverages this to adaptively adjust the utility of the returned representations according to a user's coverage of the embedding space. To prevent adaptive adversaries from eluding our defense by simply creating multiple user accounts (sybils), B4B also individually transforms each user's representations. This prevents the adversary from directly aggregating representations over multiple accounts to create their stolen encoder copy. Our active defense opens a new path towards securely sharing and democratizing encoders over public APIs.
@inproceedings{dubinski2023bucks,
title={Bucks for Buckets (B4B): Active Defenses Against Stealing Encoders},
author={Dubiński, Jan and Pawlak, Stanisław and Boenisch, Franziska and Trzcinski, Tomasz and Dziedzic, Adam},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)},
year={2023}
}
-
Robust and Actively Secure Serverless Collaborative Learning
Franzese, Nicholas, Dziedzic, Adam, Choquette-Choo, Christopher A., Thomas, Mark R., Kaleem, Muhammad Ahmad, Rabanser, Stephan, Fang, Congyu, Jha, Somesh, Papernot, Nicolas, Wang, Xiao
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)
Collaborative machine learning (ML) is widely used to enable institutions to learn better models from distributed data. While collaborative approaches to learning intuitively protect user data, they remain vulnerable to either the server, the clients, or both, deviating from the protocol. Indeed, because the protocol is asymmetric, a malicious server can abuse its power to reconstruct client data points. Conversely, malicious clients can corrupt learning with malicious updates. Thus, both clients and servers require a guarantee when the other cannot be trusted to fully cooperate. In this work, we propose a peer-to-peer (P2P) learning scheme that is secure against malicious servers and robust to malicious clients. Our core contribution is a generic framework that transforms any (compatible) algorithm for robust aggregation of model updates to the setting where servers and clients can act maliciously. Finally, we demonstrate the computational efficiency of our approach even with 1-million parameter models trained by 100s of peers on standard datasets.
@inproceedings{franzeses2023p2pml,
title={Robust and Actively Secure Serverless Collaborative Learning},
author={Franzese, Nicholas and Dziedzic, Adam and Choquette-Choo, Christopher A. and Thomas, Mark R. and Kaleem, Muhammad Ahmad and Rabanser, Stephan and Fang, Congyu and Jha, Somesh and Papernot, Nicolas and Wang, Xiao},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)},
year={2023}
}
-
Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
Duan, Haonan, Dziedzic, Adam, Papernot, Nicolas, Boenisch, Franziska
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)
Large language models (LLMs) are excellent in-context learners. However, the sensitivity of data contained in prompts raises privacy concerns. Our work first shows that these concerns are valid: we instantiate a simple but highly effective membership inference attack against the data used to prompt LLMs. To address this vulnerability, one could forego prompting and resort to fine-tuning LLMs with known algorithms for private gradient descent. However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt. We first show that soft prompts can be obtained privately through gradient descent on downstream data. However, this is not the case for discrete prompts. Thus, we orchestrate a noisy vote among an ensemble of LLMs presented with different prompts, i.e., a flock of stochastic parrots. The vote privately transfers the flock's knowledge into a single public prompt. We show that LLMs prompted with our private algorithms closely match the non-private baselines. For example, using GPT3 as the base model, we achieve a downstream accuracy of 92.7% on the sst2 dataset with strong differential privacy guarantees vs. 95.2% for the non-private baseline. Through our experiments, we also show that our prompt-based approach is easily deployed with existing commercial APIs.
@inproceedings{duan2023flocks,
title={Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models},
author={Duan, Haonan and Dziedzic, Adam and Papernot, Nicolas and Boenisch, Franziska},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)},
year={2023}
}
-
Have it your way: Individualized Privacy Assignment for DP-SGD
Franziska Boenisch, Christopher Mühl, Adam Dziedzic, Roy Rinberg, Nicolas Papernot
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)
When training a machine learning model with differential privacy, one sets a privacy budget. This budget represents a maximal privacy violation that any user is willing to face by contributing their data to the training set. We argue that this approach is limited because different users may have different privacy expectations. Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. To demonstrate their practicality, we introduce a variant of Differentially Private Stochastic Gradient Descent (DP-SGD) which supports such individualized budgets. DP-SGD is the canonical approach to training models with differential privacy. We modify its data sampling and gradient noising mechanisms to arrive at our approach, which we call Individualized DP-SGD (IDP-SGD). Because IDP-SGD provides privacy guarantees tailored to the preferences of individual users and their data points, we find it empirically improves privacy-utility trade-offs.
@inproceedings{boenisch2023idpsgd,
title={Have it your way: Individualized Privacy Assignment for DP-SGD},
author={Franziska Boenisch and Christopher Mühl and Adam Dziedzic and Roy Rinberg and Nicolas Papernot},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS)},
year={2023}
}
-
On the privacy risk of in-context learning
Haonan Duan, Adam Dziedzic, Mohammad Yaghini, Nicolas Papernot, Franziska Boenisch
The 61st Annual Meeting Of The Association For Computational Linguistics
Large language models (LLMs) are excellent few-shot learners. They can perform a wide variety of tasks purely based on natural language prompts provided to them. These prompts contain data of a specific downstream task, often the private dataset of a party, e.g., a company that wants to leverage the LLM for its purposes. We show that deploying prompted models presents a significant privacy risk for the data used within the prompt by instantiating a highly effective membership inference attack. We also observe that the privacy risk of prompted models exceeds that of fine-tuned models at the same utility levels. After identifying the models' sensitivity to their prompts (in the form of a significantly higher prediction confidence on the prompted data) as a cause for the increased risk, we propose ensembling as a mitigation strategy. By aggregating over multiple different versions of a prompted model, membership inference risk can be decreased.
@inproceedings{duan2023privacyICL,
title={On the privacy risk of in-context learning},
author={Haonan Duan and Adam Dziedzic and Mohammad Yaghini and Nicolas Papernot and Franziska Boenisch},
booktitle={The 61st Annual Meeting Of The Association For Computational Linguistics},
year={2023}
}
-
CaPC Learning: Confidential and Private Collaborative Learning
Christopher A. Choquette-Choo, Natalie Dullerud, Adam Dziedzic, Yunxiang Zhang, Somesh Jha, Nicolas Papernot, Xiao Wang
ICLR (International Conference on Learning Representations)
Machine learning benefits from large training datasets, which may not always be possible to collect by any single entity, especially when using privacy-sensitive data. In many contexts, such as healthcare and finance, separate parties may wish to collaborate and learn from each other's data but are prevented from doing so due to privacy regulations. Some regulations prevent explicit sharing of data between parties by joining datasets in a central location (confidentiality). Others also limit implicit sharing of data, e.g., through model predictions (privacy). There is currently no method that enables machine learning in such a setting, where both confidentiality and privacy need to be preserved, to prevent both explicit and implicit sharing of data. Federated learning only provides confidentiality, not privacy, since gradients shared still contain private information. Differentially private learning assumes unreasonably large datasets. Furthermore, both of these learning paradigms produce a central model whose architecture was previously agreed upon by all parties rather than enabling collaborative learning where each party learns and improves their own local model. We introduce Confidential and Private Collaborative (CaPC) learning, the first method provably achieving both confidentiality and privacy in a collaborative setting. We leverage secure multi-party computation (MPC), homomorphic encryption (HE), and other techniques in combination with privately aggregated teacher models. We demonstrate how CaPC allows participants to collaborate without having to explicitly join their training sets or train a central model. Each party is able to improve the accuracy and fairness of their model, even in settings where each party has a model that performs well on their own dataset or when datasets are not IID and model architectures are heterogeneous across parties.
@inproceedings{capc2021iclr,
title={CaPC Learning: Confidential and Private Collaborative Learning},
author={Christopher A. Choquette-Choo and Natalie Dullerud and Adam Dziedzic and Yunxiang Zhang and Somesh Jha and Nicolas Papernot and Xiao Wang},
booktitle={ICLR (International Conference on Learning Representations)},
year={2021}
}
-
Preoperative paraspinal neck muscle characteristics predict early onset adjacent segment degeneration in anterior cervical fusion patients: A machine-learning modeling analysis
Wong, Arnold Y. L., Harada, Garrett, Lee, Remy, Gandhi, Sapan D., Dziedzic, Adam, Espinoza-Orias, Alejandro, Parnianpour, Mohamad, Louie, Philip K., Basques, Bryce, An, Howard S., Samartzis, Dino
Early onset adjacent segment degeneration (ASD) can be found within six months after anterior cervical discectomy and fusion (ACDF). Deficits in deep paraspinal neck muscles may be related to early onset ASD. This study aimed to determine whether the morphometry of preoperative deep neck muscles (multifidus and semispinalis cervicis) predicted early onset ASD in patients with ACDF. Thirty-two cases of early onset ASD after a two-level ACDF and 30 matched non-ASD cases were identified from a large-scale cohort. The preoperative total cross-sectional area (CSA) of bilateral deep neck muscles and the lean muscle CSAs from C3 to C7 levels were measured manually on T2-weighted magnetic resonance imaging. Paraspinal muscle CSA asymmetry at each level was calculated. A support vector machine (SVM) algorithm was used to identify demographic, radiographic, and/or muscle parameters that predicted proximal/distal ASD development. No significant between-group differences in demographic or preoperative radiographic data were noted (mean age: 52.4 ± 10.9 years). ACDFs comprised C3 to C5 (n = 9), C4 to C6 (n = 20), and C5 to C7 (n = 32) cases. Eighteen, eight, and six patients had proximal, distal, or both ASD, respectively. The SVM model achieved high accuracy (96.7%) and an area under the curve (AUC = 0.97) for predicting early onset ASD. Asymmetry of fat at C5 (coefficient: 0.06), and standardized measures of C7 lean (coefficient: 0.05) and total CSA measures (coefficient: 0.05) were the strongest predictors of early onset ASD. This is the first study to show that preoperative deep neck muscle CSA, composition, and asymmetry at C5 to C7 independently predicted postoperative early onset ASD in patients with ACDF. Paraspinal muscle assessments are recommended to identify high-risk patients for personalized intervention.
@inproceedings{wong2021ML,
title={Preoperative paraspinal neck muscle characteristics predict early onset adjacent segment degeneration in anterior cervical fusion patients: A machine-learning modeling analysis},
author={Wong, Arnold Y. L. and Harada, Garrett and Lee, Remy and Gandhi, Sapan D. and Dziedzic, Adam and Espinoza-Orias, Alejandro and Parnianpour, Mohamad and Louie, Philip K. and Basques, Bryce and An, Howard S. and Samartzis, Dino},
booktitle={},
year={2021}
}
-
On the Exploitability of Audio Machine Learning Pipelines to Surreptitious Adversarial Examples
Adelin Travers, Lorna Licollari, Guanghan Wang, Varun Chandrasekaran, Adam Dziedzic, David Lie, Nicolas Papernot
Machine learning (ML) models are known to be vulnerable to adversarial examples. Applications of ML to voice biometrics authentication are no exception. Yet, the implications of audio adversarial examples on these real-world systems remain poorly understood given that most research targets limited defenders who can only listen to the audio samples. Conflating detectability of an attack with human perceptibility, research has focused on methods that aim to produce imperceptible adversarial examples which humans cannot distinguish from the corresponding benign samples. We argue that this perspective is coarse for two reasons: (1) imperceptibility is impossible to verify, since it would require an experimental process that encompasses variations in listener training, equipment, volume, ear sensitivity, types of background noise, etc., and (2) it disregards pipeline-based detection clues that realistic defenders leverage. This results in adversarial examples that are ineffective in the presence of knowledgeable defenders. Thus, an adversary only needs an audio sample to be plausible to a human. We therefore introduce surreptitious adversarial examples, a new class of attacks that evades both human and pipeline controls. In the white-box setting, we instantiate this class with a joint, multi-stage optimization attack. Using an Amazon Mechanical Turk user study, we show that this attack produces audio samples that are more surreptitious than previous attacks that aim solely for imperceptibility. Lastly, we show that surreptitious adversarial examples are challenging to develop in the black-box setting.
@inproceedings{travers2021exploitability,
title={On the Exploitability of Audio Machine Learning Pipelines to Surreptitious Adversarial Examples},
author={Adelin Travers and Lorna Licollari and Guanghan Wang and Varun Chandrasekaran and Adam Dziedzic and David Lie and Nicolas Papernot},
booktitle={},
year={2021}
}
-
When the Curious Abandon Honesty: Federated Learning Is Not Private
Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, Nicolas Papernot
In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never "leaves" personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model's weights before users compute model gradients. We call the modified weights "trap weights". Our active attacker is able to recover user data perfectly and at near zero costs: the attack requires no complex optimization objectives. Instead, it exploits inherent data leakage from model gradients and amplifies this effect by maliciously altering the weights of the shared model. These specificities enable our attack to scale to models trained with large mini-batches of data. Where attackers from prior work require hours to recover a single data point, our method needs milliseconds to capture the full mini-batch of data from both fully-connected and convolutional deep neural networks. Finally, we consider mitigations. We observe that current implementations of differential privacy (DP) in FL are flawed, as they explicitly trust the central party with the crucial task of adding DP noise, and thus provide no protection against a malicious central party. We also consider other defenses and explain why they are similarly inadequate. A significant redesign of FL is required for it to provide any meaningful form of data privacy to users.
@inproceedings{boenisch2021curious,
title={When the Curious Abandon Honesty: Federated Learning Is Not Private},
author={Franziska Boenisch and Adam Dziedzic and Roei Schuster and Ali Shahin Shamsabadi and Ilia Shumailov and Nicolas Papernot},
booktitle={},
year={2021}
}
-
Private AI Collaborative Research Institute: Vision, Challenges, and Opportunities
Ahmad-Reza Sadeghi, Ferdinand Brasser, Markus Miettinen, Thien Duc Nguyen, Thomas Given-Wilson, Axel Legay, Murali Annavaram, Salman Avestimehr, Alexandra Dmitrienko, Farinaz Koushanfar, Buse Gul Atli, Florian Kerschbaum, Lachlan J. Gunn, N. Asokan, Matthias Schunter, Rosario Cammarota, Adam Dziedzic, Nicolas Papernot, Virginia Smith, Reza Shokri
This document outlines the research vision of the collaborative research center for Privacy-preserving Machine Learning. While federated machine learning is starting to be deployed, its security and privacy implications are not well understood today. Our goal is to conduct research enabling the future of decentralized machine learning: underpinning federated ML with robust privacy guarantees and efficient algorithms to achieve those guarantees; exploring knowledge transfer and collaborative ML beyond federated machine learning, which suffers from a central controller as its root of trust; exploring Graph Neural Networks and their privacy implications; and ensuring robustness against malicious participants that may steal models or try to poison machine learning models during training. This research will then be validated in case studies and deployed in open source frameworks to allow further experimentation and deployment on a wider scale.
@inproceedings{IntelPrivateAIVision2021,
title={Private AI Collaborative Research Institute: Vision, Challenges, and Opportunities},
author={Ahmad-Reza Sadeghi and Ferdinand Brasser and Markus Miettinen and Thien Duc Nguyen and Thomas Given-Wilson and Axel Legay and Murali Annavaram and Salman Avestimehr and Alexandra Dmitrienko and Farinaz Koushanfar and Buse Gul Atli and Florian Kerschbaum and Lachlan J. Gunn and N. Asokan and Matthias Schunter and Rosario Cammarota and Adam Dziedzic and Nicolas Papernot and Virginia Smith and Reza Shokri},
booktitle={},
year={2021}
}
-
Private Multi-Winner Voting For Machine Learning
Adam Dziedzic, Christopher A Choquette-Choo, Natalie Dullerud, Vinith Menon Suriyakumar, Ali Shahin Shamsabadi, Muhammad Ahmad Kaleem, Somesh Jha, Nicolas Papernot, Xiao Wang
Private multi-winner voting is the task of revealing k-hot binary vectors that satisfy a bounded differential privacy guarantee. This task has been understudied in the machine learning literature despite its prevalence in many domains such as healthcare. We propose three new privacy-preserving multi-label mechanisms: Binary, τ, and Powerset voting. Binary voting operates independently per label through composition. τ voting bounds votes optimally in their ℓ2 norm. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. We theoretically analyze tradeoffs showing that Powerset voting requires strong correlations between labels to outperform Binary voting. We use these mechanisms to enable privacy-preserving multi-label learning by extending the canonical single-label technique: PATE. We empirically compare our techniques with DPSGD on large real-world healthcare data and standard multi-label benchmarks. We find that our techniques outperform all others in the centralized setting. We enable multi-label CaPC and show that our mechanisms can be used to collaboratively improve models in a multi-site (distributed) setting.
@inproceedings{PrivateMultiWinnerVoting2021,
title={Private Multi-Winner Voting For Machine Learning},
author={Adam Dziedzic and Christopher A Choquette-Choo and Natalie Dullerud and Vinith Menon Suriyakumar and Ali Shahin Shamsabadi and Muhammad Ahmad Kaleem and Somesh Jha and Nicolas Papernot and Xiao Wang},
booktitle={},
year={2021}
}
-
Pretrained Transformers Improve Out-of-Distribution Robustness
Hendrycks, Dan, Liu, Xiaoyuan, Wallace, Eric, Dziedzic, Adam, Krishnan, Rishabh, Song, Dawn
ACL (Association for Computational Linguistics)
Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for seven NLP datasets by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models including bag-of-words models, ConvNets, and LSTMs, and we show that pretrained Transformers' performance declines are substantially smaller. Pretrained Transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.
@inproceedings{hendrycks-etal-2020-pretrained,
title={Pretrained Transformers Improve Out-of-Distribution Robustness},
author={Hendrycks, Dan and
Liu, Xiaoyuan and
Wallace, Eric and
Dziedzic, Adam and
Krishnan, Rishabh and
Song, Dawn},
booktitle={ACL (Association for Computational Linguistics)},
year={2020}
}
-
Machine Learning based detection of multiple Wi-Fi BSSs for LTE-U CSAT
Sathya, Vanlin, Dziedzic, Adam, Ghosh, Monisha, Krishnan, Sanjay
ICNC (International Conference on Computing, Networking and Communications)
According to the LTE-U Forum specification, an LTE-U base-station (BS) reduces its duty cycle from 50% to 33% when it senses an increase in the number of co-channel Wi-Fi basic service sets (BSSs) from one to two. Detecting the number of Wi-Fi BSSs operating on the channel in real time, without decoding the Wi-Fi packets, remains a challenge. In this paper, we present a novel machine learning (ML) approach that solves the problem by using energy values observed during the LTE-U OFF duration. Observing the energy values (at LTE-U BS OFF time) is a much simpler operation than decoding entire Wi-Fi packets. In this work, we implement and validate the proposed ML-based approach in real-time experiments, and demonstrate that one and two Wi-Fi APs produce two distinct patterns. This approach delivers an accuracy close to 100%, compared to the auto-correlation (AC) and energy detection (ED) approaches.
@inproceedings{sathya2020machine,
title={Machine Learning based detection of multiple Wi-Fi BSSs for LTE-U CSAT},
author={Sathya, Vanlin and Dziedzic, Adam and Ghosh, Monisha and Krishnan, Sanjay},
booktitle={ICNC (International Conference on Computing, Networking and Communications)},
year={2020}
}
-
An Empirical Evaluation of Perturbation-based Defenses
Dziedzic, Adam, Krishnan, Sanjay
Recent work has extensively shown that randomized perturbations of a neural network can improve its robustness to adversarial attacks. The literature is, however, lacking a detailed compare-and-contrast of the latest proposals to understand what classes of perturbations work, when they work, and why they work. We contribute a detailed experimental evaluation that elucidates these questions and benchmarks perturbation defenses in a consistent way. In particular, we show five main results: (1) all input perturbation defenses, whether random or deterministic, are essentially equivalent in their efficacy, (2) such defenses offer almost no robustness to adaptive attacks unless these perturbations are observed during training, (3) a tuned sequence of noise layers across a network provides the best empirical robustness, (4) attacks transfer between perturbation defenses, so the attackers need not know the specific type of defense, only that it involves perturbations, and (5) adversarial examples very close to original images show an elevated sensitivity to perturbation in a first-order analysis. Based on these insights, we demonstrate a new robust model built on noise injection and adversarial training that achieves state-of-the-art robustness.
@inproceedings{dziedzic2020empirical,
title={An Empirical Evaluation of Perturbation-based Defenses},
author={Dziedzic, Adam and Krishnan, Sanjay},
booktitle={},
year={2020}
}
-
Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios
Dziedzic, Adam, Sathya, Vanlin, Rochman, Muhammad, Ghosh, Monisha, Krishnan, Sanjay
The application of Machine Learning (ML) techniques to complex engineering problems has proved to be an attractive and efficient solution. ML has been successfully applied to several practical tasks like image recognition, automating industrial operations, etc. The promise of ML techniques in solving non-linear problems influenced this work, which aims to apply known ML techniques and develop new ones for wireless spectrum sharing between Wi-Fi and LTE in the unlicensed spectrum. In this work, we focus on the LTE-Unlicensed (LTE-U) specification developed by the LTE-U Forum, which uses the duty-cycle approach for fair coexistence. The specification suggests reducing the duty cycle at the LTE-U base-station (BS) when the number of co-channel Wi-Fi basic service sets (BSSs) increases from one to two or more. However, without decoding the Wi-Fi packets, detecting the number of Wi-Fi BSSs operating on the channel in real time is a challenging problem. In this work, we demonstrate a novel ML-based approach which solves this problem by using energy values observed during the LTE-U OFF duration. It is relatively straightforward to observe only the energy values during the LTE-U BS OFF time compared to decoding the entire Wi-Fi packet, which would require a full Wi-Fi receiver at the LTE-U base-station. We implement and validate the proposed ML-based approach in real-time experiments and demonstrate that there exist distinct patterns in the energy distributions of one versus many Wi-Fi AP transmissions. The proposed ML-based approach results in a higher accuracy (close to 99% in all cases) than the existing auto-correlation (AC) and energy detection (ED) approaches.
@inproceedings{dziedzic2020machine,
title={Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios},
author={Dziedzic, Adam and Sathya, Vanlin and Rochman, Muhammad and Ghosh, Monisha and Krishnan, Sanjay},
booktitle={},
year={2020}
}
-
Input and Model Compression for Adaptive and Robust Neural Networks
@inproceedings{dziedzic2020input,
title={Input and Model Compression for Adaptive and Robust Neural Networks},
author={Dziedzic, Adam},
booktitle={},
year={2020}
}