Submitted to the NeurIPS 2026 Competition Track

Aditya Kumar¹, Dariush Wahdany¹, Ossi Räisä¹, Ivo Hoese¹, Hoyong Jeong¹,
Daniil Filienko², Jonas Böhler³, Jesse C. Cresswell⁴, Franziska Boenisch¹, Adam Dziedzic¹

¹ CISPA Helmholtz Center for Information Security   ² University of Washington Tacoma   ³ SAP SE   ⁴ Layer 6 AI


Tabular foundation models (TFMs) such as TabPFN solve a new tabular task at inference time by conditioning on a training set provided in context, without any gradient updates. That in-context training set is private data, which raises a question that does not yet have a good answer: how much does a TFM leak about the records it was conditioned on? This competition turns that question into a shared benchmark.

The Four Tracks

Four tracks run under a shared evaluation harness — same target TFMs, same in-context construction, same data splits.

  • Track 1 — Attribute Inference. Recover a held-out sensitive attribute of a known target record from black-box queries.
  • Track 2 — Membership Inference. Decide whether a candidate record was part of the hidden in-context training set.
  • Track 3 — Dataset Inference. Decide whether an entire suspect corpus was used as the in-context training set.
  • Track 4 — Property Inference. Recover a global property of the training distribution under a two-world setup (70% vs 50%).

Timeline

Date                   Milestone
15 June 2026           NeurIPS competition acceptance notification.
29 June 2026           Starting kit released (datasets, baselines, evaluation script, Docker image, agent harness).
6 July 2026            Starting kit frozen.
20 July 2026           Development phase opens: registration available, all four leaderboards live.
31 October 2026        Development phase closes; held-out evaluation begins.
7 November 2026        Final per-track and overall leaderboards published.
21 November 2026       Invited methods reports from top three teams per track due; top three release code.
Early December 2026    In-person presentation and panel at NeurIPS 2026.
March 2027             Post-competition analysis paper submitted.

How to Participate

Registration opens with the development phase on 20 July 2026. The pipeline runs on CPU; stronger attacks based on shadow models or supervised aggregation are feasible on a single consumer GPU.

  1. Register on the competition platform and agree to the rules and data use agreement.
  2. Receive your API token — used to query the target TFMs and to submit predictions. Tokens are per-team and rate-limited (one submission per track every five minutes, daily cap of five per track).
  3. Download the starting kit from the GitHub repository: dataset loaders, the four reference baselines, a local evaluation script that scores predictions on the public split with the same scoring function as the server, a sample submission generator, and a Docker image that reproduces the full pipeline.
  4. Build your attack against any subset of the four tracks. You may form teams of up to five members and participate in multiple tracks (one team per track).
  5. Submit a CSV of predictions through the platform’s submission endpoint. The evaluation server scores it against the public split and returns the public score immediately. The held-out score is only computed and displayed after the competition closes.

Final ranking is computed on the held-out 70% split, which is never released during the competition. The overall ranking across all four tracks is the mean of min-max-normalized per-track scores; teams that do not submit on a track receive 0 on that track for the overall ranking.
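The overall ranking rule can be sketched as follows. This is an illustration only, not the organizers' scoring code: the function name `overall_scores`, the input layout, the choice to normalize over submitting teams only, and the handling of a track where all scores tie are assumptions.

```python
def overall_scores(per_track):
    """Mean of min-max-normalized per-track scores.

    per_track: {track_name: {team_name: raw_score}} (hypothetical layout).
    A team missing from a track contributes 0 for that track.
    """
    teams = sorted({team for scores in per_track.values() for team in scores})
    totals = {team: 0.0 for team in teams}
    for scores in per_track.values():
        if not scores:
            continue  # nobody submitted on this track; everyone gets 0 for it
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for team, raw in scores.items():
            # Assumption: if every submitted score ties, each submitter gets 1.0.
            totals[team] += (raw - lo) / span if span > 0 else 1.0
    n_tracks = len(per_track)
    return {team: totals[team] / n_tracks for team in teams}


example = {
    "membership": {"team_a": 0.2, "team_b": 0.6},
    "property": {"team_a": 0.5, "team_b": 0.0, "team_c": 0.25},
}
print(overall_scores(example))
```

In this toy input, `team_c` skips the membership track, so its normalized property score is averaged with a 0.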

Top three teams per track receive an invitation to present at the NeurIPS competition track workshop and co-authorship on the post-competition analysis paper; the top team per track also receives a travel award (contingent on sponsor support). Top three per track must release their code under a permissive open-source license.

Tutorial

A written tutorial walks through the four reference baselines (one per track) and explains how to query the target TFMs through the platform’s API. It covers: loading the public split, generating shadow in-context sets for the LiRA-style membership baseline, running the per-record loss aggregation for dataset inference, and training the black-box meta-classifier for property inference. A short white paper describes the threat models, evaluation methodology, and expected significance.
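For readers new to LiRA-style scoring, here is a minimal offline sketch of the idea. It is not the starting kit's baseline: it assumes you have already collected, for one candidate record, the target model's confidence on the true label plus the same confidence from shadow in-context sets that did ("in") and did not ("out") contain the record. The function names and the Gaussian fit on logit-scaled confidences are illustrative assumptions.

```python
import math

import numpy as np


def logit(p, eps=1e-6):
    """Map confidences in (0, 1) to the real line, clipping to avoid infinities."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def lira_score(target_conf, in_confs, out_confs, min_std=1e-3):
    """Log-likelihood ratio log p(x | in) - log p(x | out); higher suggests membership."""
    x = float(logit(target_conf))
    z_in, z_out = logit(in_confs), logit(out_confs)
    # Floor the standard deviations so a handful of near-identical shadows cannot divide by zero.
    std_in = max(float(z_in.std()), min_std)
    std_out = max(float(z_out.std()), min_std)
    return gaussian_logpdf(x, float(z_in.mean()), std_in) - gaussian_logpdf(
        x, float(z_out.mean()), std_out
    )


# Toy numbers, purely illustrative.
print(lira_score(0.94, in_confs=[0.90, 0.95, 0.92, 0.97], out_confs=[0.50, 0.60, 0.55, 0.45]))
```

The result is a log-likelihood ratio rather than a probability, so it would still need to be mapped into [0, 1] before submission.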

The tutorial and the starting kit are released together on 29 June 2026 and frozen on 6 July 2026, so participants build against stable code.

FAQ

Who can participate?

Anyone. Teams may have up to five members. A person may participate in at most one team per track but may participate in multiple tracks.

What do I submit?

A CSV file of predictions through the platform’s submission endpoint. Track 1 (attribute) and Track 2 (membership) take per-record probabilities; Track 3 (dataset) takes a probability per (target model, candidate corpus) pair; Track 4 (property) takes a probability per target model that it was trained under World A.
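A minimal sketch of writing such a file is shown below. The column names `id` and `probability` and the helper `write_predictions` are hypothetical; the actual schema is defined by the starting kit's sample submission generator, which takes precedence over anything here.

```python
import csv
import io
import math


def write_predictions(rows, out):
    """Write (identifier, probability) pairs as CSV and return the number of rows written.

    The header names are placeholders, not the official submission schema.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "probability"])
    count = 0
    for identifier, prob in rows:
        p = float(prob)
        # Reject NaN, infinities, and anything outside [0, 1] before it reaches the server.
        if not (math.isfinite(p) and 0.0 <= p <= 1.0):
            raise ValueError(f"probability for {identifier!r} is not in [0, 1]: {prob!r}")
        writer.writerow([identifier, f"{p:.6f}"])
        count += 1
    return count


buffer = io.StringIO()
write_predictions([("record_1", 0.25), ("record_2", 0.9)], buffer)
print(buffer.getvalue())
```

Validating locally is cheap compared with spending one of the rate-limited submissions on a malformed file.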

How are submissions evaluated?

All four tracks are scored on the true positive rate (TPR) at a 1% false positive rate (FPR). The evaluation server returns the public score on the 30% public split immediately. The held-out score on the remaining 70% is computed only after the competition closes and determines the final ranking. Scoring is deterministic and reproducible from the submitted CSV alone.
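To make the metric concrete, the sketch below computes the best TPR reachable by any score threshold whose FPR stays at or below 1%. It is an illustration, not the server's scoring function: the name `tpr_at_fpr` and the tie handling (tied scores are always thresholded together) are assumptions, and the local evaluation script in the starting kit is the reference.

```python
import numpy as np


def tpr_at_fpr(y_true, y_score, max_fpr=0.01):
    """Highest TPR over thresholds of the form score >= t with FPR <= max_fpr."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if y_true.shape != y_score.shape:
        raise ValueError("labels and scores must have the same shape")
    n_pos = int(y_true.sum())
    n_neg = int((~y_true).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort by decreasing score and count true/false positives at every cut point.
    order = np.argsort(-y_score, kind="stable")
    sorted_true = y_true[order]
    sorted_score = y_score[order]
    tp = np.cumsum(sorted_true)
    fp = np.cumsum(~sorted_true)

    # A threshold cannot split tied scores, so keep only the last index of each tie group.
    last_of_tie = np.r_[sorted_score[1:] != sorted_score[:-1], True]
    tpr = tp[last_of_tie] / n_pos
    fpr = fp[last_of_tie] / n_neg

    allowed = fpr <= max_fpr
    return float(tpr[allowed].max()) if allowed.any() else 0.0


# Toy check: 100 non-members with low scores, 10 members of which 5 score high.
labels = np.r_[np.zeros(100, dtype=bool), np.ones(10, dtype=bool)]
scores = np.r_[np.linspace(0.0, 0.5, 100), [0.9] * 5, [0.1] * 5]
print(tpr_at_fpr(labels, scores))
```

Because only the low-FPR end of the ROC curve matters, an attack that is confidently right on a few records can outscore one with better average accuracy.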

Are there submission limits?

One submission per track every five minutes, with a daily cap of five submissions per track. Only your best public score per track counts during the live phase.
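A client-side pacing helper can keep a script from burning requests that the server would reject. The sketch below is an assumption-laden illustration: the class `SubmissionPacer` is hypothetical, and it treats the daily cap as a rolling 24-hour window, whereas the platform may count by calendar day.

```python
import time

DAY_S = 24 * 60 * 60


class SubmissionPacer:
    """Tracks local submission times per track and reports how long to wait."""

    def __init__(self, min_gap_s=300, daily_cap=5, clock=time.time):
        self._min_gap_s = min_gap_s
        self._daily_cap = daily_cap
        self._clock = clock
        self._history = {}  # track name -> list of submission timestamps

    def seconds_until_allowed(self, track):
        now = self._clock()
        # Assumption: the cap applies to a rolling 24-hour window.
        recent = [t for t in self._history.get(track, []) if now - t < DAY_S]
        wait = 0.0
        if recent:
            wait = max(wait, self._min_gap_s - (now - recent[-1]))
        if len(recent) >= self._daily_cap:
            # The oldest of the last `daily_cap` submissions must leave the window first.
            wait = max(wait, DAY_S - (now - recent[-self._daily_cap]))
        return max(wait, 0.0)

    def record(self, track):
        """Call right after a submission has been sent."""
        self._history.setdefault(track, []).append(self._clock())


pacer = SubmissionPacer()
if pacer.seconds_until_allowed("membership") == 0:
    pacer.record("membership")  # a real script would submit before recording
print(pacer.seconds_until_allowed("membership"))
```

The server's own accounting remains authoritative; the helper only mirrors the published limits locally.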

What hardware do I need?

The submission and querying pipeline runs on CPU. Stronger attacks based on shadow models or supervised aggregation are feasible on a single consumer GPU. Compute support is available for participants from under-resourced research groups via partner programs — contact us if this applies.

Is the data real?

No. Participants only ever see synthetic tabular records sampled from a TabSyn generator trained on a public reference distribution (Folktables, UCI Adult, or Texas hospital discharge). The real records of the public datasets are seen only by the organizers, for TabSyn training and offline utility/fidelity evaluation. The held-out test split and per-model ground-truth labels are kept on organizers’ servers and never released.

What target models do I attack?

Tabular foundation models (TFMs) in the in-context regime: TabPFN, NanoTabPFN, and related backbones. You are told which TFM family backs each track (e.g., TabPFN v3); the hidden in-context training set and any per-model randomness are not exposed.

Do I have to release my code?

Only if you finish in the top three on any track. Top-three teams per track release their code under a permissive open-source license so the organizers can re-run entries in case of suspected leakage. If your team enters the top ten on any leaderboard, you must submit a short methods description (200–1000 words) within one week of first entering the top ten.

Are there prizes?

Top three teams per track receive: an invitation to present at the NeurIPS competition track workshop; co-authorship on the post-competition analysis paper as named contributors; and travel awards for the top team per track contingent on sponsor support. We are pursuing monetary prizes through CISPA and partner sponsorship.

How do I ask a question or report an issue?

The public Discord server is the primary channel for questions, announcements, and strategy discussion (link coming with the starting kit on 29 June 2026). For private inquiries — registration issues, accessibility requests, disputes — email privacy-tabular-fm@cispa.de. Rule updates and deadline extensions, if any, are posted on this site, announced on Discord, and emailed to the participant list.

What if I find a bug in the evaluation infrastructure?

Email the organizers immediately. The organizers reserve the right to disqualify any submission that demonstrably exploits a flaw in the evaluation infrastructure. Good-faith reports of issues are welcome and credited.

Get Involved

The starting kit is released on 29 June 2026, and registration opens on 20 July 2026. Until then, follow updates on the GitHub repository or reach out by email at privacy-tabular-fm@cispa.de.

Organized by the SprintML Lab at CISPA Helmholtz Center for Information Security, with collaborators at the University of Washington Tacoma, SAP SE, and Layer 6 AI.