ECCC
Electronic Colloquium on Computational Complexity
Login | Register | Classic Style



REPORTS > DETAIL:

Revision(s):

Revision #2 to TR10-017 | 7th January 2011 02:17

PCPs and the Hardness of Generating Private Synthetic Data

RSS-Feed

Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.


Revision #1 to TR10-017 | 6th November 2010 04:43

PCPs and the Hardness of Generating Synthetic Data


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.


Paper:

TR10-017 | 10th February 2010 21:08

PCPs and the Hardness of Generating Synthetic Data


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.



ISSN 1433-8092 | Imprint