Testing probability distributions using conditional samples

Clement Canonne; Dana Ron; Rocco Servedio

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Revision(s):

Revision #1 to TR12-155 | 16th January 2015 19:19

Testing probability distributions using conditional samples

Revision #1 Authors: Clement Canonne, Dana Ron, Rocco Servedio
Accepted on: 16th January 2015 19:19
Downloads: 1104

Keywords:

Abstract:

We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. \footnote{Independently from our work, Chakraborty et al. [CFGM13] also considered this framework. We discuss their work in Subsection 1.4.} This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution $D$ and returns a draw from the conditional probability distribution $D$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive.

We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether $D$ is the uniform distribution U; testing whether $D = D*$ for an explicitly provided $D*$; testing whether two unknown distributions $D_1$ and $D_2$ are equivalent; and estimating the variation distance between $D$ and the uniform distribution. At a high level our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have $\Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give $\mathrm{poly}(\log N,1/\varepsilon)$-query algorithms (and in some cases $\mathrm{poly}(1/\varepsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting.

Changes to previous version:

Significant changes on Section 9 (detailing and expanding the proof of Theorem 16, uniformity lower bound again adaptive testers). Several clarifications added and typos fixed in various places.

Paper:

TR12-155 | 15th November 2012 03:11

Testing probability distributions using conditional samples

TR12-155 Authors: Clement Canonne, Dana Ron, Rocco Servedio
Publication: 15th November 2012 03:11
Downloads: 2913

Keywords:

Property testing, probability distributions,

Abstract:

We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. \footnote{Independently from our work, Chakraborty et al. [CFGM13] also considered this framework. We discuss their work in Subsection 1.4.} This is an oracle that takes as input a subset S \subseteq [N] of the domain [N] of the unknown probability distribution D and returns a draw from the conditional probability distribution D restricted to S. This new model allows considerable flexibility in the design of distribution testing algorithms;
in particular, testing algorithms in this model can be adaptive.

We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether D is the uniform distribution U; testing whether D = D* for an explicitly provided D*; testing whether two unknown distributions D1 and D2 are equivalent; and estimating the variation distance between D and the uniform distribution. At a high level our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have Omega(\sqrt{N}) sample complexity in the standard model (and in some cases the complexity must be almost linear in N), we give poly(log(N),1/\epsilon)) query algorithms (and in some cases poly(1/\epsilon)-query algorithms independent of N) for all these problems in our conditional sampling setting.

ISSN 1433-8092 | Imprint