We study the fundamental problems of (i) uniformity testing of a discrete distribution,
and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm.
These problems have been extensively studied in distribution testing,
and sample-optimal estimators are known for them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collision-based testers proposed for these
problems~\cite{GRdist:00, BFR+:00} are sample-optimal, up to constant factors.
Previous analyses showed sample complexity upper bounds for these testers that are optimal
as a function of the domain size $n$, but suboptimal by polynomial factors
in the error parameter $\epsilon$. Our main contribution is a new tight analysis
establishing that these collision-based testers are information-theoretically optimal,
up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$.
The collision probability tester, introduced by Goldreich and Ron ({\em ECCC}, TR00-020, 2000),
distinguishes the uniform distribution over $[n]$ from any distribution that is $\epsilon$-far from it using $\poly(1/\epsilon)\cdot\sqrt{n}$ samples.
While the original analysis established only an
upper bound of $O(\epsilon^{-4})\cdot\sqrt{n}$ on the sample complexity, a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price
({\em ECCC}, TR16-178, 2016) established the optimal upper bound of $O(\epsilon^{-2})\cdot\sqrt{n}$.
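For concreteness, here is a minimal Python sketch of such a collision-based tester. The acceptance threshold $(1+2\epsilon^2)/n$ and the constant in the sample size are illustrative assumptions, not the tuned choices of the analyses cited above.
\begin{verbatim}
import random
from collections import Counter

def collision_tester(samples, n, eps):
    """Accept iff the empirical collision probability looks uniform.

    The uniform distribution over [n] has collision probability 1/n,
    while any eps-far distribution has collision probability greater
    than (1 + 4*eps**2)/n; the midpoint-style threshold used below is
    an illustrative choice, not a tuned constant from the literature.
    """
    m = len(samples)
    # Count colliding pairs via the multiplicity of each symbol.
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    estimate = collisions / (m * (m - 1) // 2)
    return estimate <= (1 + 2 * eps ** 2) / n

# Illustrative usage with an arbitrary constant C = 20 in the sample
# size m = C * sqrt(n) / eps**2 (again, not a constant from the papers).
n, eps = 10_000, 0.25
m = int(20 * n ** 0.5 / eps ** 2)
samples = [random.randrange(n) for _ in range(m)]
print(collision_tester(samples, n, eps))  # typically prints True
\end{verbatim}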
In this note we survey their analysis,
while highlighting the sources of improvement. Specifically:
\begin{itemize}
\item
While the original analysis reduces the testing problem to approximating the collision probability of the unknown distribution up to a factor of $1+\epsilon^2$, the improved analysis capitalizes on the fact that the latter problem
needs to be solved only ``at the extreme'' (i.e., it suffices to distinguish the uniform distribution, which has collision probability $1/n$, from any distribution whose collision probability exceeds $(1+4\epsilon^2)/n$; see the calculation following this list).
\item
While the original analysis gives an almost
optimal bound on the variance of the estimator when $\epsilon=\Omega(1)$, a more careful analysis yields a significantly better bound in the case $\epsilon=o(1)$, which is the relevant regime here (see the variance formula following this list).
\end{itemize}
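To make the first item concrete, the following standard calculation (recorded here for completeness) shows that any distribution $p$ over $[n]$ that is $\epsilon$-far from uniform in total variation distance indeed has collision probability exceeding $(1+4\epsilon^2)/n$. Writing $u$ for the uniform distribution, using $\sum_i (p_i-u_i)=0$ for the equality, and using Cauchy--Schwarz together with $\|p-u\|_1 > 2\epsilon$ for the two inequalities,
\[
\sum_{i\in[n]} p_i^2 \;=\; \frac{1}{n} + \|p-u\|_2^2 \;\ge\; \frac{1}{n} + \frac{\|p-u\|_1^2}{n} \;>\; \frac{1+4\epsilon^2}{n} .
\]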
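As for the second item, the improvement can be traced through the exact variance of the collision count $C=\sum_{1\le i<j\le m}\mathbf{1}[X_i=X_j]$ among $m$ samples, where $\mu=\sum_i p_i^2$ denotes the collision probability: pairs of samples sharing one index have covariance $\|p\|_3^3-\mu^2$, disjoint pairs are independent, and so
\[
\mathrm{Var}[C] \;=\; \binom{m}{2}\bigl(\mu-\mu^2\bigr) \;+\; 6\binom{m}{3}\bigl(\|p\|_3^3-\mu^2\bigr) .
\]
A crude bound on the third-moment term $\|p\|_3^3$ already suffices when $\epsilon=\Omega(1)$; roughly speaking, the more careful analysis exploits the fact that in the relevant regime $\mu\le(1+O(\epsilon^2))/n$, which (via $\|p\|_3^3\le\|p\|_2^3=\mu^{3/2}$) forces this term, and hence the variance, to be much smaller.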