Loading jsMath...
Weizmann Logo
ECCC
Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

Login | Register | Classic Style



REPORTS > DETAIL:

Paper:

TR10-180 | 18th November 2010 10:08

Estimating the unseen: A sublinear-sample canonical estimator of distributions

RSS-Feed




TR10-180
Authors: Gregory Valiant, Paul Valiant
Publication: 18th November 2010 16:12
Downloads: 3971
Keywords: 


Abstract:

We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [29], this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using O(n/\log n) samples; [29] shows that no algorithm on o(n/\log n) samples can achieve this (where n is a bound on the support size, or in the case of estimating the support size, 1/n is a lower bound the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known.

Additionally, our algorithm runs in time linear in the number of samples used.



ISSN 1433-8092 | Imprint