Actually, the earliest paper that solves the distinct_n estimation
problem in 1 pass is the following:
"Estimating simple functions on the union of data streams"
by Gibbons and Tirthapura, SPAA 2001.
http://home.eng.iastate.edu/~snt/research/streaming.pdf
The above paper addresses
Hi everybody!
Perhaps the following papers are relevant to the discussion here
(their contact authors have been cc'd):
1. The following proposes effective algorithms for using block-level
sampling for n_distinct estimation:
"Effective use of block-level sampling in statistics estimat