[PERFORM] Citation for "Bad n_distinct estimation; hacks suggested?"

2005-05-02 Thread Gurmeet Manku
Actually, the earliest paper that solves the n_distinct estimation problem in one pass is the following: "Estimating simple functions on the union of data streams" by Gibbons and Tirthapura, SPAA 2001. http://home.eng.iastate.edu/~snt/research/streaming.pdf The above paper addresses

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Gurmeet Manku
Hi everybody! Perhaps the following papers are relevant to the discussion here (their contact authors have been cc'd): 1. The following proposes effective algorithms for using block-level sampling for n_distinct estimation: "Effective use of block-level sampling in statistics estimat