How large is the data frame, and what data type is the column you are counting distinct values of? I use countDistinct quite a bit and haven't noticed anything peculiar.
Also, which exact version of 2.3.x are you on? And are you performing any operations on the DF before the countDistinct? I recall there was a bug when I did countDistinct(PythonUDF(x)) in the same query, which was resolved in one of the minor versions of 2.3.x.

On Sat, Jun 29, 2019, 10:32 Rishi Shah <rishishah.s...@gmail.com> wrote:

> Hi All,
>
> Just wanted to check in to see if anyone has any insight about this
> behavior. Any pointers would help.
>
> Thanks,
> Rishi
>
> On Fri, Jun 14, 2019 at 7:05 AM Rishi Shah <rishishah.s...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> Recently we noticed that countDistinct on a larger dataframe doesn't
>> always return the same value. Any idea? If this is the case then what is
>> the difference between countDistinct & approx_count_distinct?
>>
>> --
>> Regards,
>>
>> Rishi Shah
>>
>
>
> --
> Regards,
>
> Rishi Shah
>