Hi, all: I have noticed that when using an unqualified COUNT(*) call on a table (whether Phoenix or view of an existing HBase table) without a qualifying WHERE clause, it comes back very quickly and, usually, inaccurately. I assume it is being cached somewhere as metadata, but I am wondering if anything can be done to force it to become up to date.
Here is a scenario I've recently encountered: I've imported ~50,000,000 rows into a Phoenix table, let's call it A, from a CSV file via MapReduce. When I run this query: SELECT COUNT(*) FROM A; It yields (almost immediately) 0 rows. However, I know there is data in there, which I can verify by running this query: SELECT * FROM A LIMIT 1; which returns a row with data in it. I am wondering if there is a mechanism by which I can obtain a truly accurate count, and how COUNT(*) actually works, since it doesn't appear to work as I would expect. I tried running this: UPDATE STATSTICS A; but that did not seem to have an effect. I have seen that if I change it to, say, COUNT(DISTINCT foo), or add a WHERE clause, then it appears to do a full scan in some cases, but I have also seen this result in a 0 count as well (I have not been able to discern a pattern as to which happens when). I would also be interested in being able to get a quicker, estimated count, similar to what can be done in Postgres by looking at the live tuples that are listed as being present in the table statistics (much quicker than doing a COUNT and serves as a reasonable ballpark for row count). I searched the mailing list archive and Googled around, and didn't come across anything in terms of unexpected COUNT behavior. Has anyone else seen this behavior? Any thoughts on how to get an accurate count? Thanks, - Ken