I agree that there should be clearer documentation of exactly how the estimate is calculated. When I asked about this recently, the response was that it should be within about 2% of the actual key count. I started looking at the code, but I ran out of time before I had chased down all the subsidiary factors in the calculation.
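As a quick sanity check, here is a minimal sketch that compares the two numbers quoted further down in this thread against that "about 2%" figure. The 2% tolerance is only what I was told, not a documented guarantee, and the helper name relative_error is just something I made up for illustration.

```python
def relative_error(estimate: int, actual: int) -> float:
    """Return |estimate - actual| / actual as a fraction."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return abs(estimate - actual) / actual


# Numbers taken from the quoted message in this thread.
ACTUAL_COUNT = 4692    # cqlsh: select count(*)
ESTIMATED_KEYS = 4720  # nodetool cfstats: Number of keys (estimate)
TOLERANCE = 0.02       # assumption: the "about 2%" figure, not a documented bound

error = relative_error(ESTIMATED_KEYS, ACTUAL_COUNT)
within_tolerance = error <= TOLERANCE
print(f"relative error: {error:.2%}, within tolerance: {within_tolerance}")
```

The difference is 28 keys out of 4692, roughly 0.6%, so these two numbers are consistent with the 2% figure.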
It would be nice to have an explicit nodetool option to count actual keys. Presumably that would be more efficient than a select count(*).

-- Jack Krupansky

On Fri, Jan 29, 2016 at 11:27 AM, Arindam Choudhury <arindam.choudh...@ackstorm.com> wrote:

> Why in cqlsh when I query "select count(*) from mordor.things_values_meta;" it says: 4692
>
> But in nodetool cfstats it says Number of keys (estimate): 4720?
>
> On 29 January 2016 at 16:25, Arindam Choudhury <arindam.choudh...@ackstorm.com> wrote:
>
>> I am counting the rows with "select count(*) from mordor.things_values_meta;"
>>
>> I am doing one node cluster to one node cluster for testing.
>>
>> On 29 January 2016 at 16:20, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>
>>> And how are you counting the rows? With a query? If so, what is the
>>> query? Using nodetool cfstats (estimated) key count? Or... what?
>>>
>>> Are the tokens for the missing rows in the same range, and a distinct
>>> range from the rest of the data in the original cluster?
>>>
>>> How many nodes in the original cluster?
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <arindam.choudh...@ackstorm.com> wrote:
>>>
>>>> I will check the output of nodetool cfstats.
>>>>
>>>> It's from version 2.1.2 to version 2.1.9.
>>>>
>>>> On 29 January 2016 at 16:02, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>
>>>>> Are these sstables from an existing Cassandra cluster or generated by
>>>>> a program?
>>>>>
>>>>> If the former, do a nodetool tablestats or cfstats to get the sstable
>>>>> count and compare it to both the number of sstables that the loader is
>>>>> reading from and the number that end up in the target cluster.
>>>>>
>>>>> What Cassandra version did the sstables come from and what version are
>>>>> you importing into?
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <arindam.choudh...@ackstorm.com> wrote:
>>>>>
>>>>>> Hi Romain,
>>>>>>
>>>>>> The RF was set to 2.
>>>>>>
>>>>>> I changed it to one.
>>>>>>
>>>>>> CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1} AND durable_writes = true;
>>>>>>
>>>>>> re-inserted the columns, still missing rows.
>>>>>>
>>>>>> Regards,
>>>>>> Arindam
>>>>>>
>>>>>> On 29 January 2016 at 15:14, Romain Hardouin <romainh...@yahoo.fr> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I assume a RF > 1. Right?
>>>>>>> What is the consistency level you used? cqlsh uses ONE by default.
>>>>>>> Try:
>>>>>>> cqlsh> CONSISTENCY ALL
>>>>>>> And run your query again.
>>>>>>>
>>>>>>> Best,
>>>>>>> Romain
>>>>>>>
>>>>>>> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <arindam.choudh...@ackstorm.com> wrote:
>>>>>>>
>>>>>>> Hi Kai,
>>>>>>>
>>>>>>> The table schema is:
>>>>>>>
>>>>>>> CREATE TABLE mordor.things_values_meta (
>>>>>>>     thing_id text,
>>>>>>>     key text,
>>>>>>>     bucket_timestamp timestamp,
>>>>>>>     total_rows counter,
>>>>>>>     PRIMARY KEY ((thing_id, key), bucket_timestamp)
>>>>>>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>>>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>>>>>     AND comment = ''
>>>>>>>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>>>>>>>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>>>     AND default_time_to_live = 0
>>>>>>>     AND gc_grace_seconds = 864000
>>>>>>>     AND max_index_interval = 2048
>>>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>>>     AND min_index_interval = 128
>>>>>>>     AND read_repair_chance = 0.0
>>>>>>>     AND speculative_retry = '99.0PERCENTILE';
>>>>>>>
>>>>>>> I am just running "select count(*) from things_values_meta ;" to get
>>>>>>> the count.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Arindam
>>>>>>>
>>>>>>> On 29 January 2016 at 13:39, Kai Wang <dep...@gmail.com> wrote:
>>>>>>>
>>>>>>> Arindam,
>>>>>>>
>>>>>>> what's the table schema and what does your query to retrieve the
>>>>>>> rows look like?
>>>>>>>
>>>>>>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <arindam.choudh...@ackstorm.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am importing data to a new cassandra cluster using sstableloader.
>>>>>>> The sstableloader runs without any warning or error. But I am missing
>>>>>>> around 1000 rows.
>>>>>>>
>>>>>>> Any feedback will be highly appreciated.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Arindam Choudhury