I agree that there should be clearer documentation on exactly how the
estimate is calculated. When I inquired about this recently, the response
was that it should be within about 2% of the actual key count. I started
looking at the code, but I ran out of time before I chased down all the
subsidiary factors in the calculation.
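
As a quick sanity check on that 2% figure, here is a minimal sketch in
Python that compares the two numbers quoted later in this thread (4692 from
select count(*) and 4720 from nodetool cfstats). The function name is just
an illustration, not anything taken from Cassandra. One caveat, as far as I
understand it: count(*) counts rows while cfstats estimates partition keys,
so the two numbers are not strictly the same quantity.

```python
# Figures quoted later in this thread (single-node test cluster).
counted = 4692    # result of "select count(*)" in cqlsh
estimated = 4720  # "Number of keys (estimate)" from nodetool cfstats


def relative_difference(estimate, actual):
    """Return |estimate - actual| / actual as a fraction."""
    return abs(estimate - actual) / actual


diff = relative_difference(estimated, counted)
print(f"difference: {estimated - counted} keys ({diff:.2%})")
print("within 2%:", diff <= 0.02)
```

The gap is 28 keys, or about 0.6% of the counted value, which is comfortably
inside the "about 2%" tolerance mentioned above.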

It would be nice to have an explicit nodetool option to count actual keys.
Presumably that would be more efficient than a select count(*).


-- Jack Krupansky

On Fri, Jan 29, 2016 at 11:27 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Why does cqlsh return 4692 when I query "select count(*) from
> mordor.things_values_meta;"
>
> while nodetool cfstats reports Number of keys (estimate): 4720?
>
> On 29 January 2016 at 16:25, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> I am counting the rows with "select count(*) from
>> mordor.things_values_meta;"
>>
>> I am loading from a one-node cluster into a one-node cluster for testing.
>>
>> On 29 January 2016 at 16:20, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> And how are you counting the rows? With a query? If so, what is the
>>> query? Using the nodetool cfstats (estimated) key count? Or... what?
>>>
>>> Are the tokens for the missing rows in the same range, and is that range
>>> distinct from the rest of the data in the original cluster?
>>>
>>> How many nodes in the original cluster?
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
>>>> I will check the output of nodetool cfstats.
>>>>
>>>> It's from version 2.1.2 to version 2.1.9.
>>>>
>>>> On 29 January 2016 at 16:02, Jack Krupansky <jack.krupan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Are these sstables from an existing Cassandra cluster or generated by
>>>>> a program?
>>>>>
>>>>> If the former, do a nodetool tablestats or cfstats to get the sstable
>>>>> count and compare it to both the number of sstables that the loader is
>>>>> reading from and the number that end up in the target cluster.
>>>>>
>>>>> What Cassandra version did the sstables come from and what version are
>>>>> you importing into?
>>>>>
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
>>>>> arindam.choudh...@ackstorm.com> wrote:
>>>>>
>>>>>> Hi Romain,
>>>>>>
>>>>>> The RF was set to 2.
>>>>>>
>>>>>> I changed it to one.
>>>>>>
>>>>>>  CREATE KEYSPACE mordor WITH replication = {'class' :
>>>>>> 'SimpleStrategy', 'replication_factor' : 1}  AND durable_writes = true;
>>>>>>
>>>>>> I re-inserted the columns, but rows are still missing.
>>>>>>
>>>>>> Regards,
>>>>>> Arindam
>>>>>>
>>>>>> On 29 January 2016 at 15:14, Romain Hardouin <romainh...@yahoo.fr>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I assume an RF > 1, right?
>>>>>>> What consistency level did you use? cqlsh uses ONE by default.
>>>>>>> Try:
>>>>>>> cqlsh> CONSISTENCY ALL
>>>>>>> And run your query again.
>>>>>>>
>>>>>>> Best,
>>>>>>> Romain
>>>>>>>
>>>>>>>
>>>>>>> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
>>>>>>> arindam.choudh...@ackstorm.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Kai,
>>>>>>>
>>>>>>> The table schema is:
>>>>>>>
>>>>>>> CREATE TABLE mordor.things_values_meta (
>>>>>>>     thing_id text,
>>>>>>>     key text,
>>>>>>>     bucket_timestamp timestamp,
>>>>>>>     total_rows counter,
>>>>>>>     PRIMARY KEY ((thing_id, key), bucket_timestamp)
>>>>>>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>>>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>>>>>     AND comment = ''
>>>>>>>     AND compaction = {'min_threshold': '4', 'class':
>>>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>>>> 'max_threshold': '32'}
>>>>>>>     AND compression = {'sstable_compression':
>>>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>>>     AND default_time_to_live = 0
>>>>>>>     AND gc_grace_seconds = 864000
>>>>>>>     AND max_index_interval = 2048
>>>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>>>     AND min_index_interval = 128
>>>>>>>     AND read_repair_chance = 0.0
>>>>>>>     AND speculative_retry = '99.0PERCENTILE';
>>>>>>>
>>>>>>>
>>>>>>> I am just running "select count(*) from things_values_meta ;" to get
>>>>>>> the count.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Arindam
>>>>>>>
>>>>>>> On 29 January 2016 at 13:39, Kai Wang <dep...@gmail.com> wrote:
>>>>>>>
>>>>>>> Arindam,
>>>>>>>
>>>>>>> what's the table schema and what does your query to retrieve the
>>>>>>> rows look like?
>>>>>>>
>>>>>>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>>>>>>> arindam.choudh...@ackstorm.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am importing data into a new Cassandra cluster using sstableloader.
>>>>>>> The sstableloader runs without any warning or error, but I am missing
>>>>>>> around 1000 rows.
>>>>>>>
>>>>>>> Any feedback will be highly appreciated.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Arindam Choudhury
