Hi Yuji and Ben,

I tried the revised script and hit the same issue. I think it's definitely a bug that should be fixed ASAP.
> Ben: What do you mean by "an undocumented limitation"?

Thanks,
Hiro

On Sat, Nov 26, 2016 at 3:13 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
> Nice detective work! Seems to me that it's at best an undocumented limitation
> and potentially could be viewed as a bug - maybe log another JIRA?
>
> One note - there is a nodetool truncatehints command that could be used to
> clear out the hints
> (http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate).
> However, it seems to clear all hints on a particular endpoint, not just those
> for a specific table.
>
> Cheers
> Ben
>
> On Fri, 25 Nov 2016 at 17:42 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Hi all,
>>
>> I revised the script to reproduce the issue.
>> I think the issue now happens more frequently than before.
>> Killing another node has been added to the previous script.
>>
>> ==== [script] ====
>> #!/bin/sh
>>
>> node1_ip=<node1 IP address>
>> node2_ip=<node2 IP address>
>> node3_ip=<node3 IP address>
>> node2_user=<user name>
>> node3_user=<user name>
>> rows=10000
>>
>> echo "consistency quorum;" > init_data.cql
>> for key in $(seq 0 $(expr $rows - 1))
>> do
>>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
>> done
>>
>> while true
>> do
>>     echo "truncate the table"
>>     cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
>>     if [ $? -ne 0 ]; then
>>         echo "truncating failed"
>>         continue
>>     else
>>         break
>>     fi
>> done
>>
>> echo "kill C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> echo "insert $rows rows"
>> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>>
>> echo "restart C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>>
>> while true
>> do
>>     echo "truncate the table again"
>>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>>     if [ $? -ne 0 ]; then
>>         echo "truncating failed"
>>         continue
>>     else
>>         echo "truncation succeeded!"
>>         break
>>     fi
>> done
>>
>> echo "kill C* process on node2"
>> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>> sleep 10
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>>
>> echo "restart C* process on node2"
>> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"
>>
>> Thanks,
>> yuji
>>
>> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>> I investigated the source code and the logs of the killed node.
>>> I suspect that unexpected writes are executed while the truncation is in progress.
>>>
>>> Some writes were executed after the flush (the first flush) inside the truncation,
>>> and these writes could be read afterwards.
>>> These writes were requested as MUTATIONs by another node for hinted handoff.
>>> Their data was stored in a new memtable and flushed (the second flush) to
>>> a new SSTable before the snapshot in the truncation.
>>> So the truncation discarded only the old SSTables, not the new SSTable.
>>> That's because the ReplayPosition used for discarding SSTables was
>>> that of the first flush.
>>>
>>> I copied some parts of the log below.
>>> Lines starting with "##" are my comments.
>>> The point is that the ReplayPosition is moved forward by the second flush.
>>> It means some writes were executed after the first flush.
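To make sure I follow the analysis, here is a tiny sketch of the discard decision as described above (my own simplification, not Cassandra code): truncate drops an SSTable only if its commitlog position is at or before the ReplayPosition captured by the first flush, so the SSTable produced by the second flush survives.

```shell
#!/bin/sh
# My own simplification, NOT Cassandra code: model the SSTable discard
# decision. ReplayPositions compare as (segmentId, position) pairs.
replay_le() {
    # replay_le segA posA segB posB -> true if (segA,posA) <= (segB,posB)
    [ "$1" -lt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -le "$4" ]; }
}

first_seg=1479371760395; first_pos=315867   # ReplayPosition of the first flush

check() {   # check <sstable name> <segmentId> <position>
    if replay_le "$2" "$3" "$first_seg" "$first_pos"; then
        echo "$1: discarded by truncate"
    else
        echo "$1: survives truncate"
    fi
}

check lb-1-big 1479371760395 315867   # written by the first flush
check lb-2-big 1479371760395 486627   # written by the second flush (hints)
# -> lb-1-big: discarded by truncate
# -> lb-2-big: survives truncate
```

Under that model, lb-2-big (position 486627) sorts after the first flush's ReplayPosition and is kept, which matches the deletion of only lb-1-big in the log below.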
>>>
>>> == log ==
>>> ## truncation started
>>> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:2790 - truncating testtbl
>>> ## the first flush started at the beginning of the truncation
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%) on-heap, 0 (0%) off-heap
>>> INFO [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 - Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops, 0%/0% of on/off-heap limit)
>>> ...
>>> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=315867)
>>> ## this ReplayPosition was used for discarding SSTables
>>> ...
>>> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
>>> ## end of the first flush
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:2823 - Discarding sstable data for truncated CF + indexes
>>> ## the second flush, before the snapshot
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 698880 (0%) on-heap, 0 (0%) off-heap
>>> INFO [MemtableFlushWriter:2] 2016-11-17 08:36:05,029 Memtable.java:352 - Writing Memtable-testtbl@1186728207(50.375KiB serialized bytes, 3328 ops, 0%/0% of on/off-heap limit)
>>> ...
>>> DEBUG [MemtableFlushWriter:2] 2016-11-17 08:36:05,258 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-2-big-Data.db (17.696KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=486627)
>>> ...
>>> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,289 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=486627), table 562848f0-a556-11e6-8b14-51065d58fdfb
>>> ## end of the second flush: the position was moved forward
>>> ...
>>> ## only the old SSTable was deleted, because only it was older than ReplayPosition(segmentId=1479371760395, position=315867)
>>> TRACE [NonPeriodicTasks:1] 2016-11-17 08:36:05,303 SSTable.java:118 - Deleted /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/lb-1-big
>>> ...
>>> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 ColumnFamilyStore.java:2841 - truncate complete
>>> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 TruncateVerbHandler.java:53 - Truncation(keyspace='testdb', cf='testtbl') applied. Enqueuing response to 36512@/10.91.145.7
>>> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 MessagingService.java:728 - /10.91.145.27 sending REQUEST_RESPONSE to 36512@/10.91.145.7
>>> ## end of truncation
>>> ====
>>>
>>> In fact, the "truncated_at" value for this table in system.local after running
>>> the script was 0x00000158716da30b0004d1db00000158716db524.
>>> It decodes to:
>>>     segmentId=1479371760395, position=315867
>>>     truncated_at=1479371765028 (2016-11-17 08:36:05,028)
>>>
>>> thanks,
>>> yuji
>>>
>>> On Wed, Nov 16, 2016 at 5:25 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I found stale data after truncating a table.
>>>> It seems that the truncation starts while recovery is still being executed
>>>> just after a node restarts.
>>>> Does the recovery continue even after the truncation finishes?
>>>> Is this expected?
>>>>
>>>> I use C* 2.2.8 and can reproduce the issue as below.
>>>>
>>>> ==== [create table] ====
>>>> cqlsh $ip -e "drop keyspace testdb;"
>>>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};"
>>>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>>>
>>>> ==== [script] ====
>>>> #!/bin/sh
>>>>
>>>> node1_ip=<node1 IP address>
>>>> node2_ip=<node2 IP address>
>>>> node3_ip=<node3 IP address>
>>>> node3_user=<user name>
>>>> rows=10000
>>>>
>>>> echo "consistency quorum;" > init_data.cql
>>>> for key in $(seq 0 $(expr $rows - 1))
>>>> do
>>>>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
>>>> done
>>>>
>>>> while true
>>>> do
>>>>     echo "truncate the table"
>>>>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>>>>     if [ $? -ne 0 ]; then
>>>>         echo "truncating failed"
>>>>         continue
>>>>     else
>>>>         break
>>>>     fi
>>>> done
>>>>
>>>> echo "kill C* process on node3"
>>>> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>>>
>>>> echo "insert $rows rows"
>>>> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>>>>
>>>> echo "restart C* process on node3"
>>>> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>>>>
>>>> while true
>>>> do
>>>>     echo "truncate the table again"
>>>>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>>>>     if [ $? -ne 0 ]; then
>>>>         echo "truncating failed"
>>>>         continue
>>>>     else
>>>>         break
>>>>     fi
>>>> done
>>>>
>>>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>>>> sleep 10
>>>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>>>>
>>>> ==== [result] ====
>>>> truncate the table
>>>> kill C* process on node3
>>>> insert 10000 rows
>>>> restart C* process on node3
>>>> 10.91.145.27: Starting Cassandra: OK
>>>> truncate the table again
>>>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>>>> truncating failed
>>>> truncate the table again
>>>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>>>> truncating failed
>>>> truncate the table again
>>>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>>>> truncating failed
>>>> truncate the table again
>>>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>>>> truncating failed
>>>> truncate the table again
>>>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>>>> truncating failed
>>>> truncate the table again
>>>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>>>> truncating failed
>>>> truncate the table again
>>>> Consistency level set to SERIAL.
>>>>
>>>>  count
>>>> -------
>>>>    300
>>>>
>>>> (1 rows)
>>>>
>>>> Warnings :
>>>> Aggregation query used without partition key
>>>>
>>>> Consistency level set to SERIAL.
>>>>
>>>>  count
>>>> -------
>>>>   2304
>>>>
>>>> (1 rows)
>>>>
>>>> Warnings :
>>>> Aggregation query used without partition key
>>>> ====
>>>>
>>>> I found this while investigating a data-loss problem (see the "failure node rejoin" thread).
>>>> I'm not sure whether this problem is related to that data loss.
>>>>
>>>> Thanks,
>>>> yuji
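P.S. As a sanity check on Yuji's decoding of the truncated_at blob, the three fields can be pulled apart with plain shell arithmetic. The field layout is assumed from his analysis (8-byte segmentId, 4-byte commitlog position, 8-byte truncation timestamp in milliseconds, big-endian hex), and the results match his numbers:

```shell
#!/bin/sh
# Decode the truncated_at blob quoted in the thread. Field layout assumed
# from Yuji's analysis: 8 bytes segmentId, 4 bytes commitlog position,
# 8 bytes truncation timestamp (ms since epoch), all big-endian hex.
blob=00000158716da30b0004d1db00000158716db524

segment_id=$((0x$(echo "$blob" | cut -c1-16)))
position=$((0x$(echo "$blob" | cut -c17-24)))
truncated_at_ms=$((0x$(echo "$blob" | cut -c25-40)))

echo "segmentId=$segment_id"         # segmentId=1479371760395
echo "position=$position"            # position=315867
echo "truncated_at=$truncated_at_ms" # truncated_at=1479371765028
```

So truncated_at records exactly the ReplayPosition of the first flush (position=315867), which is consistent with the second flush's SSTable surviving.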