Ke Han created CASSANDRA-18636:
----------------------------------

             Summary: AssertionError: Unknown keyspace after dropping a keyspace
                 Key: CASSANDRA-18636
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18636
             Project: Cassandra
          Issue Type: Bug
            Reporter: Ke Han


When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is an 
ERROR log message in system.log after I executed the drop keyspace command. 

 
{code:java}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:48,575 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=2}}, tables=[], views=[], functions=[], types=[]}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:53,106 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
INFO  [Native-Transport-Requests-2] 2023-06-26 06:11:56,673 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=50.0ms, caching={'keys' : 'ALL', 
'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
[condcduzca o ukwhjwbdljqnz 
vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx,
 condcduzca, vzkzguyh, o, 
ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 - 
Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
INFO  [Native-Transport-Requests-9] 2023-06-26 06:12:04,634 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=ALWAYS, caching={'keys' : 'ALL', 
'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
[dzwqnsv gveohmecfdwx o 
ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv,
 vzkzguyh, gveohmecfdwx, o, 
ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 - 
Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o
INFO  [Native-Transport-Requests-6] 2023-06-26 06:12:13,165 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
INFO  [MigrationStage:1] 2023-06-26 06:12:20,413 MigrationCoordinator.java:531 
- Sending schema pull request to /192.168.103.3 at 1687759940413 with timeout 
4287
INFO  [Native-Transport-Requests-3] 2023-06-26 06:12:21,270 
MigrationManager.java:286 - Update Keyspace 
'uuid1602df26aef747d6af8f6d0c36fb1a44' From 
KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], 
types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
 To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=3}}, tables=[], views=[], functions=[], 
types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:12:24,646 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
 jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 - 
Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q
INFO  [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 - 
Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
INFO  [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 - 
Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete
INFO  [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 - 
Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q
INFO  [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 - 
Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete
INFO  [Native-Transport-Requests-4] 2023-06-26 06:12:37,529 
MigrationManager.java:362 - Drop Keyspace 'uuidd17c2b7bfea545d0801a3c05e0e5307a'
INFO  [Native-Transport-Requests-1] 2023-06-26 06:13:07,622 
MigrationManager.java:313 - Update table 
'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From 
org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={max_threshold=32, min_threshold=4}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
 jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To 
org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={max_threshold=32, min_threshold=4}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs,
 fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8 
cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR, 
type=org.apache.cassandra.db.marshal.Int32Type, 
droppedTime=1687759987622000}},triggers=[],indexes=[]]
ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 - 
Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.AssertionError: Unknown keyspace uuidd17c2b7bfea545d0801a3c05e0e5307a
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
        at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
        at 
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at 
org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72)
        at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
        at java.lang.Thread.run(Thread.java:750)
{code}
 

>From my understanding, this should be a concurrency bug. The ERROR message 
>comes from *_SizeEstimatesRecorder_* thread when it tries to init and iterate 
>the keyspaces to get the size information. This thread might be invoked from 
>time to time, controlled by DebuggableScheduledThreadPoolExecutor.

The error message occurs when there's a thread interleaving between
 # Thread1: *_SizeEstimatesRecorder_* thread and
 # Thread2: Thread performing {_}drop keyspace{_}. Probably a mutation or 
migration thread.

In short, thread1 fetches the keyspace names => thread2 updates the schema by 
removing a specific keyspace => thread1 tries to fetch the schema from the 
dropped keyspace, and error message is shown.

 

More specifically, here's how it happens from my preliminary analysis:

(1) Thread1 tries to iterate all the keyspaces from

 
{code:java}
public void run()
{
    TokenMetadata metadata = 
StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
    if (!metadata.isMember(FBUtilities.getBroadcastAddress()))
    {
        logger.debug("Node is not part of the ring; not recording size 
estimates");
        return;
    }    logger.trace("Recording size estimates");    for (Keyspace keyspace : 
Keyspace.nonLocalStrategy())
    {{code}
It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in 
_nonLocalStrategy_ function

 

 
{code:java}
    public static Iterable<Keyspace> nonLocalStrategy()    {        return 
Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(), 
keyspaceTransformer);    } {code}
getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current 
schema.

 
{code:java}
/**
 * @return a collection of keyspaces that do not use LocalStrategy for 
replication
 */
public List<String> getNonLocalStrategyKeyspaces()
{
    return keyspaces.values().stream()
            .filter(keyspace -> keyspace.params.replication.klass != 
LocalStrategy.class)
            .map(keyspace -> keyspace.name)
            .collect(Collectors.toList());
} {code}
 

(2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}. 

(3) Thread1: for each keyspace name, it will invoke the <init> function of 
Keyspace, when it tries to fetch the metadata from the dropped keyspace name, 
it gets null and throws and ERROR exception.

 
{code:java}
private Keyspace(String keyspaceName, boolean loadSSTables)
{
    metadata = Schema.instance.getKSMetaData(keyspaceName);
    assert metadata != null : "Unknown keyspace " + keyspaceName;
    createReplicationStrategy(metadata); {code}
 

 

The error message won't stop Cassandra from running. But I think this could be 
handled considering its ERROR level. E.g. if a user wants to perform a 
migration from 3.x to 4.x, [Cassandra migration 
doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x]
 mentions that "All errors need to be handled". This error message could 
confuse users about whether the data is still in an integrated state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to