[ https://issues.apache.org/jira/browse/HDDS-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bharat Viswanadham resolved HDDS-3642.
--------------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

> Stop/Pause Background services while replacing OM DB with checkpoint from Leader
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-3642
>                 URL: https://issues.apache.org/jira/browse/HDDS-3642
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: OM HA
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>
> When a follower OM needs to replace its DB with a checkpoint from the Leader (to
> catch up on transactions), it should pause or stop the services that read from
> or write to the DB.
> During OM HA testing, we found that the OM could crash with a JVM error in RocksDB.
> This happened because KeyDeletingService was trying to access memory that had
> already been freed.
> {code:java}
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f19de835af0, pid=1389, tid=1712
> #
> # JRE version: OpenJDK Runtime Environment (11.0.6+10) (build 11.0.6+10-LTS)
> # Java VM: OpenJDK 64-Bit Server VM (11.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
> # Problematic frame:
> # C  [librocksdbjni10001996641283911793.so+0x1aeaf0]  Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
> #
> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /opt/core.1389)
> #
> # An error report file with more information is saved as:
> # /opt/hs_err_pid1389.log
> {code}
> From the hs_err log file:
> {code:java}
> ---------------  T H R E A D  ---------------
>
> Current thread (0x00000000011a4000):  JavaThread "KeyDeletingService#1" daemon [_thread_in_native, id=1712, stack(0x00007f19d2443000,0x00007f19d2544000)]
>
> Stack: [0x00007f19d2443000,0x00007f19d2544000],  sp=0x00007f19d2541e78,  free space=1019k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C  [librocksdbjni10001996641283911793.so+0x1aeaf0]  Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
> j  org.rocksdb.AbstractRocksIterator.seekToFirst()V+26
> j  org.apache.hadoop.hdds.utils.db.RDBStoreIterator.<init>(Lorg/rocksdb/RocksIterator;)V+13
> j  org.apache.hadoop.hdds.utils.db.RDBTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+30
> j  org.apache.hadoop.hdds.utils.db.TypedTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+4
> j  org.apache.hadoop.ozone.om.OmMetadataManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+8
> j  org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+5
> j  org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Lorg/apache/hadoop/hdds/utils/BackgroundTaskResult;+39
> j  org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Ljava/lang/Object;+1
> J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
> J 4802 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.6 (14 bytes) @ 0x00007f19f0c87214 [0x00007f19f0c870e0+0x0000000000000134]
> J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
> J 4802 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.6 (14 bytes) @ 0x00007f19f0c87214 [0x00007f19f0c870e0+0x0000000000000134]
> J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
> J 4954 c1 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V java.base@11.0.6 (57 bytes) @ 0x00007f19f0cfe10c [0x00007f19f0cfde40+0x00000000000002cc]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
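The coordination this issue asks for (keeping background services such as KeyDeletingService off the DB while a follower OM swaps in the Leader's checkpoint) can be illustrated with a minimal Java sketch. It is not Ozone code and not necessarily how HDDS-3642 was fixed: `CheckpointSwapSketch`, `FakeStore`, `runBackgroundTaskOnce` and `installCheckpoint` are hypothetical names, and the "pause" is modeled with a `java.util.concurrent.locks.ReentrantReadWriteLock`. Background runs hold the read lock while touching the store; installing a checkpoint takes the write lock, so the old store is closed only after every in-flight iteration has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch, not Ozone code: background tasks touch the DB only
 * while holding a read lock, and checkpoint installation takes the write
 * lock, so the old DB is never closed while a task is still iterating it.
 */
public class CheckpointSwapSketch {

  /** Hypothetical stand-in for a RocksDB-backed metadata store. */
  static final class FakeStore {
    private final List<String> pendingDeletes;
    private volatile boolean closed = false;

    FakeStore(List<String> pendingDeletes) {
      this.pendingDeletes = new ArrayList<>(pendingDeletes);
    }

    List<String> iteratePendingDeletes() {
      // A real native store may crash the JVM (SIGSEGV) here instead of
      // throwing; the locking below exists to prevent reaching this state.
      if (closed) {
        throw new IllegalStateException("store already closed");
      }
      return new ArrayList<>(pendingDeletes);
    }

    void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  private final ReentrantReadWriteLock dbLock = new ReentrantReadWriteLock();
  private FakeStore store; // guarded by dbLock

  CheckpointSwapSketch(FakeStore initial) {
    this.store = initial;
  }

  /** One run of a background task, e.g. a key-deleting service. */
  int runBackgroundTaskOnce() {
    dbLock.readLock().lock();
    try {
      return store.iteratePendingDeletes().size();
    } finally {
      dbLock.readLock().unlock();
    }
  }

  /**
   * Swap in the Leader's checkpoint. Acquiring the write lock waits for
   * in-flight task runs to release the read lock, and no run can start
   * while it is held, so closing the old store here is safe.
   */
  void installCheckpoint(FakeStore checkpoint) {
    dbLock.writeLock().lock();
    try {
      FakeStore old = store;
      store = checkpoint;
      old.close();
    } finally {
      dbLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    FakeStore original = new FakeStore(List.of("key1", "key2", "key3"));
    CheckpointSwapSketch om = new CheckpointSwapSketch(original);
    System.out.println("pending before swap: " + om.runBackgroundTaskOnce());
    om.installCheckpoint(new FakeStore(List.of("key4")));
    System.out.println("pending after swap: " + om.runBackgroundTaskOnce());
    System.out.println("old store closed: " + original.isClosed());
  }
}
```

Running `main` prints `pending before swap: 3`, `pending after swap: 1` and `old store closed: true`. A lock-based pause keeps the service threads alive across the swap. The alternative closer to the issue title, stopping each service's executor (`ExecutorService.shutdown()` followed by `awaitTermination`) and restarting it after the new DB is loaded, gives the same guarantee at the cost of recreating the threads.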