[ 
https://issues.apache.org/jira/browse/HDDS-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-3642.
--------------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

> Stop/Pause Background services while replacing OM DB with checkpoint from 
> Leader
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-3642
>                 URL: https://issues.apache.org/jira/browse/HDDS-3642
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: OM HA
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>
> When a follower OM needs to replace its DB with a checkpoint from the Leader 
> (to catch up on transactions), it should pause or stop the background 
> services that read from or write to the DB. 
> During OM HA testing, we found that the OM could crash with a fatal JVM error 
> (SIGSEGV) inside RocksDB. This happened because KeyDeletingService tried to 
> access memory that had already been freed.
> {code:java}
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f19de835af0, pid=1389, tid=1712
> #
> # JRE version: OpenJDK Runtime Environment (11.0.6+10) (build 11.0.6+10-LTS)
> # Java VM: OpenJDK 64-Bit Server VM (11.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
> # Problematic frame:
> # C  [librocksdbjni10001996641283911793.so+0x1aeaf0]  Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
> #
> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /opt/core.1389)
> #
> # An error report file with more information is saved as:
> # /opt/hs_err_pid1389.log
> {code}
> From the hs_err log file (/opt/hs_err_pid1389.log):
> {code:java}
> ---------------  T H R E A D  ---------------
>
> Current thread (0x00000000011a4000):  JavaThread "KeyDeletingService#1" daemon [_thread_in_native, id=1712, stack(0x00007f19d2443000,0x00007f19d2544000)]
>
> Stack: [0x00007f19d2443000,0x00007f19d2544000],  sp=0x00007f19d2541e78,  free space=1019k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C  [librocksdbjni10001996641283911793.so+0x1aeaf0]  Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
> j  org.rocksdb.AbstractRocksIterator.seekToFirst()V+26
> j  org.apache.hadoop.hdds.utils.db.RDBStoreIterator.<init>(Lorg/rocksdb/RocksIterator;)V+13
> j  org.apache.hadoop.hdds.utils.db.RDBTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+30
> j  org.apache.hadoop.hdds.utils.db.TypedTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+4
> j  org.apache.hadoop.ozone.om.OmMetadataManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+8
> j  org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+5
> j  org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Lorg/apache/hadoop/hdds/utils/BackgroundTaskResult;+39
> j  org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Ljava/lang/Object;+1
> J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
> J 4802 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.6 (14 bytes) @ 0x00007f19f0c87214 [0x00007f19f0c870e0+0x0000000000000134]
> J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
> J 4802 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.6 (14 bytes) @ 0x00007f19f0c87214 [0x00007f19f0c870e0+0x0000000000000134]
> J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
> J 4954 c1 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V java.base@11.0.6 (57 bytes) @ 0x00007f19f0cfe10c [0x00007f19f0cfde40+0x00000000000002cc]
> {code}
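
The issue description asks that services reading from or writing to the DB be paused or stopped while a follower swaps in the Leader's checkpoint. The following is a minimal sketch of one way to get that behavior, using a read-write lock as the gate. It is not the patch that resolved this issue. Every name in it (CheckpointSwapSketch, FakeDb, DbHolder, readOnce, replaceWith) is hypothetical, and FakeDb is a plain Java stand-in for a RocksDB handle so the example runs without native libraries.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CheckpointSwapSketch {

  /** Hypothetical stand-in for a DB handle whose native memory is freed on close(). */
  static final class FakeDb {
    private final String name;
    private volatile boolean closed;

    FakeDb(String name) { this.name = name; }

    String read() {
      // With a real native handle, this would be the use-after-free crash.
      if (closed) {
        throw new IllegalStateException("read on closed DB: " + name);
      }
      return name;
    }

    void close() { closed = true; }
  }

  /** Background readers take the read lock; the checkpoint swap takes the write lock. */
  static final class DbHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private FakeDb db;

    DbHolder(FakeDb initial) { this.db = initial; }

    String readOnce() {
      lock.readLock().lock();
      try {
        return db.read();
      } finally {
        lock.readLock().unlock();
      }
    }

    void replaceWith(FakeDb checkpoint) {
      // Blocks until in-flight reads finish, and holds off new ones during the swap.
      lock.writeLock().lock();
      try {
        db.close();
        db = checkpoint;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    DbHolder holder = new DbHolder(new FakeDb("old"));
    ExecutorService pool = Executors.newSingleThreadExecutor();

    // Background task that keeps reading, loosely like a deletion service scanning a table.
    Future<Integer> reads = pool.submit(() -> {
      int n = 0;
      for (int i = 0; i < 10_000; i++) {
        holder.readOnce();
        n++;
      }
      return n;
    });

    holder.replaceWith(new FakeDb("checkpoint"));

    // get() would throw ExecutionException if any read had hit a closed DB.
    int completed = reads.get();
    pool.shutdown();
    System.out.println("reads completed: " + completed + ", current DB: " + holder.readOnce());
  }
}
```

In this sketch a background read can never observe a closed handle, because close() only runs while the write lock is held. Stopping the services outright before the swap and restarting them afterwards is the other option the issue title mentions. The lock-based gate shown here is just one possible way to express "pause".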



