[ https://issues.apache.org/jira/browse/GEODE-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737102#comment-17737102 ]

ASF subversion and git services commented on GEODE-10401:
---------------------------------------------------------

Commit b828c9ff38a369bc5a4814e0ad2fbbbb0ddf7827 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=b828c9ff38 ]

GEODE-10401: Replace 1.15.0 with 1.15.1 as old version (#7868)

Replace 1.15.0 with 1.15.1 in old versions and set it as the default Benchmarks
baseline on develop, to enable rolling upgrade tests from 1.15.1.

The serialization version has not changed between 1.15.0 and 1.15.1,
so there should be no need to keep both.

> Oplog recovery takes too long due to fault in fastutil library
> --------------------------------------------------------------
>
>                 Key: GEODE-10401
>                 URL: https://issues.apache.org/jira/browse/GEODE-10401
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.1, 1.16.0
>
>
> As we already know, the .drf file delete operations contain only the 
> OplogEntryID. During recovery, the server reads each OplogEntryID (byte by 
> byte) and stores it in a hash set, to be used later when recovering the .crf 
> files. Two kinds of hash sets are used: IntOpenHashSet and LongOpenHashSet. 
> An OplogEntryID of type _integer_ is stored in an IntOpenHashSet, and a 
> _long integer_ in a LongOpenHashSet, probably for memory-optimization and 
> performance reasons. The OplogEntryID starts at zero and increments over 
> time.
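> The split between the two set types can be illustrated with a minimal 
> sketch (an illustration only, not Geode's actual OplogEntryIdSet 
> implementation; it assumes fastutil on the classpath):
> {code:java}
> import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
> import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
> 
> // Hypothetical sketch of the recovery-time set of deleted-entry IDs.
> class EntryIdSetSketch {
>     // IDs that fit in 32 bits go here, roughly halving per-entry memory.
>     private final IntOpenHashSet intIds = new IntOpenHashSet();
>     // IDs above Integer.MAX_VALUE need the 64-bit set.
>     private final LongOpenHashSet longIds = new LongOpenHashSet();
> 
>     void add(long id) {
>         if (id <= Integer.MAX_VALUE) {
>             intIds.add((int) id); // OplogEntryIDs start at zero, so no negative check needed
>         } else {
>             longIds.add(id);
>         }
>     }
> 
>     boolean contains(long id) {
>         return id <= Integer.MAX_VALUE ? intIds.contains((int) id) : longIds.contains(id);
>     }
> }
> {code}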
> We have observed in the logs that more than 4 minutes (sometimes 
> considerably more) pass between the "There is a large number of deleted 
> entries" warning and the preceding log message:
> {code:java}
> {"timestamp":"2022-06-14T21:41:43.772+08:00","severity":"info","message":"Recovering oplog#271 /opt/dbservice/data/datastore/BACKUPdataDiskStore_271.drf for disk store dataDiskStore.","metadata":
> {"timestamp":"2022-06-14T21:46:02.152+08:00","severity":"warning","message":"There is a large number of deleted entries within the disk-store, please execute an offline compaction.","metadata":
> {code}
> When the above warning is logged, it means that the limit of 805306401 
> entries in the IntOpenHashSet has been reached. In that case, the server 
> rolls over to a new IntOpenHashSet, where the warning and the delay can 
> occur again.
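> The roll-over could look roughly like the following (a hypothetical sketch 
> of the behaviour described above; the wrapping class and the use of the 
> cited limit as a size threshold are assumptions, not Geode's exact code):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
> 
> // Hypothetical roll-over: once the current set reaches the limit, start a
> // fresh one and keep the old ones around for later contains() checks.
> class RollingIntIdSet {
>     private static final int ASSUMED_LIMIT = 805306401; // limit cited in this report
>     private final List<IntOpenHashSet> fullSets = new ArrayList<>();
>     private IntOpenHashSet current = new IntOpenHashSet();
> 
>     void add(int id) {
>         if (current.size() >= ASSUMED_LIMIT) {
>             fullSets.add(current); // roll to a new set, as described above
>             current = new IntOpenHashSet();
>         }
>         current.add(id);
>     }
> 
>     boolean contains(int id) {
>         if (current.contains(id)) return true;
>         for (IntOpenHashSet s : fullSets) {
>             if (s.contains(id)) return true;
>         }
>         return false;
>     }
> }
> {code}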
> The problem is that, due to the fault in the fastutil dependency (in 
> IntOpenHashSet and LongOpenHashSet), unnecessary rehashing happens multiple 
> times before the maximum size is reached: rehashing is triggered for each 
> new entry from 805306368 onwards, up to the maximum size. This rehashing 
> adds several minutes to .drf oplog recovery but achieves nothing, since the 
> backing table is already at its maximum capacity.
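> The arithmetic behind that threshold can be checked with fastutil's public 
> HashCommon helpers (a sketch assuming fastutil on the classpath; the exact 
> rehash behaviour differs between fastutil versions):
> {code:java}
> import it.unimi.dsi.fastutil.HashCommon;
> 
> public class RehashMath {
>     public static void main(String[] args) {
>         final float f = 0.75f;           // fastutil's default load factor
>         final int maxCapacity = 1 << 30; // largest table fastutil will allocate
> 
>         // Maximum number of entries the largest table may hold before an
>         // add() trips the grow/rehash check:
>         System.out.println(HashCommon.maxFill(maxCapacity, f)); // 805306368
> 
>         // From that point on, every add() asks for a larger table, but no
>         // power of two above 2^30 is allowed, so the set rehashes ~2^30
>         // slots into a table of the same size -- which is where the minutes
>         // of delay observed in the logs come from.
>     }
> }
> {code}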



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
