[jira] [Updated] (YARN-5008) LeveldbRMStateStore database can grow substantially leading to long recovery times

2017-01-05 Thread Junping Du (JIRA)

 [ https://issues.apache.org/jira/browse/YARN-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-5008:
-
Fix Version/s: 2.8.0

> LeveldbRMStateStore database can grow substantially leading to long recovery times
> --
>
> Key: YARN-5008
> URL: https://issues.apache.org/jira/browse/YARN-5008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
> Attachments: YARN-5008.001.patch
>
>
> On large clusters with high application churn, the background compaction in
> leveldb may not be able to keep up with the write rate.  This can lead to
> large leveldb databases that take many minutes to recover despite not having
> very much real data in the database to load.  Most of the time is spent
> traversing tables full of keys that have been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5008) LeveldbRMStateStore database can grow substantially leading to long recovery times

2016-06-09 Thread Eric Payne (JIRA)


Eric Payne updated YARN-5008:
-
Fix Version/s: (was: 2.7.4)
   2.7.3

> LeveldbRMStateStore database can grow substantially leading to long recovery times
> --
>
> Key: YARN-5008
> URL: https://issues.apache.org/jira/browse/YARN-5008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.3
>
> Attachments: YARN-5008.001.patch
>
>
> On large clusters with high application churn, the background compaction in
> leveldb may not be able to keep up with the write rate.  This can lead to
> large leveldb databases that take many minutes to recover despite not having
> very much real data in the database to load.  Most of the time is spent
> traversing tables full of keys that have been deleted.




[jira] [Updated] (YARN-5008) LeveldbRMStateStore database can grow substantially leading to long recovery times

2016-06-09 Thread Eric Payne (JIRA)


Eric Payne updated YARN-5008:
-
Affects Version/s: (was: 2.7.0)
   2.7.3

Changing Fix Version to 2.7.3 since branch 2.7.3 has not yet been created.

> LeveldbRMStateStore database can grow substantially leading to long recovery times
> --
>
> Key: YARN-5008
> URL: https://issues.apache.org/jira/browse/YARN-5008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.4
>
> Attachments: YARN-5008.001.patch
>
>
> On large clusters with high application churn, the background compaction in
> leveldb may not be able to keep up with the write rate.  This can lead to
> large leveldb databases that take many minutes to recover despite not having
> very much real data in the database to load.  Most of the time is spent
> traversing tables full of keys that have been deleted.




[jira] [Updated] (YARN-5008) LeveldbRMStateStore database can grow substantially leading to long recovery times

2016-04-28 Thread Jason Lowe (JIRA)


Jason Lowe updated YARN-5008:
-
Attachment: YARN-5008.001.patch

I noticed that in the cases where the database was quite large, a manual
compaction of the database would shrink it from many gigabytes to under 100MB.
It looks like we need periodic manual compactions of the database to keep the
leveldb tables from filling up with stale keys.  Once the database fills with
mostly stale keys, the recovery process becomes quite slow due to all the I/O
required to iterate past them to reach the few valid keys remaining.
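To illustrate why a database full of deleted keys is slow to recover, here is a
toy Java model (hypothetical class and method names; it is not leveldb's actual
on-disk format). It assumes a log-structured store where a delete appends a
tombstone rather than removing data, so a recovery scan reads every record,
live or dead, until a compaction rewrites the log with only the newest live
record per key. It needs Java 16+ for records and Stream.toList().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (hypothetical, not leveldb's real layout): every put or delete
 * appends a record, so deleted keys keep costing I/O until a compaction
 * rewrites the log with only the newest live record per key.
 */
public class TombstoneDemo {
    /** One appended record; a null value is a tombstone marking a delete. */
    record Entry(String key, String value) {}

    private final List<Entry> log = new ArrayList<>();

    void put(String key, String value) { log.add(new Entry(key, value)); }

    void delete(String key) { log.add(new Entry(key, null)); }

    /** Records a recovery scan must read: live, overwritten and deleted alike. */
    int recordCount() { return log.size(); }

    /** Newest record per key, skipping keys whose newest record is a tombstone. */
    private List<Entry> newestLiveEntries() {
        Set<String> seen = new HashSet<>();
        List<Entry> live = new ArrayList<>();
        // Walk newest-to-oldest so only the latest record for each key counts.
        for (int i = log.size() - 1; i >= 0; i--) {
            Entry e = log.get(i);
            if (seen.add(e.key()) && e.value() != null) {
                live.add(e);
            }
        }
        Collections.reverse(live); // restore oldest-to-newest order
        return live;
    }

    List<String> liveKeys() {
        return newestLiveEntries().stream().map(Entry::key).toList();
    }

    /** Full compaction: drop overwritten records and tombstones. */
    void compact() {
        List<Entry> live = newestLiveEntries();
        log.clear();
        log.addAll(live);
    }

    public static void main(String[] args) {
        TombstoneDemo store = new TombstoneDemo();
        // High application churn: many apps are stored and later removed...
        for (int i = 0; i < 10_000; i++) {
            store.put("app_" + i, "state");
            store.delete("app_" + i);
        }
        // ...while only a handful remain live.
        for (int i = 0; i < 5; i++) {
            store.put("live_" + i, "state");
        }
        System.out.println("records before compaction: " + store.recordCount()); // 20005
        System.out.println("live keys: " + store.liveKeys().size());             // 5
        store.compact();
        System.out.println("records after compaction: " + store.recordCount());  // 5
    }
}
```

The scan cost before compaction tracks total write history, not live data,
which is the same shape as the "many gigabytes down to under 100MB" result.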

Attaching a patch that adds a periodic full compaction of the database.  By
default it runs every hour, but the interval is configurable and the periodic
compaction can be disabled entirely if desired.  I did some tests on a very
large database, writing keys every 10msec while a full compaction cycle was
running, and the impact on write performance was acceptable.  Writes were
occasionally delayed by up to 30% due to disk I/O contention, but overall the
write performance was still quite good.  If the database is already mostly
compact, the cycle runs very fast, so this should have minimal impact on overall
RM state store performance.
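A minimal sketch of a periodic full compaction with a configurable interval,
using the JDK's ScheduledExecutorService. The Compactable interface, the
PeriodicCompactor class, and the thread name are hypothetical stand-ins, not
the patch's actual code; compactAll() is assumed to trigger a full-range
leveldb compaction. A non-positive interval disables the cycle, mirroring the
"configured or even disabled" behavior described above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: run a full compaction of a state store on a fixed
 * interval; a non-positive interval disables the cycle entirely.
 */
public class PeriodicCompactor implements AutoCloseable {
    /** Stand-in for the store; assumed to compact the entire key range. */
    public interface Compactable {
        void compactAll();
    }

    private final ScheduledExecutorService scheduler;

    public PeriodicCompactor(Compactable store, long interval, TimeUnit unit) {
        if (interval <= 0) {
            scheduler = null; // periodic compaction disabled
            return;
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "state-store-compaction");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });
        // Fixed delay: the next cycle starts one interval after the previous
        // one finishes, so a slow compaction cannot cause back-to-back runs.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                store.compactAll();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs.
                System.err.println("compaction failed: " + e);
            }
        }, interval, interval, unit);
    }

    public boolean isEnabled() {
        return scheduler != null;
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

An hourly cycle would be new PeriodicCompactor(store, 1, TimeUnit.HOURS).
Wiring compactAll() to leveldb is an assumption here: bindings that expose a
compactRange(begin, end) call with null bounds for the full key range are one
option, to be checked against the leveldb library actually in use.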

> LeveldbRMStateStore database can grow substantially leading to long recovery times
> --
>
> Key: YARN-5008
> URL: https://issues.apache.org/jira/browse/YARN-5008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-5008.001.patch
>
>
> On large clusters with high application churn, the background compaction in
> leveldb may not be able to keep up with the write rate.  This can lead to
> large leveldb databases that take many minutes to recover despite not having
> very much real data in the database to load.  Most of the time is spent
> traversing tables full of keys that have been deleted.


