Ruslan Dautkhanov created MAPREDUCE-7219:
--------------------------------------------

             Summary: Random mappers start delay to have a slow processing 
ramp-up
                 Key: MAPREDUCE-7219
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7219
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Ruslan Dautkhanov


It would be great to have a way to configure a random start delay for mappers, 
so that processing ramps up slowly and gracefully. This would avoid overloading 
an external system during an initialization storm, when mappers have to talk at 
startup to an external, less scalable system: a backend database, ZK, DNS, etc.

 

From an answer to an SO question:

[https://stackoverflow.com/a/56621673/470583]

 

// quote

You could manually limit the number of simultaneous initializations using, for 
example, Apache Curator's 
org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2 mechanism.

See, for example, how Cloudera uses this in batch-load jobs that load data into Solr:

[https://github.com/cloudera/search/blob/cdh6.2.0/search-crunch/src/main/java/org/apache/solr/crunch/MorphlineInitRateLimiter.java#L115]

In that particular example they use it to limit the number of ZooKeeper 
initializations that can run at the same time, to avoid overloading ZooKeeper 
with a storm of requests from hundreds of mappers.

In one job I use 400 mappers but limit the number of simultaneous 
initializations to 30 (once the initializations are done, the mappers run fully 
independently).

In your example you want to limit the number of requests to the Oracle backend 
from mappers; in this example they want to limit the number of requests to ZK. 
So it's the same problem.

Ideally, Hadoop would have a way to add a random delay to mapper ramp-up for 
exactly the same reason.

// quote
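
The pattern in the quoted answer (many workers, only a few allowed to 
initialize at once) can be illustrated with a small sketch. This is NOT the 
Curator recipe: it uses a plain java.util.concurrent.Semaphore inside one JVM 
as a stand-in for the distributed InterProcessSemaphoreV2, with threads playing 
the role of mappers. The class name InitRateLimiterDemo, the method runWorkers, 
the 5 ms sleep, and the scaled-down numbers (40 workers, 3 permits) are all 
assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-JVM illustration; a real MapReduce job would need a
// distributed semaphore because mappers run in separate processes.
public class InitRateLimiterDemo {

    // Runs `workers` tasks, allowing at most `maxConcurrentInits` of them to be
    // in their initialization phase at once. Returns the peak number of
    // simultaneous initializations that was observed.
    public static int runWorkers(int workers, int maxConcurrentInits)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxConcurrentInits);
        AtomicInteger inInit = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire();            // wait for an init slot
                    try {
                        int now = inInit.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);          // stand-in for the backend call
                    } finally {
                        inInit.decrementAndGet();
                        permits.release();        // free the slot for the next worker
                    }
                    // After initialization the worker would run independently.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int peak = runWorkers(40, 3);
        System.out.println("Peak simultaneous initializations: " + peak);
    }
}
```

The peak can never exceed the number of permits, because a worker is counted 
only while it holds one.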

 

Instead of using 
org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2, a much more 
generic solution would be a way to enforce a random start delay for mappers 
(with a configurable upper limit; if it's not specified, there will be no 
limit).
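
As a rough sketch of what such a delay could look like, the helper below picks 
a uniformly random delay below a configurable upper limit and sleeps for it. 
Everything named here is hypothetical: the class RandomStartDelay, its methods, 
and the property name mapreduce.map.start.delay.max.ms do not exist in Hadoop. 
The sketch also assumes that an unset or non-positive limit means "no delay", 
which is only one possible reading of the proposal. In a real job the call 
would presumably sit at the start of a mapper's setup, with the limit read from 
the job configuration rather than a system property.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; not part of Hadoop.
public class RandomStartDelay {

    // Assumed property name, invented for this sketch.
    public static final String MAX_DELAY_KEY = "mapreduce.map.start.delay.max.ms";

    // Returns a uniformly random delay in [0, maxDelayMs) milliseconds,
    // or 0 when the limit is unset or not positive (assumed to mean "no delay").
    public static long pickDelayMs(long maxDelayMs) {
        if (maxDelayMs <= 0) {
            return 0L;
        }
        return ThreadLocalRandom.current().nextLong(maxDelayMs);
    }

    // Sleeps for a random delay below the limit and returns the delay used.
    public static long delayStart(long maxDelayMs) throws InterruptedException {
        long delayMs = pickDelayMs(maxDelayMs);
        if (delayMs > 0) {
            Thread.sleep(delayMs);
        }
        return delayMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // Reads the limit from a JVM system property; defaults to 0 (no delay).
        long maxDelayMs = Long.getLong(MAX_DELAY_KEY, 0L);
        long slept = delayStart(maxDelayMs);
        System.out.println("Delayed start by " + slept + " ms (limit "
                + maxDelayMs + " ms)");
    }
}
```

With, say, 400 mappers and a 60-second limit, start times would spread roughly 
evenly over that minute instead of all hitting the backend at once; unlike the 
semaphore approach, though, this gives no hard cap on concurrent 
initializations.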

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
