Ruslan Dautkhanov created MAPREDUCE-7219: --------------------------------------------
             Summary: Random mappers start delay to have a slow processing ramp-up
                 Key: MAPREDUCE-7219
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7219
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Ruslan Dautkhanov

It would be great to have a way to configure a random start delay for mappers, to get a slow, graceful ramp-up of processing and avoid overloading an external system during an initialization storm, when mappers at startup have to talk to an external, less scalable system (a backend database, ZK, DNS, etc.).

From an answer to an SO question [https://stackoverflow.com/a/56621673/470583]:

// quote

You could limit the number of simultaneous initializations manually, for example using Apache Curator's org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2 mechanism.

See for example how Cloudera uses this in batch-load jobs to load data to Solr - [https://github.com/cloudera/search/blob/cdh6.2.0/search-crunch/src/main/java/org/apache/solr/crunch/MorphlineInitRateLimiter.java#L115]

In that particular example they use it to limit the number of ZooKeeper initializations that can happen at the same time, to avoid overloading ZooKeeper with a storm of requests from hundreds of mappers. In one job I use 400 mappers, but limit the number of simultaneous initializations to 30 (once the initializations are done, the mappers run fully independently).

In your example you want to limit the number of requests to the Oracle backend from mappers; in this example they want to limit the number of requests to ZK. So it's the same problem.

Ideally, it would be great if Hadoop had a way to put a random delay on mapper ramp-up for the exact same reason.

// quote

Instead of using org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2, a much more generic solution would be to have a way to enforce a random mapper start delay (with a configurable upper limit; if it's not specified, there will be no limit).
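To illustrate the request, here is a minimal sketch of what a random start delay with a configurable upper limit could look like. It is not an existing Hadoop feature: the class name RandomStartDelay and the property name in MAX_DELAY_KEY are hypothetical, and the sketch assumes that an unset or non-positive upper limit means "do not delay at all". It is plain Java with no Hadoop dependency so it can be run on its own.

```java
import java.util.Random;

/**
 * Hypothetical helper (not part of Hadoop): picks a uniformly random
 * start delay between 0 and a configurable upper limit, then sleeps.
 */
public class RandomStartDelay {

    // Hypothetical property name, invented for this sketch.
    public static final String MAX_DELAY_KEY = "mapreduce.map.start.random.delay.max.ms";

    /**
     * Returns a delay in milliseconds in the range [0, maxDelayMs].
     * Assumption of this sketch: maxDelayMs <= 0 means "no delay".
     */
    public static long pickDelayMillis(long maxDelayMs, Random rng) {
        if (maxDelayMs <= 0) {
            return 0L;
        }
        // nextDouble() is in [0.0, 1.0); clamp to guard against rounding
        // for very large limits.
        long candidate = (long) (rng.nextDouble() * ((double) maxDelayMs + 1.0));
        return Math.min(maxDelayMs, candidate);
    }

    /** Sleeps for a randomly chosen delay and returns the chosen value. */
    public static long sleepRandom(long maxDelayMs, Random rng) throws InterruptedException {
        long delay = pickDelayMillis(maxDelayMs, rng);
        if (delay > 0) {
            Thread.sleep(delay);
        }
        return delay;
    }

    public static void main(String[] args) throws InterruptedException {
        // Upper limit taken from the command line; 200 ms if not given.
        long maxDelayMs = args.length > 0 ? Long.parseLong(args[0]) : 200L;
        long slept = sleepRandom(maxDelayMs, new Random());
        System.out.println("Delayed start by " + slept + " ms (limit " + maxDelayMs + " ms)");
    }
}
```

Until something like this exists in the framework, a job could approximate it by calling sleepRandom(...) at the top of its own Mapper.setup(Context), reading the limit with context.getConfiguration().getLong(RandomStartDelay.MAX_DELAY_KEY, 0L). Note that a sleeping mapper still holds its container, so a large limit trades cluster capacity for a gentler ramp-up.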