determine number of reducers before MR plan optimizations are done
------------------------------------------------------------------

                 Key: PIG-2295
                 URL: https://issues.apache.org/jira/browse/PIG-2295
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.9.0
            Reporter: Thejas M Nair

MR plan optimization rules use the requested parallelism (as specified by the user). If the user has not specified the number of reducers, it is determined from the input data size. The optimization rules never see this final number of reducers, so the resulting plans are suboptimal in two cases:

1. If the user has not specified parallelism and the parallelism heuristic sets it to 1, the LimitAdjuster introduces an unnecessary extra MR job.
2. If the user has not specified parallelism and the parallelism heuristic sets it higher than pig.files.concatenation.threshold, the extra concatenation job for FRJoin is not added. The check is in MRCompiler.visitFRJoin().
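
The ordering problem can be illustrated with a minimal sketch. It is not Pig's actual code: the class and method names are hypothetical, the two decision checks are simplified stand-ins for what LimitAdjuster and MRCompiler.visitFRJoin() do, and the numeric settings (1,000,000,000 bytes per reducer, a cap of 999 reducers, a concatenation threshold of 100) are illustrative assumptions. A requested parallelism of -1 is assumed to mean "not specified by the user".

```java
// Hypothetical sketch; none of these names are Pig's real classes or methods.
final class ReducerEstimateSketch {

    /** Size-based estimate: ceil(inputBytes / bytesPerReducer), clamped to [1, maxReducers]. */
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers < 1) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        long needed = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1L, Math.min((long) maxReducers, needed));
    }

    /** Parallelism the optimizer should see: the user's value if set (> 0), else the estimate. */
    static int effectiveParallelism(int requested, long inputBytes,
                                    long bytesPerReducer, int maxReducers) {
        return requested > 0
                ? requested
                : estimateReducers(inputBytes, bytesPerReducer, maxReducers);
    }

    /** Case 1 (simplified assumption): an extra LIMIT job is added unless exactly one reducer runs. */
    static boolean needsExtraLimitJob(int parallelism) {
        return parallelism != 1;
    }

    /** Case 2 (simplified assumption): concatenate when parallelism exceeds the threshold. */
    static boolean needsConcatenationJob(int parallelism, int concatenationThreshold) {
        return parallelism > concatenationThreshold;
    }

    public static void main(String[] args) {
        int requested = -1; // user did not specify parallelism

        // Case 1: small input, so the heuristic picks a single reducer.
        int small = effectiveParallelism(requested, 200_000_000L, 1_000_000_000L, 999);
        System.out.println("case 1, decided on requested value: extra job = "
                + needsExtraLimitJob(requested));
        System.out.println("case 1, decided on estimated value: extra job = "
                + needsExtraLimitJob(small));

        // Case 2: large input, so the heuristic picks more reducers than the threshold.
        int large = effectiveParallelism(requested, 150_000_000_000L, 1_000_000_000L, 999);
        System.out.println("case 2, decided on requested value: concatenate = "
                + needsConcatenationJob(requested, 100));
        System.out.println("case 2, decided on estimated value: concatenate = "
                + needsConcatenationJob(large, 100));
    }
}
```

In this sketch, deciding on the raw requested value (-1) adds the extra limit job and skips the concatenation job, while deciding on the estimated value does the opposite in both cases. That is the behavior change the issue asks for: compute the estimate first, then run the optimization rules against it.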