Re: InputFormat to regroup splits of underlying InputFormat to control number of map tasks

2013-06-19 Thread Nicolae Marasoiu
Hi, Our intention is to solve this in a generic context, not just file input. Thus the split class should be generic (very similar to CompositeInputSplit from mapred). We also already implement getRecordReader by iterating over record readers created by the decorated input format (this method i

Re: InputFormat to regroup splits of underlying InputFormat to control number of map tasks

2013-06-19 Thread Robert Evans
This sounds similar to MultiFileInputFormat: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MultiFileInputFormat.java?revision=1239482&view=markup It would be nice if you could

InputFormat to regroup splits of underlying InputFormat to control number of map tasks

2013-06-19 Thread Nicolae Marasoiu
Hi, When running map-reduce with many splits, it would be nice from a performance perspective to have fewer splits while maintaining data locality, so that the overhead of running a map task (JVM creation, map executor ramp-up, e.g. Spring context, etc.) is less impactful when frequently running m
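
The regrouping idea raised in the original question (fewer, larger splits that still respect data locality) might be sketched roughly as below. This is only an illustration under stated assumptions, not the code discussed in the thread: it does not use the Hadoop API, and the names SplitGrouper, Split, and maxPerGroup are hypothetical. Each split is bucketed by its first preferred host, and each bucket is then cut into groups of at most maxPerGroup splits, so that every group could be handed to a single map task on that host.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitGrouper {

    /** Hypothetical stand-in for an input split: an id plus the hosts holding its data. */
    public static final class Split {
        public final String id;
        public final List<String> hosts;

        public Split(String id, String... hosts) {
            this.id = id;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /**
     * Groups splits so that each group only contains splits sharing the same
     * first preferred host, with at most maxPerGroup splits per group.
     */
    public static List<List<Split>> group(List<Split> splits, int maxPerGroup) {
        if (maxPerGroup <= 0) {
            throw new IllegalArgumentException("maxPerGroup must be positive");
        }

        // Bucket splits by their first preferred host (empty key if none is known).
        Map<String, List<Split>> byHost = new LinkedHashMap<>();
        for (Split s : splits) {
            String host = s.hosts.isEmpty() ? "" : s.hosts.get(0);
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(s);
        }

        // Cut each host bucket into chunks of at most maxPerGroup splits.
        List<List<Split>> groups = new ArrayList<>();
        for (List<Split> bucket : byHost.values()) {
            for (int i = 0; i < bucket.size(); i += maxPerGroup) {
                int end = Math.min(i + maxPerGroup, bucket.size());
                groups.add(new ArrayList<>(bucket.subList(i, end)));
            }
        }
        return groups;
    }
}
```

In a real InputFormat wrapper, each resulting group would presumably be wrapped in a composite split whose record reader iterates over the readers of the underlying splits, as described in the follow-up message.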