Sorting the inputSplits

2015-07-22 Thread Nishanth S
Hey folks, is there a way to sort the input splits in MapReduce? We have a case where there are two files, file1 and file2, in the input directory. Since we have a custom InputFormat whose isSplitable always returns false, each of these files would be processed by a different mapper. How could

Re: Sorting the inputSplits

2015-07-29 Thread Gera Shegalov
Can you clarify the requirement "processed first"? Maps run in parallel without any ordering guarantees. If you want to affect the mapping file->split number, you can implement your own getSplits in the custom input format and return the splits ordered any way you like. On Wed, Jul 22, 2015 at 12:06 P
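A minimal sketch of Gera's suggestion, assuming the custom format extends `FileInputFormat`: the override would call `super.getSplits(job)` and sort the returned `List<InputSplit>` by `((FileSplit) split).getPath()` before returning it. So that it runs without Hadoop on the classpath, the sketch models only that ordering step, using a hypothetical `Split` class as a stand-in for `FileSplit`; `orderByPath` is likewise an illustrative name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderedSplits {
    // Hypothetical stand-in for Hadoop's FileSplit (which exposes getPath() and getLength()).
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    // The ordering step a custom getSplits() could apply before returning:
    // sort by file path so file1 becomes split 0, file2 becomes split 1, and so on.
    static List<Split> orderByPath(List<Split> splits) {
        List<Split> ordered = new ArrayList<>(splits);
        ordered.sort(Comparator.comparing((Split s) -> s.path));
        return ordered;
    }

    public static void main(String[] args) {
        // Pretend this is what super.getSplits(job) handed back.
        List<Split> fromSuper = List.of(
                new Split("/input/file2", 500L),
                new Split("/input/file1", 100L));
        for (Split s : orderByPath(fromSuper)) {
            System.out.println(s.path); // /input/file1, then /input/file2
        }
    }
}
```

As Gera points out, this only controls which file gets which split number; the maps themselves still run in parallel with no ordering guarantee.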

Re: Sorting the inputSplits

2015-07-30 Thread Niels Basjes
MapReduce is based on the premise that several parts of a task can be processed independently in parallel. If you "require" an order of processing, then these files depend on each other. Why use MapReduce at all? With your requirement you cannot use more than one CPU anyway. Niels On Thu, 3

Re: Sorting the inputSplits

2015-07-30 Thread Harsh J
If you meant 'scheduled' first, perhaps that's doable by following (almost) what Gera says. The framework actually explicitly sorts your InputSplits list by their reported lengths, which would serve as the hack point for inducing a reordering. See https://github.com/apache/hadoop-common/blob/trunk/hado
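To illustrate Harsh's point: in the Hadoop versions I'm aware of, the job submitter sorts splits so that larger reported lengths come first, which can undo whatever order a custom getSplits produced unless getLength() is made to reflect the desired order. The plain-Java sketch below models that sort and one hypothetical hack: reporting a rank-derived length instead of the real byte count. The `Split` class and both method names are assumptions, and faking getLength() may have side effects elsewhere in the framework that are worth checking against the source Harsh links.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LengthSortHack {
    // Hypothetical split model: the real size vs. what getLength() would report.
    static final class Split {
        final String path;
        final long realLength;
        final long reportedLength;

        Split(String path, long realLength, long reportedLength) {
            this.path = path;
            this.realLength = realLength;
            this.reportedLength = reportedLength;
        }
    }

    // Models the framework's sort: larger reported length first (stable for ties).
    static List<Split> frameworkSort(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.reportedLength).reversed());
        return sorted;
    }

    // Hypothetical hack: derive the reported length from the desired position,
    // so the descending sort reproduces that position (earlier = larger).
    static List<Split> withRankedLengths(List<Split> desiredOrder) {
        List<Split> out = new ArrayList<>();
        int n = desiredOrder.size();
        for (int i = 0; i < n; i++) {
            Split s = desiredOrder.get(i);
            out.add(new Split(s.path, s.realLength, n - i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Split> desired = List.of(
                new Split("/input/file1", 100L, 100L),
                new Split("/input/file2", 500L, 500L));
        // Honest lengths: the 500-byte file2 is sorted ahead of file1.
        System.out.println(frameworkSort(desired).get(0).path); // prints /input/file2
        // Rank-derived lengths: file1 comes back first.
        System.out.println(frameworkSort(withRankedLengths(desired)).get(0).path); // prints /input/file1
    }
}
```

Even with this, "scheduled first" is not "finished first": if both map tasks get resources, they still run concurrently.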

Re: Sorting the inputSplits

2015-08-18 Thread Nishanth S
Thank you. I have explained the problem better below. Is this possible? We have a use case where we have files in the below directory structure. The requirement is that we should not process files inside a parent directory in parallel (1.txt and 2.txt cannot be processed in parallel sinc
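One common way to keep files from the same parent directory out of parallel mappers is to give each directory its own split, so a single map task reads that directory's files one after another while different directories still run in parallel. Below is a sketch of just the grouping step, assuming '/'-separated path strings; the directory layout in the message above is cut off, so the paths are hypothetical, as are `parentOf` and `groupByParent`. In Hadoop, each group could then be wrapped in, for example, a `CombineFileSplit(Path[] files, long[] lengths)` returned from a custom `getSplits`, with a record reader that walks the files in order; that wiring is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByParent {
    // Parent directory of a '/'-separated path, e.g. "/data/parent1/1.txt" -> "/data/parent1".
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // One group per parent directory (one would-be split per group).
    // Files within a group are sorted by name so one task could read them in sequence.
    static Map<String, List<String>> groupByParent(List<String> files) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String f : files) {
            groups.computeIfAbsent(parentOf(f), k -> new ArrayList<>()).add(f);
        }
        for (List<String> g : groups.values()) {
            g.sort(null); // natural (lexicographic) order
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "/data/parent1/2.txt", "/data/parent1/1.txt", "/data/parent2/1.txt");
        groupByParent(files).forEach((dir, fs) -> System.out.println(dir + " -> " + fs));
        // prints:
        // /data/parent1 -> [/data/parent1/1.txt, /data/parent1/2.txt]
        // /data/parent2 -> [/data/parent2/1.txt]
    }
}
```

The within-group sort is lexicographic, so `10.txt` would sort before `2.txt`; a numeric comparator would be needed if files are numbered that way and their order matters.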

Re: Sorting the inputSplits

2015-08-18 Thread Rudra Tripathy
Hi Nishanth, even if you order the input splits, you can't order the output.