Hi Nishanth,

Even if you order the input splits, you can't order the output.

On Aug 19, 2015 1:55 AM, "Nishanth S" <chinchu2...@gmail.com> wrote:
> Thank you. I have explained the problem better below. Is this
> possible?
>
> We have a use case where we have files in the directory structure
> below. The requirement is that we should not process files inside a
> Parent directory in parallel (1.txt and 2.txt cannot be processed in
> parallel: since we need to do some checkpointing, we have to process
> the oldest file first). However, 1.txt and 5.txt can be processed in
> parallel. Right now I am overriding the listStatus method to pick only
> the oldest file, but this means I cannot achieve parallelism outside
> the parent either, since the number of input splits is always 1. What
> would be the way to go about this use case? In short, I want to
> achieve parallelism outside the Parent directory but not within it.
> Please advise.
>
> > published/
> > +-- Parent1/
> > |   +-- 1.txt
> > |   +-- 2.txt
> > |   +-- 3.txt
> > +-- Parent2/
> >     +-- 4.txt
> >     +-- 5.txt
>
> On Wed, Jul 29, 2015 at 5:31 PM, Gera Shegalov <g...@shegalov.com> wrote:
>
>> Can you clarify the requirement "processed first"? Maps run in parallel
>> without any ordering guarantees. If you want to affect the mapping
>> file->split number, you can implement your own getSplits in the custom
>> input format and return splits ordered any way you like.
>>
>> On Wed, Jul 22, 2015 at 12:06 PM, Nishanth S <chinchu2...@gmail.com>
>> wrote:
>>
>>> Hey folks,
>>>
>>> Is there a way to sort the input splits in MapReduce? We have a case
>>> where there are two files, file1 and file2, in the input directory.
>>> Since we have a custom input format whose isSplitable always returns
>>> false, each of these files would be processed by a different mapper.
>>> How could I make sure that file1 is processed before file2 (I want
>>> the oldest file to be processed first)? Is this possible?
>>>
>>> Thanks,
>>> Nishan
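
One possible way to reconcile the two requirements in this thread (no parallelism inside a parent directory, parallelism across parent directories) is to emit one split per parent directory and have that split's reader walk its files oldest first. The sketch below shows only the grouping and ordering step that a custom getSplits might perform, and it deliberately uses no Hadoop classes. FileInfo, parentOf and planSplits are hypothetical names invented for illustration; FileInfo merely stands in for the path and modification time that a Hadoop FileStatus would supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ParentGroupedSplits {

    /** Hypothetical stand-in for a Hadoop FileStatus: a path and its modification time. */
    public static final class FileInfo {
        public final String path;
        public final long modTime;

        public FileInfo(String path, long modTime) {
            this.path = path;
            this.modTime = modTime;
        }
    }

    /** Returns the parent directory part of a slash-separated path ("" if there is none). */
    public static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx < 0 ? "" : path.substring(0, idx);
    }

    /**
     * Groups files by parent directory and orders each group oldest first.
     * Each inner list represents one split (one mapper): files in the same
     * list are handled sequentially, different lists can run in parallel.
     */
    public static List<List<String>> planSplits(List<FileInfo> files) {
        // TreeMap keeps the parent directories in a deterministic order.
        Map<String, List<FileInfo>> byParent = new TreeMap<>();
        for (FileInfo f : files) {
            byParent.computeIfAbsent(parentOf(f.path), k -> new ArrayList<>()).add(f);
        }

        List<List<String>> splits = new ArrayList<>();
        for (List<FileInfo> group : byParent.values()) {
            // Oldest file (smallest modification time) comes first.
            group.sort(Comparator.comparingLong((FileInfo f) -> f.modTime));
            List<String> ordered = new ArrayList<>();
            for (FileInfo f : group) {
                ordered.add(f.path);
            }
            splits.add(ordered);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Modification times are made-up values for the example tree.
        List<FileInfo> files = Arrays.asList(
            new FileInfo("published/Parent1/2.txt", 200L),
            new FileInfo("published/Parent2/5.txt", 60L),
            new FileInfo("published/Parent1/1.txt", 100L),
            new FileInfo("published/Parent2/4.txt", 50L),
            new FileInfo("published/Parent1/3.txt", 300L));

        for (List<String> split : planSplits(files)) {
            System.out.println(split);
        }
    }
}
```

In a real job, each inner list could presumably be wrapped in a multi-file split such as Hadoop's CombineFileSplit, with a record reader that opens the files in list order. That wiring is not shown here and is an assumption about how the job would be structured, not something confirmed in the thread.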