Re: isSplitable() problem

2012-04-24 Thread Robert Evans
The current code guarantees that they will be received in order. There are some patches that are likely to go in soon that would allow the JVM itself to be reused. In those cases I believe that the mapper class would be recreated, so the only thing you would have to worry about would be static

Re: isSplitable() problem

2012-04-24 Thread Dan Drew
I have chosen to use Jay's suggestion as a quick workaround and am pleased to report that it seems to work well on small test inputs. My question now is, are the mappers guaranteed to receive the file's lines in order? Browsing the source suggests this is so, but I just want to make sure as my un

Re: isSplitable() problem

2012-04-23 Thread Harsh J
Jay,

On Mon, Apr 23, 2012 at 6:43 PM, JAX wrote:
> Curious : Seems like you could aggregate the results in the mapper as a local
> variable or list of strings--- is there a way to know that your mapper has
> just read the LAST line of an input split?

True. Can be one way to do it (unless aggre

Re: isSplitable() problem

2012-04-23 Thread JAX
Curious: it seems like you could aggregate the results in the mapper in a local variable or a list of strings. Is there a way to know that your mapper has just read the LAST line of an input split? I.e., if so, then you could implement your entire solution in your mapper without needing a new inpu
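
To make the idea above concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the input is not split, so one mapper sees every line of one file, in order. The class WholeFileMapperSketch and its map(), cleanup() and run() methods are hypothetical stand-ins that only mirror the lifecycle of Hadoop's Mapper (map() once per record, cleanup() once at the end of the task); they are not Hadoop's API. The point is that the mapper does not need to detect the last line itself: it buffers lines in map() and emits the whole file from cleanup().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop Mapper; illustrative only, not Hadoop's API.
class WholeFileMapperSketch {
    // Lines buffered so far for the current (unsplit) file.
    private final List<String> lines = new ArrayList<>();
    // Records "emitted" by this mapper; stands in for writing to the job context.
    private final List<String> output = new ArrayList<>();

    // Called once per input line, in file order (mirrors Mapper.map()).
    void map(String line) {
        lines.add(line);
    }

    // Called once after the last line has been mapped (mirrors Mapper.cleanup()).
    void cleanup() {
        output.add(String.join("\n", lines));
        lines.clear();
    }

    // Mirrors the driver loop: map() for every record, then cleanup() once.
    static List<String> run(List<String> fileLines) {
        WholeFileMapperSketch mapper = new WholeFileMapperSketch();
        for (String line : fileLines) {
            mapper.map(line);
        }
        mapper.cleanup();
        return mapper.output;
    }

    public static void main(String[] args) {
        // Three input lines go in; a single whole-file record comes out.
        List<String> result = run(Arrays.asList("first", "second", "third"));
        System.out.println(result.size() + " record(s) emitted");
    }
}
```

In a real job the buffer would live in a field of the Mapper subclass, and the whole file has to fit in the task's memory, so this only suits modest file sizes.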

Re: isSplitable() problem

2012-04-23 Thread Dan Drew
Thanks for the clarification.

On 23 April 2012 12:52, Harsh J wrote:
> Dan,
>
> Split and reading a whole file as a chunk are two slightly different
> things. The former controls if your files ought to be split across
> mappers (useful if there are multiple blocks of file in HDFS). The
> latter

Re: isSplitable() problem

2012-04-23 Thread Harsh J
Dan, Split and reading a whole file as a chunk are two slightly different things. The former controls if your files ought to be split across mappers (useful if there are multiple blocks of file in HDFS). The latter needs to be achieved differently. The TextInputFormat provides by default a LineRe

isSplitable() problem

2012-04-23 Thread Dan Drew
I require each input file to be processed by each mapper as a whole. I subclass c.o.a.h.mapreduce.lib.input.TextInputFormat and override isSplitable() to invariably return false. The job is configured to use this subclass as the input format class via setInputFormatClass(). The job runs without e