The current code guarantees that they will be received in order. There are some
patches that are likely to go in soon that would allow for the JVM itself to be
reused. In those cases I believe that the mapper class would be recreated, so
the only thing you would have to worry about would be static
I have chosen to use Jay's suggestion as a quick workaround and am pleased
to report that it seems to work well on small test inputs.
My question now is, are the mappers guaranteed to receive the file's lines
in order?
Browsing the source suggests this is so, but I just want to make sure as my
un
Jay,
On Mon, Apr 23, 2012 at 6:43 PM, JAX wrote:
> Curious: it seems you could aggregate the results in the mapper in a local
> variable or a list of strings. Is there a way to know that your mapper has
> just read the LAST line of an input split?
True. Can be one way to do it (unless aggre
Curious: it seems you could aggregate the results in the mapper in a local
variable or a list of strings. Is there a way to know that your mapper has just
read the LAST line of an input split?
I.e., if so, then you could implement your entire solution in your mapper without
needing a new inpu
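
On the "last line" question: in the new API (org.apache.hadoop.mapreduce), Mapper.cleanup() is called once after the final map() call for a split, so anything buffered in the mapper can be emitted there. The following is only a Hadoop-free sketch of that pattern. BufferingMapper and run() are hypothetical stand-ins for the mapper and the framework loop, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AggregatingMapperSketch {

    /** Hypothetical stand-in for a mapper: map() per line, cleanup() once at the end. */
    static final class BufferingMapper {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> output = new ArrayList<>();

        /** Called once per input line, in order; only buffers the line. */
        void map(String line) {
            buffer.add(line);
        }

        /** Called once after the last line; emits the aggregated result. */
        void cleanup() {
            output.add(String.join("\n", buffer));
        }

        List<String> output() {
            return output;
        }
    }

    /** Models the framework loop: every line goes to map(), then cleanup() runs. */
    static List<String> run(List<String> splitLines) {
        BufferingMapper mapper = new BufferingMapper();
        for (String line : splitLines) {
            mapper.map(line);
        }
        mapper.cleanup();
        return mapper.output();
    }

    public static void main(String[] args) {
        // One aggregated record is emitted for the whole split.
        System.out.println(run(Arrays.asList("a", "b", "c")).size());
    }
}
```

Note that this buffers the whole split in memory, so it only suits inputs that fit in the mapper's heap.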
Thanks for the clarification.
On 23 April 2012 12:52, Harsh J wrote:
> Dan,
>
> Splitting a file and reading a whole file as a chunk are two slightly
> different things. The former controls whether your files are split across
> mappers (useful if a file has multiple blocks in HDFS). The
> latter
Dan,
Splitting a file and reading a whole file as a chunk are two slightly
different things. The former controls whether your files are split across
mappers (useful if a file has multiple blocks in HDFS). The
latter needs to be achieved differently.
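
To illustrate the first point, here is a simplified, Hadoop-free model of how a splitable flag changes the number of splits, and hence mappers, for one file. computeSplits() is a hypothetical helper, not Hadoop's implementation: it assumes the split size equals the block size and ignores the minimum/maximum split size settings the real code takes into account.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {

    /**
     * Returns {offset, length} pairs describing the splits of a single file.
     * A non-splitable file always yields exactly one split covering the file.
     */
    static List<long[]> computeSplits(long fileLength, long blockSize, boolean splitable) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (!splitable || fileLength <= blockSize) {
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        // Splitable: one split per block-sized chunk, the last one possibly shorter.
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            long length = Math.min(blockSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(computeSplits(200 * mb, 64 * mb, true).size());  // prints 4
        System.out.println(computeSplits(200 * mb, 64 * mb, false).size()); // prints 1
    }
}
```

In this model a single split only means that one mapper sees the whole file. The records it receives are still individual lines, which is why reading the file as one chunk is a separate concern.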
The TextInputFormat provides by default a LineRe
I require each input file to be processed as a whole by a single mapper.
I subclass org.apache.hadoop.mapreduce.lib.input.TextInputFormat and override
isSplitable() to invariably return false.
The job is configured to use this subclass as the input format class via
setInputFormatClass(). The job runs without e