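Hadoop handles this for you. Input splits follow block boundaries, but the record reader works from the record delimiter. With the default TextInputFormat, a reader whose split starts mid-file skips ahead to the first line break, and every reader keeps reading past the end of its split until it has finished its last line. A line is therefore never cut in two, and you don't have to write your own split logic to prevent it.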
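
To make the boundary handling concrete, here is a small Python simulation of the idea. It is only a sketch, not Hadoop's actual code: the names read_split and read_all are made up for illustration, the "file" is an in-memory byte string, and the only delimiter handled is a single newline.

```python
def read_split(data: bytes, start: int, length: int) -> list:
    """Return the lines owned by the split [start, start + length).

    Simplified model of a line-oriented record reader (illustration only):
    - a split that does not start at offset 0 discards everything up to and
      including the first newline, because the previous split's reader
      finishes that line;
    - a reader keeps starting new lines while its position is <= the split
      end, so it reads past its boundary to complete the last line.
    """
    end = start + length
    pos = start
    if start != 0:
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []          # no line starts inside this split
        pos = newline + 1
    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:      # last line has no trailing newline
            records.append(data[pos:])
            pos = len(data)
        else:
            records.append(data[pos:newline])
            pos = newline + 1
    return records


def read_all(data: bytes, split_size: int) -> list:
    """Cut data into fixed-size splits and collect every split's records."""
    records = []
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        records.extend(read_split(data, start, length))
    return records


# Splits of 2 bytes cut "abcd" into "ab" and "cd", yet the line stays whole.
print(read_all(b"abcd\nefgh\n", 2))  # [b'abcd', b'efgh']
```

With 2-byte splits, the first reader returns the whole "abcd" line and the reader for the "cd" split returns nothing, so each line is emitted exactly once.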

Sent from my iPhone

On May 27, 2011, at 12:55 PM, Mohit Anchlia <[email protected]> wrote:

> I am new to hadoop and from what I understand by default hadoop splits
> the input into blocks. Now this might result in splitting a line of
> record into 2 pieces and getting spread across 2 maps. For eg: Line
> "abcd" might get split into "ab" and "cd". How can one prevent this in
> hadoop and pig? I am looking for some examples where I can see how I
> can specify my own split so that it logically splits based on the
> record delimiter and not the block size. For some reason I am not able
> to get right examples online.
