You can add that sometimes the input file is too small and you don't get the
desired parallelism.
Sent from a remote device. Please excuse any typos...
Mike Segel
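Mike's point can be made concrete with a little arithmetic. In the newer API, Hadoop's FileInputFormat sizes splits as max(minSize, min(maxSize, blockSize)) and cuts the file into that many pieces, so a file smaller than one block yields a single split and hence a single map task, no matter how big the cluster is. A rough Python sketch of that arithmetic (the function names are mine, not Hadoop's, and this ignores the small slack factor Hadoop allows on the final split):

```python
def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Mirrors the shape of Hadoop's computeSplitSize:
    # max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    # One split per split_size bytes of input (simplified model).
    if file_size == 0:
        return 0
    split_size = compute_split_size(block_size, min_size, max_size)
    return -(-file_size // split_size)  # ceiling division

# A 10 MB file on a 64 MB block size -> 1 split -> 1 map task.
print(num_splits(10 * 2**20, 64 * 2**20))   # 1
# A 200 MB file -> 4 splits -> up to 4 map tasks in parallel.
print(num_splits(200 * 2**20, 64 * 2**20))  # 4
```

The exact knobs differ between the old and new APIs (e.g. the old API also folds a requested map count into a "goal size"), but the upshot is the same: a too-small input caps useful map-side parallelism.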
On May 27, 2011, at 12:25 PM, Harsh J ha...@cloudera.com wrote:
Mohit,
On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia
I am new to Hadoop, and from what I understand, by default Hadoop splits
the input into blocks. Now this might result in a line of a
record being split into 2 pieces and spread across 2 maps. For example, the line
abcd might get split into ab and cd. How can one prevent this in
Hadoop and Pig? I am
Mohit,
Please do not cross-post a question to multiple lists unless you're
announcing something.
What you describe does not happen; the way the splitting is done
for text files is explained in good detail here:
http://wiki.apache.org/hadoop/HadoopMapReduce
Hope this solves your doubt :)
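The rule Harsh is pointing at can be sketched in a few lines. This is a simplified Python model of what Hadoop's LineRecordReader does (the real reader is Java and also handles buffering, compression, etc., so treat this as an illustration, not the actual implementation): a split that does not begin at byte 0 discards the partial first line, and every split keeps reading past its nominal end until it finishes the last line it started, so no line is ever torn in half or read twice.

```python
def read_lines_in_split(data: bytes, start: int, length: int):
    """Simplified model of the split-boundary rule for line records."""
    end = start + length
    pos = start
    if start != 0:
        # The previous split owns the line we landed in; skip its tail.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # Keep reading while a line *starts* at or before our end offset;
    # the line itself may run past `end`.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            lines.append(data[pos:])
            pos = len(data)
        else:
            lines.append(data[pos:nl])
            pos = nl + 1
    return lines

data = b"abcd\nefgh\nijkl\n"
# Cut the "file" at byte 2, right through the middle of "abcd":
print(read_lines_in_split(data, 0, 2))   # [b'abcd']  (reads past its end)
print(read_lines_in_split(data, 2, 13))  # [b'efgh', b'ijkl']  (skips the partial "cd")
```

Wherever the cut falls, the two readers together emit every line exactly once, which is why a line like "abcd" never shows up as "ab" in one map and "cd" in another.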
Thanks! I just thought it was better to post to multiple groups together
since I didn't know where it belongs :)
The query fit into mapreduce-user, since it primarily dealt with how
Map/Reduce operates over data, just to clarify :)
Actually this link confused me:
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input
"Clearly, logical splits based on input-size is insufficient for many
applications since record boundaries must be respected. In such cases,
the application should implement a RecordReader,"
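The tutorial passage quoted here is about formats whose record boundaries the framework does not already understand: for plain newline-delimited text, the built-in TextInputFormat reader handles boundaries for you, but for, say, a fixed-width binary format you would have to supply the boundary logic yourself in a custom RecordReader. A toy Python model of that responsibility (the 4-byte record format and the function name are invented for illustration; a real RecordReader is a Java class plugged into an InputFormat):

```python
RECORD_SIZE = 4  # hypothetical fixed-width binary records, no newlines

def records_in_split(data: bytes, start: int, length: int,
                     record_size: int = RECORD_SIZE):
    """Toy model of a record-aware reader: assign each record to exactly
    one split by its starting offset, reading past the split's end when
    a record straddles the boundary."""
    end = start + length
    # First record owned by this split: the first one whose starting
    # offset is at or after `start` (round up to a record boundary).
    pos = -(-start // record_size) * record_size
    records = []
    while pos < end and pos < len(data):
        records.append(data[pos:pos + record_size])
        pos += record_size
    return records

data = b"abcdefghijkl"  # three 4-byte records
# A naive byte split at offset 5 would tear the second record apart;
# the record-aware reader realigns to record boundaries instead:
print(records_in_split(data, 0, 5))  # [b'abcd', b'efgh']
print(records_in_split(data, 5, 7))  # [b'ijkl']
```

So the logical byte splits stay as they are; it is the reader, not the splitter, that restores record boundaries. That is the same division of labor LineRecordReader performs for lines, which is why the earlier question's scenario cannot happen with text input.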
Mohit,
On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia mohitanch...@gmail.com wrote: