Re: Using own InputSplit

2011-05-29 Thread Michel Segel
You can add that sometimes the input file is too small and you don't get the desired parallelism.

Sent from a remote device. Please excuse any typos...

Mike Segel

On May 27, 2011, at 12:25 PM, Harsh J wrote:
> Mohit,
>
> On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia wrote:
>> Actually

Re: Using own InputSplit

2011-05-27 Thread Harsh J
Mohit,

On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia wrote:
> Actually this link confused me
>
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input
>
> "Clearly, logical splits based on input-size is insufficient for many
> applications since record boundaries must be r

Re: Using own InputSplit

2011-05-27 Thread Mohit Anchlia
Actually this link confused me

http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input

"Clearly, logical splits based on input-size is insufficient for many applications since record boundaries must be respected. In such cases, the application should implement a RecordReader,

Re: Using own InputSplit

2011-05-27 Thread Harsh J
The query fit into mapreduce-user, since it primarily dealt with how Map/Reduce operates over data, just to clarify :)

On Fri, May 27, 2011 at 10:38 PM, Mohit Anchlia wrote:
> thanks! Just thought it's better to post to multiple groups together
> since I didn't know where it belongs :)
>
> On Fri

Re: Using own InputSplit

2011-05-27 Thread Mohit Anchlia
thanks! Just thought it's better to post to multiple groups together since I didn't know where it belongs :)

On Fri, May 27, 2011 at 10:04 AM, Harsh J wrote:
> Mohit,
>
> Please do not cross-post a question to multiple lists unless you're
> announcing something.
>
> What you describe, does not ha

Re: Using own InputSplit

2011-05-27 Thread Harsh J
Mohit,

Please do not cross-post a question to multiple lists unless you're announcing something.

What you describe does not happen; the way splitting is done for text files is explained in good detail here:

http://wiki.apache.org/hadoop/HadoopMapReduce

Hope this solves your doubt :)

On
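
For readers following the thread, here is a minimal sketch of how size-based splits can still respect record boundaries for line-oriented text. It is a simplified Python model of the commonly described convention, not Hadoop's Java code: every split except the first skips ahead to the next newline, and every split finishes the last line it started even if that line runs past the split's end. The names `make_splits` and `read_records` are hypothetical and exist only for this illustration.

```python
def make_splits(total_len, split_size):
    """Cut [0, total_len) into byte ranges purely by size, ignoring line breaks."""
    return [(start, min(start + split_size, total_len))
            for start in range(0, total_len, split_size)]


def read_records(data, start, end):
    """Return the complete lines owned by the split [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: the line in progress belongs to the previous
        # split, so skip forward past the next newline.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # A line that begins at or before `end` is owned by this split, even if
    # it runs past `end` into the next split's byte range.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo charlie\ndelta\necho foxtrot golf\nhotel\n"
splits = make_splits(len(data), 12)  # 12-byte splits cut through several lines
per_split = [read_records(data, s, e) for s, e in splits]
all_records = [r for recs in per_split for r in recs]
```

In this model no line is lost or read twice, whatever the split size. It also illustrates the point about small inputs: if the split size is larger than the file, `make_splits` returns a single range, so only one reader (one map task, in this analogy) gets any work.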