Inline

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Fri, Sep 27, 2013 at 10:42 AM, Sai Sai <saigr...@yahoo.in> wrote:

> Hi
> I have a few questions i am trying to understand:
>
> 1. Is each input split same as a record, (a rec can be a single line or
> multiple lines).
>

An InputSplit is a chunk of the input that is handled by a single map task. It will generally contain multiple records. The RecordReader provides the key/value pairs to the map task. Check
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/InputSplit.html

> 2. Is each Task a collection of few computations or attempts.
>
> For ex: if i have a small file with 5 lines.
> By default there will be 1 line on which each map computation is performed.
> So totally 5 computations r done on 1 node.
>
> This means JT will spawn 1 JVM for 1 Tasktracker on a node
> and another JVM for map task which will instantiate 5 map objects 1 for
> each line.
>

I am not sure what you mean by 5 map objects. But yes, the mapper's map() method will be invoked 5 times, once for each line.

> The MT JVM is called the task which will have 5 attempts for each line.
> This means attempt is same as computation.
>
> Please let me know if anything is incorrect.
> Thanks
> Sai
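
To make the split/record distinction concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API: record_reader, run_map_task, line_length_map and SPLIT are hypothetical names invented for illustration. It assumes line-oriented text input, where each record is one line keyed by its byte offset, and shows one split holding five records, so a single map task calls the map function five times.

```python
from typing import Callable, Iterator, List, Tuple

Record = Tuple[int, str]          # (byte offset of the line, line text)
Pair = Tuple[str, int]            # (output key, output value)


def record_reader(split: str) -> Iterator[Record]:
    """Hypothetical stand-in for a RecordReader: turn one split into records."""
    offset = 0
    for line in split.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def run_map_task(split: str, map_fn: Callable[[int, str], List[Pair]]):
    """Hypothetical map task: one task per split, one map_fn call per record."""
    calls = 0
    output: List[Pair] = []
    for key, value in record_reader(split):
        output.extend(map_fn(key, value))
        calls += 1
    return calls, output


def line_length_map(key: int, value: str) -> List[Pair]:
    """Example map function: emit each line together with its length."""
    return [(value, len(value))]


# One small split containing five lines, i.e. five records.
SPLIT = "one\ntwo\nthree\nfour\nfive\n"

calls, output = run_map_task(SPLIT, line_length_map)
print("map invocations:", calls)
for pair in output:
    print(pair)
```

The whole five-line file is one split handled by one task, and the loop inside that task is what produces the five map calls.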