Q1. Is it right to assume that System.out.println statements are used only
in the Eclipse environment, and that in a multi-node cluster environment we
need to use counters?
Q2. I am slightly confused, as it appears that by using System.out.println
statements we are able to get detailed info at every line of
I do not understand 1 and 2: Counters are used to count things in the MR
framework in a distributed manner, with the totals aggregated at the
JobTracker level; System.out merely writes to STDOUT. Why are you
comparing the two?
3: The limit means the total number of counter names accepted from a
Hello,
Q1.
It depends on your need. If you would like overall statistics, for example
the number of malformed records in your datasets, use counters. If you just
want to know what is going on inside a mapper or reducer, use
System.out.println;
since mappers do not know each other, you cannot
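
To make the distinction in the reply above concrete, here is a minimal
sketch in plain Java. It does not use the Hadoop API: it only simulates the
idea that each task keeps its own counts and that the totals are summed in
one place afterwards. The names CounterSketch, RecordCounter, runTask and
aggregate are hypothetical, and "two comma-separated fields" is just an
assumed definition of a well-formed record. In a real Mapper the increment
would be something like context.getCounter(RecordCounter.MALFORMED).increment(1),
and the framework would do the summing.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CounterSketch {
    // Hypothetical counter names, playing the role of a Hadoop counter enum.
    enum RecordCounter { WELL_FORMED, MALFORMED }

    // Simulates one task: it sees only its own records and counts locally.
    static Map<RecordCounter, Long> runTask(List<String> records) {
        Map<RecordCounter, Long> local = new EnumMap<>(RecordCounter.class);
        for (String record : records) {
            // Assumption: a well-formed record has exactly two comma-separated fields.
            boolean ok = record.split(",").length == 2;
            local.merge(ok ? RecordCounter.WELL_FORMED : RecordCounter.MALFORMED,
                        1L, Long::sum);
        }
        return local;
    }

    // Simulates the job-level aggregation: sums the per-task counts.
    static Map<RecordCounter, Long> aggregate(List<Map<RecordCounter, Long>> perTask) {
        Map<RecordCounter, Long> total = new EnumMap<>(RecordCounter.class);
        for (Map<RecordCounter, Long> counts : perTask) {
            counts.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> splitA = List.of("a,1", "broken", "b,2");
        List<String> splitB = List.of("c,3", "also broken", "bad");
        // Each split is processed independently, then the counts are combined.
        Map<RecordCounter, Long> total =
            aggregate(List.of(runTask(splitA), runTask(splitB)));
        System.out.println(total);
    }
}
```

A println inside runTask would show what one task is doing with one record,
which is useful while debugging; the aggregated map is the only place where
a job-wide figure such as "number of malformed records" is available.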
While using System.out inside a Mapper or Reducer is fine as an aid to
learning, be careful: accidentally leaving them in (or not moving to
something like log4j) and running the job for real can mean writing
millions of lines of log output on a tasktracker, filling up disks and
making jobs