Harsh,

I see the problem as follows: usually we want to let people log whatever
they want, as long as they don't threaten the stability of the system.

However, every once in a while somebody submits a job that is overly
verbose and generates many gigabytes of logs in minutes. This is typically
an honest mistake, and the person doesn't realize what is going on ("why
is my job so slow?"). Limiting the general logging levels for everyone to
deal with these mistakes seems ineffective. Telling the person to change
the logging level for their job won't work either, since they don't
realize what is going on and certainly didn't know in advance.

So all I really want is a very high, hard limit on the log size per job to
protect the system, say many hundreds of megabytes or even gigabytes. When
this limit is reached I want the logging to stop from that point on, or
even the job to be killed. mapred.userlog.limit.kb seems like the wrong
tool for the job.
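
(For reference, a minimal sketch of how that property would be set per
job, assuming a plain MRv1 JobConf; the class name and the 1 GB value are
made up for illustration, and the caveats from the quoted replies below
still apply: the capped log is kept in memory until the task closes, and
exceeding the cap does not fail the task.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;

    public class UserlogCapSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf(new Configuration());

        // Cap each task attempt's log at roughly 1 GB (value is in KB).
        // Per the replies quoted below, the log events are buffered in
        // memory and only written to disk when the task closes, and going
        // over the cap does not fail the task or kill the job.
        conf.setInt("mapred.userlog.limit.kb", 1024 * 1024);
      }
    }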

Before the logging got moved to mapred.local.dir, I had a limit simply by
capping the size of the partition that the logs went to.

Anyhow, it looks like I will have to wait for MAPREDUCE-1100.

Have a good day! Koert

On Sun, Aug 26, 2012 at 2:21 PM, Harsh J <ha...@cloudera.com> wrote:

> Yes that is true, it does maintain N events in memory and then flushes
> them down to disk upon closure. With a reasonable size (2 MB of logs
> say) I don't see that causing any memory fill-up issues at all, since
> it does cap (and discard at tail).
>
> The other alternative may be to turn down the log level on the tasks by
> setting mapred.map.child.log.level and/or mapred.reduce.child.log.level
> to WARN or ERROR.
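
(A minimal sketch of this per-job log-level override, assuming the two
properties mentioned above are read from the job configuration; the class
name is made up and WARN is just one possible level.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;

    public class ChildLogLevelSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf(new Configuration());

        // Quiet down the map and reduce task JVMs for this job only,
        // leaving the cluster-wide logging defaults untouched.
        conf.set("mapred.map.child.log.level", "WARN");
        conf.set("mapred.reduce.child.log.level", "WARN");
      }
    }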
>
> On Sun, Aug 26, 2012 at 11:37 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > Looks like mapred.userlog.limit.kb is implemented by keeping some list in
> > memory, and the logs are not written to disk until the job finishes or is
> > killed. That doesn't sound acceptable to me.
> >
> > Well, I am not the only one with this problem. See MAPREDUCE-1100.
> >
> >
> > On Sun, Aug 26, 2012 at 1:58 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hi Koert,
> >>
> >> On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <ko...@tresata.com>
> >> wrote:
> >> > Hey Harsh,
> >> > Thanks for responding!
> >> > Would limiting the logging for each task via mapred.userlog.limit.kb
> >> > be strictly enforced (while the job is running)? That would solve my
> >> > issue of runaway logging on a job filling up the datanode disks. I
> >> > would set the limit high, since in general I do want to retain logs,
> >> > just not in case a single rogue job starts producing many gigabytes
> >> > of logs.
> >> > Thanks!
> >>
> >> It is not strictly enforced the way counter limits are. Exceeding it
> >> wouldn't fail the task; it only causes the extra logged events to not
> >> appear at all (thereby limiting the size).
> >>
> >> > On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <ha...@cloudera.com> wrote:
> >> >>
> >> >> Hi Koert,
> >> >>
> >> >> To answer on point, there is no turning off this feature.
> >> >>
> >> >> Since you don't seem to care much for logs from tasks persisting,
> >> >> perhaps consider lowering mapred.userlog.retain.hours to a value
> >> >> below 24 hours (such as 1 hour)? Or you may even limit the logging
> >> >> from each task to a certain amount of KB via mapred.userlog.limit.kb,
> >> >> which is unlimited by default.
> >> >>
> >> >> Would either of these work for you?
> >> >>
> >> >> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <ko...@tresata.com>
> >> >> wrote:
> >> >> > We have smaller nodes (4 to 6 disks), and we used to write logs to
> >> >> > the same disk as the OS. So if that disk goes, I don't really care
> >> >> > about the tasktracker failing. Also, the fact that logs were written
> >> >> > to a single partition meant that I could make sure they would not
> >> >> > grow too large in case someone had overly verbose logging on a large
> >> >> > job. With MAPREDUCE-2415, a job that does a massive amount of logging
> >> >> > can fill up all of the mapred.local.dir, which in our case are on the
> >> >> > same partition as the HDFS data dirs, so now faulty logging can fill
> >> >> > up HDFS storage, which I really don't like. Any ideas?
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Harsh J
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
