Hi,

On Mon, Aug 27, 2012 at 12:09 AM, Koert Kuipers <[email protected]> wrote:
> Harsh,
>
> I see the problem as follows: usually we want to let people log whatever
> they want, as long as they don't threaten the stability of the system.
>
> However, every once in a while somebody will submit a job that is overly
> verbose and will generate many gigabytes of logs in minutes. This is
> typically an honest mistake, and the person doesn't realize what is going
> on (why is my job so slow?). Limiting the general logging levels for
> everyone to deal with these mistakes seems ineffective. Telling the person
> to change the logging level for his job will not work either, since he/she
> doesn't realize what is going on and certainly didn't know in advance.

I had meant to say you could enforce the logging level on the child tasks
via finalized job options, but yeah, that would be way too restrictive.
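(For reference, the finalized approach would look roughly like this in
mapred-site.xml; these are the MR1 property names, with WARN purely as an
example level:)

    <!-- Force a quieter log level on all map and reduce child tasks.
         Marking the properties final prevents per-job overrides, which
         is exactly what makes this too restrictive in practice. -->
    <property>
      <name>mapred.map.child.log.level</name>
      <value>WARN</value>
      <final>true</final>
    </property>
    <property>
      <name>mapred.reduce.child.log.level</name>
      <value>WARN</value>
      <final>true</final>
    </property>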
> So all I really want is a very high and hard limit on the log size per
> job, to protect the system. Say many hundreds of megabytes or even
> gigabytes. But when this limit is reached I want the logging to stop from
> that point on, or even the job to be killed. mapred.userlog.limit.kb seems
> the wrong tool for the job.

Hundreds of MB of logs seems too much for a single task to emit; I believe a
good limit is under 10 MB. But yes, it makes sense that one could want more
for different kinds of jobs and purposes. For such a requirement, I agree
that mapred.userlog.limit.kb isn't the right solution. Perhaps just the
retain-hours value, then.
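That is, something along these lines in mapred-site.xml (the values here are
examples only; retain.hours defaults to 24 hours, and limit.kb to 0, i.e.
unlimited):

    <!-- Delete per-task userlogs after 1 hour instead of the default 24,
         so runaway logs get cleaned up quickly. -->
    <property>
      <name>mapred.userlog.retain.hours</name>
      <value>1</value>
    </property>
    <!-- Optionally also cap each task attempt's log size; exceeding the
         cap drops the extra log events rather than failing the task. -->
    <property>
      <name>mapred.userlog.limit.kb</name>
      <value>2048</value>
    </property>

A job that genuinely needs a bigger cap could still pass
-Dmapred.userlog.limit.kb=... at submit time (via ToolRunner), as long as
the property is not marked final.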
> Before the logging got moved to the mapred.local.dir I had a limit simply
> by limiting the size of the partition that the logging went to.
>
> Anyhow, looks like I will have to wait for MAPREDUCE-1100.

I agree.

> Have a good day!
> Koert
>
> On Sun, Aug 26, 2012 at 2:21 PM, Harsh J <[email protected]> wrote:
>>
>> Yes, that is true: it does maintain N events in memory and then flushes
>> them down to disk upon closure. With a reasonable size (2 MB of logs,
>> say) I don't see that causing any memory fill-up issues at all, since it
>> does cap (and discard at the tail).
>>
>> The other alternative may be to switch down the log level on the task,
>> via mapred.map.child.log.level and/or mapred.reduce.child.log.level set
>> to WARN or ERROR.
>>
>> On Sun, Aug 26, 2012 at 11:37 PM, Koert Kuipers <[email protected]> wrote:
>> > Looks like mapred.userlog.limit.kb is implemented by keeping a list in
>> > memory, and the logs are not written to disk until the job finishes or
>> > is killed. That doesn't sound acceptable to me.
>> >
>> > Well, I am not the only one with this problem. See MAPREDUCE-1100.
>> >
>> > On Sun, Aug 26, 2012 at 1:58 PM, Harsh J <[email protected]> wrote:
>> >>
>> >> Hi Koert,
>> >>
>> >> On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <[email protected]>
>> >> wrote:
>> >> > Hey Harsh,
>> >> > Thanks for responding!
>> >> > Would limiting the logging for each task via mapred.userlog.limit.kb
>> >> > be strictly enforced (while the job is running)? That would solve my
>> >> > issue of runaway logging on a job filling up the datanode disks. I
>> >> > would set the limit high, since in general I do want to retain logs,
>> >> > just not in case a single rogue job starts producing many gigabytes
>> >> > of logs.
>> >> > Thanks!
>> >>
>> >> It is not strictly enforced the way counter limits are. Exceeding it
>> >> wouldn't fail the task, only cause the extra logged events to not
>> >> appear at all (thereby limiting the size).
>> >>
>> >> > On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <[email protected]> wrote:
>> >> >>
>> >> >> Hi Koert,
>> >> >>
>> >> >> To answer on point: there is no turning off this feature.
>> >> >>
>> >> >> Since you don't seem to care much about logs from tasks persisting,
>> >> >> perhaps consider lowering mapred.userlog.retain.hours to a value
>> >> >> below the default of 24 hours (such as 1 hour)? Or you may even
>> >> >> limit the logging from each task to a certain number of KB via
>> >> >> mapred.userlog.limit.kb, which is unlimited by default.
>> >> >>
>> >> >> Would either of these work for you?
>> >> >>
>> >> >> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <[email protected]>
>> >> >> wrote:
>> >> >> > We have smaller nodes (4 to 6 disks), and we used to write logs
>> >> >> > to the same disk the OS is on. So if that disk goes, I don't
>> >> >> > really care about the tasktracker failing. Also, the fact that
>> >> >> > logs were written to a single partition meant that I could make
>> >> >> > sure they would not grow too large in case someone had too
>> >> >> > verbose logging on a large job. With MAPREDUCE-2415, a job that
>> >> >> > does a massive amount of logging can fill up all of the
>> >> >> > mapred.local.dir directories, which in our case are on the same
>> >> >> > partitions as the HDFS data dirs, so now faulty logging can fill
>> >> >> > up HDFS storage, which I really don't like. Any ideas?
>> >> >>
>> >> >> --
>> >> >> Harsh J
>> >>
>> >> --
>> >> Harsh J
>>
>> --
>> Harsh J

--
Harsh J
