Prabhu,

Thank you so much for this. It’ll likely be a while before they can make the 
change to test it, but the JobHistoryServer was last restarted before the new 
configuration was set, so there is a good chance that’s the issue. I’ll reply 
to this thread or escalate to Cloudera directly if that doesn’t solve it.

From: Prabhu Josephraj <pjos...@cloudera.com>
Sent: Friday, November 22, 2019 12:13 AM
To: David M <mcginni...@outlook.com>
Cc: user@hadoop.apache.org
Subject: Re: Can't Change Retention Period for YARN Log Aggregation

The deletion service runs as part of MapReduce JobHistoryServer. Can you try 
restarting it?

On Fri, Nov 22, 2019 at 3:42 AM David M <mcginni...@outlook.com> wrote:
All,

I have an HDP 2.6.1 cluster where we’ve had yarn.log-aggregation.retain-seconds 
set to 30 days for a while, and everything was working properly. Four days ago 
we changed the property to 15 days instead and restarted the services. The 
check interval is set to the default (one-tenth of the retention period), so 
we expected that within 1.5 days we’d see the logs older than 15 days deleted.
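
For reference, the 1.5-day figure can be reproduced with a small sketch. It 
assumes the documented yarn-default.xml behaviour: when 
yarn.log-aggregation.retain-check-interval-seconds is 0 or negative (the 
default is -1), the check interval is one-tenth of the retention time, and a 
negative retain-seconds disables deletion. The function name below is 
illustrative only, not part of any Hadoop API.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def effective_check_interval_seconds(retain_seconds, check_interval_seconds=-1):
    """Return how often the retention check would run, in seconds.

    Hypothetical helper mirroring the documented defaults:
    - a negative retain_seconds disables deletion (returns None)
    - a check interval <= 0 falls back to one-tenth of the retention time
    """
    if retain_seconds < 0:
        return None
    if check_interval_seconds <= 0:
        return retain_seconds / 10
    return check_interval_seconds


# 15-day retention with the default check interval
retain = 15 * SECONDS_PER_DAY
interval = effective_check_interval_seconds(retain)
print(interval / SECONDS_PER_DAY)  # 1.5
```

With the earlier 30-day retention the same calculation gives a 3-day check 
interval, so a process still holding the old configuration would check less 
often as well as keep logs longer.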

For some reason, we are still seeing 30 days of logs kept. The other properties 
all seem to be set properly. The only weird setting I can find is that we are 
using the LogAggregationIndexedFileController as our primary file controller 
class. The LogAggregationTFileController is still available as the second in 
the list.

I found YARN-8279 
(https://issues.apache.org/jira/browse/YARN-8279),
 which seems sort of related, except that we are still seeing logs being put 
into the right suffix folder, and it still seems to be deleting logs older than 
30 days. It just doesn’t seem to have updated to 15 days as the cutoff instead.

I’ve looked in the logs for the Resource Manager, Timeline Server, and one of 
the Name Nodes, and nothing that would explain this has popped up. Any ideas 
where to go to figure out what is happening? Additionally, can someone confirm 
in which process the deletion service actually runs? Is it the resource 
manager, timeline server, or something else?
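
One way to confirm which cutoff is actually in effect is to list the 
aggregated log directories and flag anything older than the expected 15 days. 
The sketch below is only an illustration: it assumes the usual 
`hdfs dfs -ls` line layout (permissions, replication, owner, group, size, 
date, time, path), and the sample lines, paths, and function name are made up 
for the example rather than taken from this cluster.

```python
from datetime import datetime, timedelta


def entries_older_than(ls_lines, now, max_age_days):
    """Return paths whose modification time is older than max_age_days.

    ls_lines: lines in the assumed `hdfs dfs -ls` layout.
    now: reference time as a datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_lines:
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skip headers such as "Found 3 items"
        modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        if modified < cutoff:
            stale.append(fields[7])
    return stale


# Hypothetical listing; real paths depend on the remote log dir and suffix.
sample = [
    "Found 2 items",
    "drwxrwx---   - david hadoop          0 2019-10-25 09:00 /app-logs/david/logs/application_A",
    "drwxrwx---   - david hadoop          0 2019-11-15 09:00 /app-logs/david/logs/application_B",
]
print(entries_older_than(sample, datetime(2019, 11, 22), 15))
```

If entries between 15 and 30 days old keep showing up after the expected 
check interval has passed, the process doing the deletion is most likely 
still running with the old retention value.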

Thanks!

David McGinnis
