Prabhu,

Thank you so much for this. It’ll likely be a while before they can make the change and test it, but the JobHistoryServer was restarted before the new configuration was set, so there’s a good chance that’s the issue. I’ll reply to this thread, or escalate to Cloudera directly, if that doesn’t solve it.
From: Prabhu Josephraj <pjos...@cloudera.com>
Sent: Friday, November 22, 2019 12:13 AM
To: David M <mcginni...@outlook.com>
Cc: user@hadoop.apache.org
Subject: Re: Can't Change Retention Period for YARN Log Aggregation

The deletion service runs as part of the MapReduce JobHistoryServer. Can you try restarting it?

On Fri, Nov 22, 2019 at 3:42 AM David M <mcginni...@outlook.com> wrote:

All,

I have an HDP 2.6.1 cluster where we’ve had yarn.log-aggregation.retain-seconds set to 30 days for a while, and everything was working properly. Four days ago we changed the property to 15 days and restarted the services. The check interval is set to the default, so we expected logs older than 15 days to be deleted within 1.5 days. For some reason, we are still seeing 30 days of logs kept.

The other properties all seem to be set properly. The only unusual setting I can find is that we are using LogAggregationIndexedFileController as our primary file controller class; LogAggregationTFileController is still available as the second in the list. I found YARN-8279 (https://issues.apache.org/jira/browse/YARN-8279), which seems somewhat related, except that logs are still being written to the right suffix folder, and logs older than 30 days are still being deleted. The cutoff just doesn’t seem to have changed to 15 days.

I’ve looked in the logs for the ResourceManager, the Timeline Server, and one of the NameNodes, and nothing that would explain this has shown up. Any ideas on where to look to figure out what is happening? Additionally, can someone confirm which process the deletion service actually runs in? Is it the ResourceManager, the Timeline Server, or something else?

Thanks!
David McGinnis
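The timing David expected can be checked with a little arithmetic. The sketch below is a simplified Python model, not the actual Hadoop code. It assumes the rule documented in yarn-default.xml that a yarn.log-aggregation.retain-check-interval-seconds value of 0 or less (the default is -1) falls back to one-tenth of the retention period, and it models a check as deleting any application log directory last modified before "now minus retention", ignoring details such as still-running applications. The function names and application IDs are hypothetical. It also illustrates why a deletion service still holding the old 30-day value, for example a JobHistoryServer started before the change, keeps 20-day-old logs.

```python
from datetime import datetime, timedelta

DAY_SECONDS = 24 * 60 * 60


def effective_check_interval(retain_seconds: int, check_interval_seconds: int = -1) -> int:
    """Seconds between retention checks (hypothetical helper).

    Assumes the yarn-default.xml rule: a check interval <= 0 falls back
    to one-tenth of the retention period.
    """
    if check_interval_seconds <= 0:
        return retain_seconds // 10
    return check_interval_seconds


def expired_app_dirs(app_dir_mtimes: dict, now: datetime, retain_seconds: int) -> list:
    """Simplified model of one retention check (hypothetical helper).

    app_dir_mtimes maps a made-up application ID to the modification time
    of its aggregated log directory; anything older than the cutoff is
    reported as deletable.
    """
    cutoff = now - timedelta(seconds=retain_seconds)
    return sorted(app for app, mtime in app_dir_mtimes.items() if mtime < cutoff)


if __name__ == "__main__":
    retain_new = 15 * DAY_SECONDS  # 15 days = 1,296,000 seconds
    retain_old = 30 * DAY_SECONDS

    # Default interval for a 15-day retention, expressed in days.
    print(effective_check_interval(retain_new) / DAY_SECONDS)  # 1.5

    now = datetime(2019, 11, 22)
    mtimes = {
        "app_a": now - timedelta(days=20),  # hypothetical 20-day-old app logs
        "app_b": now - timedelta(days=40),  # hypothetical 40-day-old app logs
    }
    # A service still using the old 30-day value only removes app_b.
    print(expired_app_dirs(mtimes, now, retain_old))  # ['app_b']
    # With the new 15-day value, both directories fall past the cutoff.
    print(expired_app_dirs(mtimes, now, retain_new))  # ['app_a', 'app_b']
```

Under these assumptions, the 1.5-day expectation in the original message follows directly from the one-tenth rule, and the symptom described (logs still cut off at exactly 30 days) is what you would see if the process running the deletion service never picked up the new setting.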