[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4265:
------------------------
    Attachment: YARN-4265-trunk.004.patch

Thanks [~djp] for the review! I have updated the patch according to your 
comments. Some quick replies:

bq. I am a bit confused by the logic here: if appLogs is not done yet, but its 
detail logs are empty, do we need to scanForLogs? If not, we should at least 
document the reason.
Yes, we only update summary logs while the app is running. I've updated the 
comments to document this. 

bq. If we have two groupIds, 114859476_01_1 and 114859476_01_11, can the latter 
one's log file name also match the former groupId? If so, we may want to match 
the file name against the cache id more exactly. The same applies to the code 
below: {{if (log.getFilename().contains(groupId.toString())) }}
Nice catch! What I'm trying to handle here is file names that consist of an 
entity group id followed by a sequence number. I've updated the related logic. 
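
To illustrate the issue, here is a minimal sketch of a stricter match. It is 
not the code in the patch: the class name, the "entitylog-" prefix, and the 
"-<sequence number>" suffix are all assumptions made for the example. 

```java
class GroupIdFileMatch {
    /**
     * Returns true only if fileName is exactly prefix + groupId, optionally
     * followed by "-" and a decimal sequence number. Both the prefix and the
     * "-<sequence>" suffix are hypothetical naming choices for this sketch.
     */
    static boolean belongsToGroup(String fileName, String prefix, String groupId) {
        String expected = prefix + groupId;
        if (!fileName.startsWith(expected)) {
            return false;
        }
        // Whatever follows the group id must be empty or a sequence suffix.
        String rest = fileName.substring(expected.length());
        return rest.isEmpty() || rest.matches("-\\d+");
    }

    public static void main(String[] args) {
        String groupId = "114859476_01_1";
        String otherGroupFile = "entitylog-114859476_01_11";
        // Substring test: wrongly reports a match.
        System.out.println(otherGroupFile.contains(groupId));                      // true
        // Exact test: the trailing "1" is not a valid sequence suffix.
        System.out.println(belongsToGroup(otherGroupFile, "entitylog-", groupId)); // false
    }
}
```

With a plain contains() check, the file for 114859476_01_11 is attributed to 
114859476_01_1 as well; anchoring the match on what follows the group id avoids 
that. 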

bq. For cleanLogs(Path dirpath), it seems that the result of the log cleanup 
depends on the order in which files/directories are returned. Say an app dir 
includes file A and dir B, where file A is fresh and all files in dir B are 
older than logRetainMillis. If file A is returned first, cleanLogs() does 
nothing, but if dir B is returned first, cleanLogs() will clean up dir B. Given 
that fs.listStatusIterator(dirpath) could return file A and dir B in random 
order, is this order-dependent behavior expected?
This cannot happen, because the first part of cleanLogs() only performs a DFS 
to decide whether this dir needs to be removed. If any file in the directory is 
new, we do not remove it. The actual removal happens after the DFS has 
completed. 
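
The two-phase idea described above can be sketched as follows. This is an 
illustration only, not the patch: it uses java.nio.file on a local file system 
instead of the Hadoop FileSystem API, and the class and method names are 
hypothetical. 

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CleanLogsSketch {
    /** Phase 1: walk the whole tree; true if any regular file is at least as new as cutoffMillis. */
    static boolean hasFreshFile(Path dir, long cutoffMillis) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            return walk.filter(Files::isRegularFile)
                       .anyMatch(p -> lastModified(p) >= cutoffMillis);
        }
    }

    /** Phase 2 runs only when phase 1 found nothing fresh. Returns true if dir was removed. */
    static boolean cleanIfStale(Path dir, long nowMillis, long retainMillis) throws IOException {
        if (hasFreshFile(dir, nowMillis - retainMillis)) {
            return false; // at least one fresh file: keep the whole directory
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            toDelete = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path app = Files.createTempDirectory("app");
        Path dirB = Files.createDirectory(app.resolve("B"));
        Path old = Files.createFile(dirB.resolve("old.log"));
        Files.setLastModifiedTime(old, FileTime.fromMillis(0L)); // very old file in dir B
        Path fresh = Files.createFile(app.resolve("A.log"));     // fresh file A
        long now = System.currentTimeMillis();
        System.out.println(cleanIfStale(app, now, 60_000L));     // kept: A.log is fresh
        Files.setLastModifiedTime(fresh, FileTime.fromMillis(0L));
        System.out.println(cleanIfStale(app, now, 60_000L));     // removed: everything is old
    }
}
```

Because the decision is made over the whole tree before anything is deleted, 
the order in which entries are listed does not affect the outcome. 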

bq. Is it a common case for an AppLogs to have many summaryLogs (and detail 
logs)? 
Right now we're not facing this kind of use case. We can certainly optimize 
this logic in the future, though. 

bq. Can we directly return appDirPath's modification time instead of going 
through all sub directories?
I believe we cannot. We're trying to return the latest time at which any file 
within the directory was changed, so that parseSummaryLogs can decide whether 
the app has been in the UNKNOWN state for long enough. 
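
As a rough illustration of the difference: on most file systems a directory's 
own modification time is not updated when a file nested below it is written, so 
the whole tree has to be scanned. The sketch below uses java.nio.file rather 
than the Hadoop FileSystem API, and its class and method names are 
hypothetical. 

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

class LatestModificationTime {
    /** Returns the newest modification time (millis) of root or anything below it. */
    static long latestModificationTime(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.mapToLong(LatestModificationTime::lastModified).max().orElse(0L);
        }
    }

    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path appDir = Files.createTempDirectory("app");
        Path sub = Files.createDirectories(appDir.resolve("attempt").resolve("logs"));
        Path log = Files.createFile(sub.resolve("summary.log"));
        // Pretend the nested file was touched far later than the directory itself.
        Files.setLastModifiedTime(log, FileTime.fromMillis(4102444800000L));
        System.out.println("appDir mtime only: " + Files.getLastModifiedTime(appDir).toMillis());
        System.out.println("latest in tree:    " + latestModificationTime(appDir));
    }
}
```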

> Provide new timeline plugin storage to support fine-grained entity caching
> --------------------------------------------------------------------------
>
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may behave similarly to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache id 
> granularity instead of application id granularity. Let's make this storage 
> a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
