[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-4265:
    Attachment: YANR-4265-branch-2.8-addendum.patch
                YARN-4265-branch-2-addendum.patch

Updating the addendum patch to fix a compilation issue in branch-2/branch-2.8.

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>             Fix For: 2.8.0
>
>         Attachments: YANR-4265-branch-2.8-addendum.patch,
> YARN-4265-branch-2-addendum.patch, YARN-4265-trunk.001.patch,
> YARN-4265-trunk.002.patch, YARN-4265-trunk.003.patch,
> YARN-4265-trunk.004.patch, YARN-4265-trunk.005.patch,
> YARN-4265-trunk.006.patch, YARN-4265-trunk.007.patch,
> YARN-4265-trunk.008.patch, YARN-4265.YARN-4234.001.patch,
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new
> plugin timeline store. The store may behave similarly to the
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache id
> granularity instead of application id granularity. Let's have this storage
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the
> existing store (EntityFileTimelineStore) stable.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.008.patch

New patch to address the checkstyle hidden field warning. I also tested locally for the maven dependency issue: I cleaned my local m2 repository, rebuilt hadoop, and ran maven dependency, but could not reproduce the problem. The problem also appears to have been intermittent in past Jenkins runs.
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.007.patch

New patch adds the missing package-info.java that the checkstyle script is complaining about.
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.006.patch

Thanks [~djp] for the review! I addressed all of your concerns in the latest patch. Any other comments? cc/[~xgong], [~jlowe]
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.005.patch

Thanks [~djp] and [~liuml07] for the help! Since we need to handle files with appends, we cannot directly use the directory modification time to decide whether the contents of a directory have changed. This means we also need to change some logic in the cleanLogs method. I redesigned the cleanLogs method to perform a log scan with two methods:

Method 1: For the given directory, search (in a depth-first fashion) to find application log directories. For each of them, call method 2.

Method 2: For the given application log directory, search all files inside. If any file has been "recently" (as defined by the configs) updated, skip removing this directory. Otherwise, remove this application log directory.

In this way we can search inside a directory for all application log directories that need to be reclaimed. Following Junping's suggestion, I've also added a new unit test (testCleanLogs) to cover common cases for the cleanLogs method.
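The two-method scan described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: it uses java.nio.file on a local file system instead of the Hadoop FileSystem API, and the class name LogCleanerSketch, the "application_" directory prefix check, and the parameter names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Sketch only: a local-file-system stand-in for the two-method log scan. */
final class LogCleanerSketch {
  // Assumption: application log directories are recognised by this name prefix.
  static final String APP_DIR_PREFIX = "application_";

  /** Method 1: depth-first search for application log directories. */
  static void cleanLogs(Path dir, long nowMillis, long retainMillis) throws IOException {
    // Collect the children first so that deletions do not disturb the listing.
    List<Path> children = new ArrayList<>();
    try (DirectoryStream<Path> listing = Files.newDirectoryStream(dir)) {
      for (Path child : listing) {
        children.add(child);
      }
    }
    for (Path child : children) {
      if (!Files.isDirectory(child)) {
        continue;
      }
      if (child.getFileName().toString().startsWith(APP_DIR_PREFIX)) {
        cleanAppLogDir(child, nowMillis, retainMillis);
      } else {
        cleanLogs(child, nowMillis, retainMillis);
      }
    }
  }

  /** Method 2: remove the directory only if no file inside was recently updated. */
  static boolean cleanAppLogDir(Path appDir, long nowMillis, long retainMillis)
      throws IOException {
    boolean hasFreshFile;
    try (Stream<Path> files = Files.walk(appDir)) {
      hasFreshFile = files.filter(Files::isRegularFile).anyMatch(p -> {
        try {
          return nowMillis - Files.getLastModifiedTime(p).toMillis() < retainMillis;
        } catch (IOException e) {
          return true; // be conservative: keep the directory if a file cannot be read
        }
      });
    }
    if (hasFreshFile) {
      return false;
    }
    // The decision is made before anything is deleted; now delete children first.
    try (Stream<Path> all = Files.walk(appDir)) {
      all.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    }
    return true;
  }
}
```

Because the freshness check looks at every file before any deletion starts, the outcome does not depend on the order in which the directory listing returns its entries.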
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.004.patch

Thanks [~djp] for the review! I updated my patch according to your comments. Some quick comments:

bq. I am a bit confused with the logic here: if appLogs is not done yet, but its detail logs are empty, do we need to scanForLogs? If not, we should at least document the reason.
Yes, we only update summary logs when the app is running. Updated the comments for this.

bq. If we have two groupIds: 114859476_01_1 and 114859476_01_11, the latter one's log file name can match the previous groupId as well? If so, we may consider matching the file name with the cache id more exactly? The same case applies to the code below {{if (log.getFilename().contains(groupId.toString())) }}
Nice catch! What I'm trying to address here is file names made of an entity group id and a sequence number. I've updated the related logic here.

bq. For cleanLogs(Path dirpath), it seems like the result of log cleanup depends on the order of files/directories returned. Say an app dir includes file A and dir B, where file A is a fresh one and all files in dir B are older than logRetainMillis. If file A is returned first, cleanLogs() does nothing, but if dir B is returned first, cleanLogs() will clean up dir B. Given that fs.listStatusIterator(dirpath) could return file A and dir B in random order, is this random behavior expected?
This is not possible because in the first part of cleanLogs(), we're only doing a DFS to decide if we need to remove this dir. If any file in the directory is new, we will not remove it. The actual removal happens after the DFS process.

bq. Is it a common case for an AppLogs to have many summaryLogs (and detail logs)?
Right now we're not facing this kind of use case. We can certainly optimize this logic in the future though.

bq. Can we directly return appDirPath's modification time instead of going through all sub directories?
I believe we cannot. We're trying to return the latest time any file within a directory has been changed, to decide in parseSummaryLogs whether the app has been in the UNKNOWN state for long enough.
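To illustrate the group id matching problem raised above, here is a small sketch. It is not the logic from the patch: the class name GroupIdMatchSketch, the method matchesGroupId, and the "entitylog-" file name prefix used in the usage note are hypothetical, and the boundary rule (the group id must not be directly preceded or followed by a letter or digit) is an assumption about how file names are laid out.

```java
/** Sketch only: match a group id as a whole segment of a log file name. */
final class GroupIdMatchSketch {

  /**
   * Returns true if groupId occurs in fileName and the occurrence is not
   * directly preceded or followed by a letter or digit. A plain contains()
   * check would also accept "..._01_11" when asked for "..._01_1".
   */
  static boolean matchesGroupId(String fileName, String groupId) {
    if (groupId.isEmpty()) {
      return false;
    }
    int pos = fileName.indexOf(groupId);
    while (pos >= 0) {
      int end = pos + groupId.length();
      boolean startOk = pos == 0
          || !Character.isLetterOrDigit(fileName.charAt(pos - 1));
      boolean endOk = end == fileName.length()
          || !Character.isLetterOrDigit(fileName.charAt(end));
      if (startOk && endOk) {
        return true;
      }
      // Try the next occurrence, if there is one.
      pos = fileName.indexOf(groupId, pos + 1);
    }
    return false;
  }
}
```

With this rule, matchesGroupId("entitylog-114859476_01_11", "114859476_01_1") is false, while a name carrying a trailing sequence number such as "entitylog-114859476_01_1_3" still matches group id 114859476_01_1. The rule relies on one group id never being a delimiter-terminated prefix of another.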
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.003.patch

Refreshed the patch once more to:
# Add documentation about the cache item and its main features.
# Double-check synchronization and fix unsynchronized accessors.
# Change the access modifiers of some methods to make them stricter.

Ran findbugs locally and it looks fine.
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.002.patch

Thanks [~djp] for the review! In the 002 patch I addressed most of the checkstyle problems, as well as most existing comments. Please feel free to add more. Some comments:

bq. I noticed that we are setting 1 minute as the default scan interval but the original patch in YARN-3942 uses 5 minutes. Why shall we do any update here?
For now I increased the default frequency of scanning HDFS and pulling timeline data. With a 5-minute interval, users are less likely to see any running status for apps that finish within 5 minutes. Right now I'm setting this value to 1 minute to reduce reader reaction time.

bq. The same question on "app-cache-size": the default value in YARN-3942 is 5 but here it is 10. Any reason to update the value?
In YARN-3942, caching is performed at the application level. In this patch, caching is performed at the entity group level. Each application may have a few to tens of entity groups. Normally, there are slightly more active entity groups than active applications in the system. For now, I'm increasing this default value to hold slightly more entity groups in the cache.

bq. Why don't we have any default value specified for the property "yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes"?
Plugins are provided by third-party applications such as Tez. Right now we cannot assume which exact entity group plugin the user is using, so we have to conservatively leave this config empty.

bq. For EmptyTimelineEntityGroupPlugin.java, why do we need this class? I didn't see it used even in tests. We should remove it if useless.
Ah, nice catch. Removed it.

bq. Can we optimize the synchronization logic here? For example, in the synchronized method refreshCache we initialize/start/stop TimelineDataManager (and MemoryTimelineStore), which is quite expensive and unnecessarily blocks other synchronized operations. Shall we move these operations out of the synchronized block?
It's certainly doable. I have yet to optimize this part because it's a little tricky to fine-tune synchronization performance before we have a relatively stable starting point. Also, we're using fine-grained locking for each cached item in the reader cache, and cache refresh only happens infrequently (~10 secs by default), so maybe we'd like to stabilize the whole synchronization story before fine-tuning this part?
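The suggestion to move expensive work out of the synchronized block could look roughly like the sketch below. It is an illustration under stated assumptions, not the patch's refreshCache: the class CacheItemSketch, its fields, and the Supplier-based loader are hypothetical stand-ins for the cached item and for the expensive initialize/start of the in-memory store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch only: build the new state outside the lock, publish it inside. */
final class CacheItemSketch {
  // Stand-in for the in-memory store that backs one cached item.
  private Map<String, String> store = new HashMap<>();
  private long lastRefreshMillis = -1L;

  /**
   * The loader stands in for the expensive part of a refresh. It runs without
   * holding the lock, so readers are only blocked for the short swap below.
   * Trade-off: two concurrent refreshes may both run the loader, and the
   * last one to finish wins.
   */
  void refresh(Supplier<Map<String, String>> loader, long nowMillis) {
    Map<String, String> fresh = loader.get();
    synchronized (this) {
      store = fresh;
      lastRefreshMillis = nowMillis;
    }
  }

  synchronized String get(String key) {
    return store.get(key);
  }

  synchronized long getLastRefreshMillis() {
    return lastRefreshMillis;
  }
}
```

Whether duplicated loader work is acceptable depends on how often refreshes overlap; with an infrequent refresh interval it is likely rare, but that is a judgment for the real store rather than something this sketch settles.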
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: (was: YARN-4265-trunk.poc_001.patch)
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.001.patch

Thanks [~djp]! I just rebased my patch to the latest trunk.
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265.YARN-4234.002.patch

Uploaded a new patch to fix some log issues and implement the getTimeline API.
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265.YARN-4234.001.patch

I refactored my code and added some unit tests. The current patch only depends on YARN-4234. I addressed the comments raised by [~jlowe] in the previous round of review, with two pending actions that I think we can address in separate JIRAs:
# Build a "fall-back" plugin with the same behavior as YARN-3942.
# Make the caching storage pluggable.

For the second item, right now my patch does _not_ depend on YARN-4219. So I'm setting the v1.5 plugin storage to use the memory storage system as the "caching" storage, similar to YARN-3942.

I'm adding the v1.5 plugin storage (EntityGroupFSTimelineStore) in a module called hadoop-yarn-server-timeline-pluginstorage. This is slightly different from YARN-3942. We need a separate module because the new v1.5 storage depends on yarn-client, but we don't want the ATS server to depend on yarn-client (the v1.5 storage is not a purely server-side storage). I'm naming it "pluginstorage" because I'm considering putting the leveldb caching storage into this extension as well.
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.poc_001.patch

This is a proof-of-concept patch for the proposed improvement. This patch is largely based on the work in YARN-3942. In this patch we added a new storage plugin to accommodate the newly added cache id concept. As in YARN-4234, entity logs are stored in /app_id/attempt_id/-cache_id.log. The newly added plugin can work with this new directory structure, and can refresh (reload) entity logs at cache id granularity (rather than application granularity).