[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856070#comment-13856070 ] Hadoop QA commented on YARN-1493: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620290/YARN-1493.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 19 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2719//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2719//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2719//console This message is automatically generated. > Schedulers don't recognize apps separately from app-attempts > > > Key: YARN-1493 > URL: https://issues.apache.org/jira/browse/YARN-1493 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, > YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch > > > Today, scheduler is tied to attempt only. > We need to separate app-level handling logic in scheduler. We can add new > app-level events to the scheduler and separate the app-level logic out. This > is good for work-preserving AM restart, RM restart, and also needed for > differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1493: -- Attachment: YARN-1493.6.patch > Schedulers don't recognize apps separately from app-attempts > > > Key: YARN-1493 > URL: https://issues.apache.org/jira/browse/YARN-1493 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, > YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch > > > Today, scheduler is tied to attempt only. > We need to separate app-level handling logic in scheduler. We can add new > app-level events to the scheduler and separate the app-level logic out. This > is good for work-preserving AM restart, RM restart, and also needed for > differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1529: Issue Type: Improvement (was: Bug) > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: YARN-1529.v01.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856042#comment-13856042 ] Gera Shegalov commented on YARN-1529: - Hi [~hitesh] thanks for chiming in! > Does the cache ratio account for the local resource visibility i.e. public > cache misses are more important than cache misses for application visibility? The current patch does not differentiate between cache visibilities. I am open to suggestions on whether a finer breakdown for cache misses would be helpful. The goal of this and a follow-up MAPREDUCE JIRA is to raise awareness at the aggregate level that shipping computation to data is not free. > I assume the "LocalizationDownloadNanos" is an average per container? How > does an average help when there are numerous application types with diff no. > of resources and each container facing a different cache hit ratio? Is this > something which needs to be augmented into the container status and not a > general NM metric? LocalizationDownloadNanos is the total sum of container launch delays due to localization. An average can be obtained as {code}LocalizationDownloadNanos / ContainersLaunched{code}. > For that matter, what is the better option - tracking localization metrics on > the NM level or tracking them on a per container/per app level? I am preparing a patch that exposes this information as MR counters for MRv2. Is there a better way to achieve this in an application-agnostic manner such that it is visible in the web UI? > Shouldn't there be a metric that tracks the actual size of the local resource > cache on disk? This is a very good idea in my opinion. > What about different resource types - file/archive/pattern? Currently all resource types are lumped together. We can discuss whether it's helpful to expose a finer breakdown at the NM level or the app level. > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: YARN-1529.v01.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
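To make the derived values concrete, here is a minimal Java sketch of the arithmetic discussed above; the helper class is hypothetical and not part of the attached patch, and the sample values come from the JMX output posted on this issue.
{code}
/** Hypothetical helper that derives the aggregate values discussed above
 *  from the raw NodeManagerMetrics counters (names follow the proposal). */
public final class LocalizationStats {
  private LocalizationStats() {}

  /** ratio = 100 * cached / (cached + missed), as defined in the JIRA description. */
  public static long cachedRatio(long cached, long missed) {
    long total = cached + missed;
    return total == 0 ? 0 : 100 * cached / total;
  }

  /** Average localization delay per launched container, in nanoseconds. */
  public static long avgDownloadNanos(long localizationDownloadNanos, long containersLaunched) {
    return containersLaunched == 0 ? 0 : localizationDownloadNanos / containersLaunched;
  }

  public static void main(String[] args) {
    // Values taken from the sample JMX output attached to this issue.
    System.out.println(cachedRatio(1529454L, 1529546L));   // LocalizedBytesCachedRatio -> 49
    System.out.println(cachedRatio(2L, 4L));                // LocalizedFilesCachedRatio -> 33
    System.out.println(avgDownloadNanos(1803959000L, 2L));  // ~0.9 s of localization per launch
  }
}
{code}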
[jira] [Updated] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-321: - Assignee: (was: Vinod Kumar Vavilapalli) Not the only one working on it, marking it Unassigned. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856021#comment-13856021 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Tx Zhijie for answering most of Sandy's questions; you are spot on. I'll update the design doc to clarify things where it isn't clear. bq. What is the jira for app specific history data? I just filed YARN-1530, will post more information soon. bq. Could you describe the security requirements a bit further. It's not clear to everyone how everything works currently. To be clear, what exactly needs to be done to make apps write and read history data. The data covered in this JIRA is generic and only the RM gets to write it. The consumers of this data are *both* the cluster admins for historical analyses as well as individual apps that choose not to use features that come out of YARN-1530. As such, we cannot let apps write history data. bq. How is the shared bus different from writing to a file. I would think one would cover the other. Yes, writing to a file is one example of a shared bus. I'll fix it if the doc is confusing w.r.t. this. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856023#comment-13856023 ] Hitesh Shah commented on YARN-1529: --- [~jira.shegalov] Could you add more details on how users should interpret these new metrics? Does the cache ratio account for the local resource visibility i.e. public cache misses are more important than cache misses for application visibility? I assume the "LocalizationDownloadNanos" is an average per container? How does an average help when there are numerous application types with diff no. of resources and each container facing a different cache hit ratio? Is this something which needs to be augmented into the container status and not a general NM metric? For that matter, what is the better option - tracking localization metrics on the NM level or tracking them on a per container/per app level? Further thoughts: - Shouldn't there be a metric that tracks the actual size of the local resource cache on disk? - How are public/private/application caches being considered? - What about different resource types - file/archive/pattern? > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: YARN-1529.v01.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856016#comment-13856016 ] Vinod Kumar Vavilapalli commented on YARN-1530: --- Working on a design doc that explains the requirements and the solution space. I hope to push it out soon. > [Umbrella] Store, manage and serve per-framework application-timeline data > -- > > Key: YARN-1530 > URL: https://issues.apache.org/jira/browse/YARN-1530 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > > This is a sibling JIRA for YARN-321. > Today, each application/framework has to store and serve per-framework > data all by itself as YARN doesn't have a common solution. This JIRA attempts > to solve the storage, management and serving of per-framework data from > various applications, both running and finished. The aim is to change YARN to > collect and store data in a generic manner with plugin points for frameworks > to do their own thing w.r.t interpretation and serving. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
Vinod Kumar Vavilapalli created YARN-1530: - Summary: [Umbrella] Store, manage and serve per-framework application-timeline data Key: YARN-1530 URL: https://issues.apache.org/jira/browse/YARN-1530 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli This is a sibling JIRA for YARN-321. Today, each application/framework has to store and serve per-framework data all by itself as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner with plugin points for frameworks to do their own thing w.r.t interpretation and serving. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856014#comment-13856014 ] Hadoop QA commented on YARN-1529: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620280/YARN-1529.v01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2718//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2718//console This message is automatically generated. > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: YARN-1529.v01.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1521: -- Summary: Mark appropriate protocol methods with the idempotent annotation (was: Mark appropriate methods of ApplicationClientProtocol, ResourceManagerAdminist, ApplicationMasterProtocol and ResourceTracker with the idempotent annotation) > Mark appropriate protocol methods with the idempotent annotation > > > Key: YARN-1521 > URL: https://issues.apache.org/jira/browse/YARN-1521 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > > After YARN-1028, we add the automatically failover into RMProxy. This JIRA is > to identify whether we need to add idempotent annotation and which methods > can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1529: Attachment: YARN-1529.v01.patch {noformat} $ curl -s http://somehost:8042/jmx?qry="Hadoop:service=NodeManager,name=NodeManagerMetrics"; | python -mjson.tool { "beans": [ { "AllocatedContainers": 0, "AllocatedGB": 0, "AvailableGB": 8, "ContainersCompleted": 1, "ContainersFailed": 0, "ContainersIniting": 0, "ContainersKilled": 1, "ContainersLaunched": 2, "ContainersRunning": 0, "LocalizationDownloadNanos": 1803959000, "LocalizedBytesCached": 1529454, "LocalizedBytesCachedRatio": 49, "LocalizedBytesMissed": 1529546, "LocalizedFilesCached": 2, "LocalizedFilesCachedRatio": 33, "LocalizedFilesMissed": 4, "modelerType": "NodeManagerMetrics", "name": "Hadoop:service=NodeManager,name=NodeManagerMetrics", "tag.Context": "yarn", "tag.Hostname": "somehost" } ] } {noformat} > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: YARN-1529.v01.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1529) Add Localization overhead metrics to NM
Gera Shegalov created YARN-1529: --- Summary: Add Localization overhead metrics to NM Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1527: -- Labels: newbie (was: ) > yarn rmadmin command prints wrong usage info: > - > > Key: YARN-1527 > URL: https://issues.apache.org/jira/browse/YARN-1527 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He > Labels: newbie > > The usage should be: yarn rmadmin, instead of java RMAdmin, and the > -refreshQueues should be in the second line. > {code} Usage: java RMAdmin -refreshQueues >-refreshNodes >-refreshSuperUserGroupsConfiguration >-refreshUserToGroupsMappings >-refreshAdminAcls >-refreshServiceAcl >-getGroups [username] >-help [cmd] >-transitionToActive >-transitionToStandby >-failover [--forcefence] [--forceactive] >-getServiceState >-checkHealth > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
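For reference, a sketch of what the corrected help text would presumably look like (argument placeholders omitted, as in the report above; this is not output from an actual patch):
{code}
Usage: yarn rmadmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive
   -transitionToStandby
   -failover [--forcefence] [--forceactive]
   -getServiceState
   -checkHealth
{code}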
[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container
[ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855958#comment-13855958 ] Arun C Murthy commented on YARN-1488: - I was thinking along the lines of a protocol such as: Resource Request -> Resource granted -> AM then does (Launch Container or Delegate Container) via NM which re-works cgroups if required. We have explicitly avoided any container management directly between RM & NM in YARN, i.e. launch container or re-size container. Of course, RM can still revoke containers during preemption. Having this done via the AM is better for scale and simplicity (all of resource acquisition is limited to the AM-NM protocol). > Allow containers to delegate resources to another container > --- > > Key: YARN-1488 > URL: https://issues.apache.org/jira/browse/YARN-1488 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > > We should allow containers to delegate resources to another container. This > would allow external frameworks to share not just YARN's resource-management > capabilities but also its workload-management capabilities. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
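A rough sketch of what that AM-NM interaction could look like, purely as an illustration of the proposed flow; every type and method name below is invented for this example and nothing like it exists in the current ContainerManagementProtocol.
{code}
import java.io.IOException;

// Hypothetical extension of the AM->NM protocol, invented purely to illustrate
// the flow described above: resource request -> grant -> AM either launches the
// container or delegates its resources to an existing container via the NM,
// which then re-works cgroup limits. None of these types exist in YARN today.
interface ContainerDelegationProtocol {

  final class DelegateContainerRequest {
    final String grantedContainerId;  // container whose grant is being delegated
    final String targetContainerId;   // running container that absorbs the resources
    DelegateContainerRequest(String granted, String target) {
      this.grantedContainerId = granted;
      this.targetContainerId = target;
    }
  }

  final class DelegateContainerResponse { }

  // NM-side operation with no RM involvement, keeping all resource acquisition
  // on the AM-NM path as described in the comment above.
  DelegateContainerResponse delegateContainer(DelegateContainerRequest request) throws IOException;
}
{code}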
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855948#comment-13855948 ] Hadoop QA commented on YARN-1029: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620257/yarn-1029-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2717//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2717//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2717//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2717//console This message is automatically generated. > Allow embedding leader election into the RM > --- > > Key: YARN-1029 > URL: https://issues.apache.org/jira/browse/YARN-1029 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, > yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, > yarn-1029-approach.patch > > > It should be possible to embed common ActiveStandyElector into the RM such > that ZooKeeper based leader election and notification is in-built. In > conjunction with a ZK state store, this configuration will be a simple > deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-2.patch Patch that incorporates suggestions from [~bikassaha]. > Allow embedding leader election into the RM > --- > > Key: YARN-1029 > URL: https://issues.apache.org/jira/browse/YARN-1029 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, > yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, > yarn-1029-approach.patch > > > It should be possible to embed common ActiveStandyElector into the RM such > that ZooKeeper based leader election and notification is in-built. In > conjunction with a ZK state store, this configuration will be a simple > deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1528) ZKRMStateStore and EmbeddedElector should allow setting ZK auth information
Karthik Kambatla created YARN-1528: -- Summary: ZKRMStateStore and EmbeddedElector should allow setting ZK auth information Key: YARN-1528 URL: https://issues.apache.org/jira/browse/YARN-1528 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Priority: Minor ZK store and embedded election allow setting ZK-acls but not auth information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
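For context, "setting ZK auth information" would happen on the client connection before any ACL-protected znodes are accessed; a minimal sketch at the plain ZooKeeper API level, where the host, scheme and credentials are placeholders rather than anything the eventual patch will necessarily use:
{code}
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch of adding auth info on a ZooKeeper client connection.
// The connect string, scheme and credentials below are placeholders.
public class ZkAuthExample {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 10000, null);
    // Register credentials for the "digest" scheme on this session. Only after
    // this call can the client access znodes whose ACLs are restricted to that
    // identity (e.g. the RM state store root).
    zk.addAuthInfo("digest", "rmuser:secret".getBytes("UTF-8"));
  }
}
{code}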
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855800#comment-13855800 ] Karthik Kambatla commented on YARN-1029: bq. There is a separate jira open to add a cluster-id Here, we use the cluster-id to make sure the RM to which the bread-crumb corresponds is in the same cluster. In HDFS, they directly check for the other NN's id, which restricts us to a single standby. For RM HA, there is no reason to limit ourselves to two RMs, even though that is probably going to be the default deployment. The actual token-related logic can be handled in the other JIRA. bq. this is probably not enough. we need to notify the rm. Just to be sure, are you suggesting we add a new event and a handler in the RM for that event? I have addressed the other comments, and am looking at the test failure from the previous patch. Will incorporate any other comments and post a patch at the earliest. > Allow embedding leader election into the RM > --- > > Key: YARN-1029 > URL: https://issues.apache.org/jira/browse/YARN-1029 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, > yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-approach.patch > > > It should be possible to embed common ActiveStandyElector into the RM such > that ZooKeeper based leader election and notification is in-built. In > conjunction with a ZK state store, this configuration will be a simple > deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855707#comment-13855707 ] Robert Joseph Evans commented on YARN-321: -- The way it currently works is based off of group permissions on a directory (this is from memory from a while ago so I could be off on a few things). In HDFS when you create a file the group of the file is the group of the directory the file is a part of, similar to the sticky bit on a directory in Linux. When an MR job completes it will copy its history log file, along with a few other files, to a drop-box-like location called intermediate done and atomically rename it from a temp name to the final name. The directory is world writable, but only readable by a special group that the history server is a part of and general users are not. The history server then wakes up periodically and will scan that directory for new files; when it sees new files it will move them to a final location that is owned by the headless history server user. If a query comes in for a job that the history server is not aware of, it will also scan the intermediate done directory before failing. Reading history data is done through RPC to the history server, or through the web interface, including RESTful APIs. There is no supported way for an app to read history data directly through the file system. I hope this helps. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
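To make the hand-off concrete, the drop-box step described above boils down to a write under a temporary name followed by an atomic rename on HDFS; a simplified sketch, with placeholder paths and file names rather than the actual JobHistory layout constants:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Simplified sketch of the drop-box hand-off described above. Paths and file
// names are placeholders, not the real JobHistory layout.
public class HistoryDropBoxSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path intermediateDone = new Path("/mr-history/intermediate/someuser");

    // 1. Write the history file under a temporary name in the drop box
    //    (the directory is writable by all users but readable only by the
    //    history server's group).
    Path tmp = new Path(intermediateDone, "job_1234_0001.jhist.tmp");
    Path fin = new Path(intermediateDone, "job_1234_0001.jhist");
    fs.create(tmp).close();

    // 2. Atomically rename to the final name so the history server never
    //    observes a partially written file when it scans the directory.
    fs.rename(tmp, fin);
  }
}
{code}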
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855513#comment-13855513 ] Junping Du commented on YARN-1506: -- Hi [~bikassaha], thanks for the comments. You are right that if the resource update doesn't go through RMNode first, the overcommitment timeout may not be useful to RMNode. However, I would prefer this timeout to be a per-operation behavior rather than a cluster configuration, so that users have the flexibility to tell YARN how urgently they want to balloon out resources. Thoughts? > Replace set resource change on RMNode/SchedulerNode directly with event > notification. > - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
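As a rough illustration of the per-operation idea, here is a sketch assuming the ResourceOption abstraction from the YARN-311/312 node-resize work (a Resource plus an overcommit timeout carried with each update); the names are illustrative, since the event/API shape is exactly what this JIRA is deciding.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceOption;

// Sketch of a per-operation over-commitment timeout: the caller of each node
// resource update picks how urgently existing usage should be reclaimed,
// instead of a single cluster-wide timeout setting.
public class NodeResizeSketch {
  public static void main(String[] args) {
    Resource newTotal = Resource.newInstance(16 * 1024, 8); // 16 GB, 8 vcores
    int overCommitTimeoutSecs = 120; // caller decides how urgently to balloon

    ResourceOption option = ResourceOption.newInstance(newTotal, overCommitTimeoutSecs);
    // A node-resource-update event would then carry `option` to the scheduler
    // rather than RMNode.setResourceOption() being called directly.
    System.out.println(option);
  }
}
{code}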