[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820297#comment-13820297 ]
Zhijie Shen commented on YARN-1390:
-----------------------------------

bq. Just to clarify, are you proposing a new field for Application that would be a key-value map and would be used to store tags, applicationLineage, etc?

bq. In the long term, yes

IMHO, tags should be a list rather than a key-value map. Having keys doesn't make sense. If we define the keys, there will always be cases where users cannot find a suitable key for their words. If users define the keys, the keys can be anything as well (an infinite domain), so there is no real difference between keys and values. Moreover, I'm afraid it doesn't make sense to ask users to write down a tag and also come up with the aspect it belongs to.

It seems we have already gone far beyond solving the problem here. The immediate solution seems to be adding another field, "applicationLineage" (maybe workflow?), while keeping "applicationType", which should be the computation framework.

In the long term, once tags are available, it is feasible to integrate applicationType and applicationLineage with them and process everything uniformly. setApplicationType and setApplicationLineage can then be considered shortcuts for adding special tags with the "ApplicationType:" and "ApplicationLineage:" prefixes respectively.

bq. Further, it would be nice to index the apps by these tags, so we don't have to iterate through all the applications and filter everytime we query the RM.

Agree. This applies not only to tags and potential new fields of an application, but also to the existing fields. I suggested the same thing in YARN-1001. It is clearly inefficient to iterate over all the applications in RMContext to find the desired ones, so we may need an index mechanism. I also reopened YARN-925 to push the filters into the implementation of the AHS store, which should have the best knowledge of how to index and search applications.

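To make the two ideas above concrete (tags as a flat list, with type and lineage stored as prefixed special tags, plus an index so lookups avoid a full scan), here is a minimal sketch. It is not YARN code: the class `AppTagIndex`, its method names, and the use of plain strings for application IDs are all assumptions made for illustration. Only the two prefixes come from the comment.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory inverted index from tag to application IDs. */
class AppTagIndex {
    // Prefixes for the "special" tags proposed in the comment.
    static final String TYPE_PREFIX = "ApplicationType:";
    static final String LINEAGE_PREFIX = "ApplicationLineage:";

    // tag -> IDs of the applications carrying that tag
    private final Map<String, Set<String>> appsByTag = new HashMap<>();

    /** Adds a free-form tag to an application. */
    void addTag(String appId, String tag) {
        appsByTag.computeIfAbsent(tag, k -> new HashSet<>()).add(appId);
    }

    /**
     * Shortcut that stores the type as a prefixed tag.
     * Simplification: a real setter would also drop any earlier type tag.
     */
    void setApplicationType(String appId, String type) {
        addTag(appId, TYPE_PREFIX + type);
    }

    /** Shortcut that stores the lineage as a prefixed tag. */
    void setApplicationLineage(String appId, String lineage) {
        addTag(appId, LINEAGE_PREFIX + lineage);
    }

    /** Returns the applications carrying the tag, without scanning all applications. */
    Set<String> findByTag(String tag) {
        return Collections.unmodifiableSet(
            appsByTag.getOrDefault(tag, Collections.emptySet()));
    }
}
```

With this layout, a query such as `findByTag("ApplicationLineage:some-workflow")` is a single map lookup, and ordinary tags and the two special fields go through the same path.
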
By default, the RM holds at most 10000 applications, and iterating over those may still be acceptable. However, the AHS may host 1M finished applications, and iterating over all of them would be crazy. Maybe we can resort to Lucene for the index (in memory or on the filesystem). Just thinking out loud.

bq. However, I do agree that enforcing applicationType of a YARN application contains exactly one of \{Tez, MAPREDUCE, Storm, Spark\}

I think it's good to have some enum values for the common computation frameworks. The benefits are:
1. Indicate what applicationType should be
2. Avoid ambiguous spellings as much as possible (e.g. "MapReduce", "mapreduce", "Map/Reduce", "MR", ...)

However, we should keep the field open so that users can enter an applicationType that is not known to us.

So far we've discussed a lot about how to host the information. Maybe it's better to focus more on the essential problem. It seems that another issue will be opening up the path that passes the lineage information from Oozie to YARN. It should go through MR, right? If another computation framework is used, that needs to be updated as well, right?

> Provide a way to capture source of an application to be queried through REST or Java Client APIs
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1390
>                 URL: https://issues.apache.org/jira/browse/YARN-1390
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is
> useful to have an applicationSource field to track the source of an
> application. The application source can be useful in (1) fetching only those
> applications a user is interested in, (2) potentially adding source-specific
> optimizations in the future.
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop
> etc.

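For the applicationType suggestion in the comment above (enum values for the common frameworks, with the field left open for unknown types), a minimal sketch could look like the following. It is not YARN code: the helper `AppTypes`, its `Known` enum, and the alias table are hypothetical, and the list of frameworks is just the one quoted in the discussion.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that maps common spellings to one canonical type string. */
final class AppTypes {
    /** Well-known frameworks; illustrative list taken from the discussion. */
    enum Known { TEZ, MAPREDUCE, STORM, SPARK }

    // Normalized spelling -> known framework (assumed alias table).
    private static final Map<String, Known> ALIASES = new HashMap<>();
    static {
        for (Known k : Known.values()) {
            ALIASES.put(k.name(), k);
        }
        ALIASES.put("MR", Known.MAPREDUCE);
    }

    private AppTypes() {}

    /**
     * Returns the canonical name for a known framework, or the trimmed
     * input unchanged so that user-defined types remain possible.
     */
    static String canonicalize(String raw) {
        String trimmed = raw.trim();
        // Keep only letters and digits so "Map/Reduce" matches "MAPREDUCE".
        String key = trimmed.replaceAll("[^A-Za-z0-9]", "").toUpperCase(Locale.ROOT);
        Known known = ALIASES.get(key);
        return known != null ? known.name() : trimmed;
    }
}
```

Under these assumptions, "MapReduce", "mapreduce", "Map/Reduce" and "MR" all collapse to `MAPREDUCE`, while a type the enum does not know passes through as the user wrote it.
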
-- This message was sent by Atlassian JIRA (v6.1#6144)