[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs

Zhijie Shen (JIRA) Tue, 12 Nov 2013 10:18:42 -0800

    [ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820297#comment-13820297 ]


Zhijie Shen commented on YARN-1390:
-----------------------------------

bq. Just to clarify, are you proposing a new field for Application that would 
be a key-value map and would be used to store tags, applicationLineage, etc?
bq. In the long term, yes

IMHO, tags should be a list rather than a key-value map; having a key doesn't 
make sense. If we define the keys, there will always be cases where users 
cannot find a suitable key for their words. If users define the keys, the keys 
can be anything as well (an infinite domain), so there is no difference 
between the keys and the values. Moreover, I'm afraid it doesn't make sense to 
ask users to write down a tag and also come up with the aspect it describes.

It seems we have already gone far beyond solving the problem here. The 
immediate solution to the problem seems to be adding another field, 
"applicationLineage" (maybe workflow?), while we still keep "applicationType", 
which should name the computation framework.

In the long term, once tags are available, it is feasible to fold 
applicationType and applicationLineage into them so that everything is 
processed uniformly. setApplicationType and setApplicationLineage can then be 
considered shortcuts for adding the special tags with the "ApplicationType:" 
and "ApplicationLineage:" prefixes respectively.
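
A minimal sketch of that shortcut idea. TaggedApplication is a hypothetical 
stand-in, not an existing YARN class; only the two prefixes come from the 
comment above. It assumes an application carries at most one type tag and one 
lineage tag, so each setter replaces any earlier tag with the same prefix.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an application record; NOT the real YARN API.
class TaggedApplication {
    static final String TYPE_PREFIX = "ApplicationType:";
    static final String LINEAGE_PREFIX = "ApplicationLineage:";

    // Tags are a plain list-like set of words, with no keys.
    private final Set<String> tags = new LinkedHashSet<>();

    void addTag(String tag) {
        tags.add(tag);
    }

    // Shortcut setters: each one just writes a specially prefixed tag.
    void setApplicationType(String type) {
        replacePrefixed(TYPE_PREFIX, type);
    }

    void setApplicationLineage(String lineage) {
        replacePrefixed(LINEAGE_PREFIX, lineage);
    }

    String getApplicationType() {
        return findPrefixed(TYPE_PREFIX);
    }

    String getApplicationLineage() {
        return findPrefixed(LINEAGE_PREFIX);
    }

    Set<String> getTags() {
        return Collections.unmodifiableSet(tags);
    }

    // Assumption: at most one tag per special prefix, so drop the old one first.
    private void replacePrefixed(String prefix, String value) {
        tags.removeIf(t -> t.startsWith(prefix));
        tags.add(prefix + value);
    }

    // Returns the value after the prefix, or null if no such tag is set.
    private String findPrefixed(String prefix) {
        for (String t : tags) {
            if (t.startsWith(prefix)) {
                return t.substring(prefix.length());
            }
        }
        return null;
    }
}
```

With this shape, a filter on applicationType and a filter on an ordinary tag 
would both go through the same tag lookup.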

bq.  Further, it would be nice to index the apps by these tags, so we don't 
have to iterate through all the applications and filter everytime we query the 
RM.

Agreed. This applies not only to tags and potential new application fields, 
but also to the existing fields. I suggested the same thing in YARN-1001. It 
is clearly inefficient to iterate over all the applications in RMContext to 
find the desired ones, so we may need an indexing mechanism. I also reopened 
YARN-925 to push the filters into the implementation of the AHS store, which 
should have the best knowledge of how to index and search applications. By 
default the RM holds at most 10000 applications, which may still be 
acceptable. However, the AHS may host 1M finished applications, and iterating 
over all of them would be crazy. Maybe we can resort to Lucene for the index 
(in memory or on the filesystem). Just thinking out loud.
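
To illustrate the indexing point, here is a rough in-memory inverted index 
from tag to application IDs. It is only a sketch: ApplicationTagIndex and its 
methods are hypothetical names, it uses plain Java collections rather than 
Lucene, and it ignores concurrency and persistence, which a real RM or AHS 
store would have to handle.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index; not part of YARN and not thread-safe.
class ApplicationTagIndex {
    // tag -> sorted set of application IDs carrying that tag
    private final Map<String, Set<String>> appIdsByTag = new HashMap<>();

    void add(String appId, Collection<String> tags) {
        for (String tag : tags) {
            appIdsByTag.computeIfAbsent(tag, k -> new TreeSet<>()).add(appId);
        }
    }

    void remove(String appId, Collection<String> tags) {
        for (String tag : tags) {
            Set<String> ids = appIdsByTag.get(tag);
            if (ids != null) {
                ids.remove(appId);
                if (ids.isEmpty()) {
                    appIdsByTag.remove(tag); // avoid keeping empty postings
                }
            }
        }
    }

    // One map lookup instead of a scan over every application.
    Set<String> findByTag(String tag) {
        Set<String> ids = appIdsByTag.get(tag);
        return ids == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(ids);
    }

    // Applications that carry every one of the given tags (set intersection).
    Set<String> findByAllTags(Collection<String> tags) {
        Set<String> result = null;
        for (String tag : tags) {
            Set<String> ids = findByTag(tag);
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }
}
```

A lookup then costs roughly the size of the matching set rather than the 
total number of stored applications, which is what matters at the 1M scale.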

bq. However, I do agree that enforcing applicationType of a YARN application 
contains exactly one of \{Tez, MAPREDUCE, Storm, Spark\}

I think it's good to have some enum values for the common computation 
frameworks. The benefits are:
1. They indicate what applicationType should be
2. They avoid ambiguous spellings as much as possible (e.g. "MapReduce", 
"mapreduce", "Map/Reduce", "MR", ...)
However, we should keep the field open so that users can enter an 
applicationType that is not known to us.
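
One way this could look, as a sketch only: a small set of known values plus a 
normalizer that maps the ambiguous spellings onto them and passes anything 
unknown through unchanged. ApplicationTypes, Known and normalize are 
hypothetical names, and the "MR" alias is an assumption taken from the 
examples above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of the YARN API.
class ApplicationTypes {
    // Well-known frameworks taken from the list quoted above.
    enum Known { TEZ, MAPREDUCE, STORM, SPARK }

    private static final Map<String, Known> ALIASES = new HashMap<>();
    static {
        for (Known k : Known.values()) {
            ALIASES.put(k.name(), k);
        }
        ALIASES.put("MR", Known.MAPREDUCE); // assumed alias
    }

    // Maps spellings such as "Map/Reduce" or "mapreduce" to the canonical
    // name; unknown types are returned as entered (trimmed), so the field
    // stays open to frameworks we do not know about.
    static String normalize(String raw) {
        String trimmed = raw.trim();
        String key = trimmed.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "");
        Known known = ALIASES.get(key);
        return known != null ? known.name() : trimmed;
    }
}
```

Here normalize("Map/Reduce") and normalize("MR") both give "MAPREDUCE", while 
an unlisted type such as "Giraph" is kept as is.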

So far we've discussed a lot about how to host the information. Maybe it's 
better to focus more on the essential problem. It seems that another issue 
will be opening up the path that carries the lineage information from Oozie 
to YARN. It should go through MR, right? If another computation framework is 
used, that needs to be updated as well, right?

> Provide a way to capture source of an application to be queried through REST 
> or Java Client APIs
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1390
>                 URL: https://issues.apache.org/jira/browse/YARN-1390
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is 
> useful to have an applicationSource field to track the source of an 
> application. The application source can be useful in (1) fetching only those 
> applications a user is interested in, (2) potentially adding source-specific 
> optimizations in the future. 
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop 
> etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
