[ 
https://issues.apache.org/jira/browse/YARN-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524746#comment-15524746
 ] 

Sangjin Lee commented on YARN-5667:
-----------------------------------

Those are great questions.

The diamond dependency (where there is more than one version of a given 
library in the dependency graph) happens because the hadoop code uses, for 
example, hadoop-common 3.0.0-alpha1 directly and also 2.5.1 indirectly via 
hbase 1.1.3. Due to hadoop's version management, 3.0.0-alpha1 is picked. The 
implication is that we build and test hbase code in the context of timeline 
service against a hadoop version *different from* the one hbase declares as 
its dependency.
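
To make the diamond concrete, here is a minimal sketch of how such a conflict 
can be spotted in a dependency graph and how a managed version settles it. 
This is a toy model, not the build tool's actual resolution algorithm: the 
edge list, the module name "timelineservice", and the MANAGED map are 
hypothetical and only mirror the versions mentioned above.

```python
from collections import defaultdict

# Hypothetical, simplified dependency edges: (dependent, artifact, version).
EDGES = [
    ("timelineservice", "hadoop-common", "3.0.0-alpha1"),  # direct use
    ("timelineservice", "hbase", "1.1.3"),
    ("hbase", "hadoop-common", "2.5.1"),                   # indirect use
]

# Versions pinned by the build's version management (assumed for illustration).
MANAGED = {"hadoop-common": "3.0.0-alpha1"}


def find_diamonds(edges):
    """Return {artifact: sorted versions} for artifacts requested at 2+ versions."""
    requested = defaultdict(set)
    for _dependent, artifact, version in edges:
        requested[artifact].add(version)
    return {a: sorted(v) for a, v in requested.items() if len(v) > 1}


def resolve(edges, managed):
    """Map each conflicting artifact to its managed version, or None if unmanaged."""
    return {a: managed.get(a) for a in find_diamonds(edges)}


if __name__ == "__main__":
    print(find_diamonds(EDGES))
    print(resolve(EDGES, MANAGED))
```

In this model hbase ends up built and tested against the managed 
hadoop-common version rather than the 2.5.1 it requested, which is the 
mismatch described above.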

Now if we think about hbase client code and hbase coprocessor code separately, 
we see that the two pieces of code run in different environments. The code 
that uses hbase client runs on YARN (and therefore hadoop 3.0.0). In that 
environment, we need to ensure the hbase client itself (not our code that uses 
hbase client) works correctly against the trunk version of hadoop.

On the other hand, the hbase coprocessor code runs on hbase. Therefore, it is 
the timeline service coprocessor code that needs to run under hadoop 2.5.1 
(until/unless we upgrade hbase). Both of these aspects need to be verified if 
we decide to split the code, and having the client and coprocessor code in 
separate modules would make that verification easier.

If we had an hbase version that depends on the trunk, these problems would go 
away. And I understand that the hbase folks are making an effort to ensure the 
latest hbase version works against the hadoop trunk version. That said, hbase 
officially can depend only on released versions, so there will always be a lag.

As for the reason that the coprocessor depends on the hbase-client-related 
code, there is no strong reason that should be the case. It's just the way the 
code evolved. Actually it would be good to refactor the code so that the 
coprocessor code has minimal dependencies. It's worth looking into.

> Move HBase backend code in ATS v2  into its separate module
> -----------------------------------------------------------
>
>                 Key: YARN-5667
>                 URL: https://issues.apache.org/jira/browse/YARN-5667
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> The HBase backend code currently lives along with the core ATS v2 code in 
> hadoop-yarn-server-timelineservice module. Because Resource Manager depends 
> on hadoop-yarn-server-timelineservice, an unnecessary dependency of the RM 
> module on HBase modules is introduced (HBase backend is pluggable, so we do 
> not need to directly pull in HBase jars). 
> In our internal effort to try ATS v2 with HBase 2.0 which depends on Hadoop 
> 3, we encountered a circular dependency during our builds between HBase2.0 
> and Hadoop3 artifacts.
> {code}
> hadoop-mapreduce-client-common, hadoop-yarn-client, 
> hadoop-yarn-server-resourcemanager, hadoop-yarn-server-timelineservice, 
> hbase-server, hbase-prefix-tree, hbase-hadoop2-compat, 
> hadoop-mapreduce-client-jobclient, hadoop-mapreduce-client-common]
> {code}
> This jira proposes we move all HBase-backend-related code from 
> hadoop-yarn-server-timelineservice into its own module (possible name is 
> yarn-server-timelineservice-storage) so that core RM modules do not depend on 
> HBase modules any more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

