Hi,

I'm working on migrating Apache Storm to Hadoop 3.2.0, and I'm having some
trouble with the dependency tree pulled in by Hive.

Our direct dependencies on Hive are:

org.apache.hive.hcatalog:hive-hcatalog-core:3.1.1
org.apache.hive.hcatalog:hive-webhcat-java-client:3.1.1
org.apache.hive.hcatalog:hive-hcatalog-streaming:3.1.1

Are these artifacts intended for use by other projects, or should I be
using other (shaded?) artifacts to interact with Hive?

The Hadoop manual
(https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html#Build_Artifacts)
lists the artifacts downstream projects should use. Most of those
artifacts shade Hadoop's dependencies to avoid causing conflicts in users'
projects. HBase does the same with hbase-shaded-client.

Hive doesn't seem to use these shaded artifacts. Instead, it depends on
unshaded artifacts like hbase-client or hadoop-hdfs, which conflict with
the shaded artifacts (hbase-shaded-client, hadoop-hdfs-client) because both
the shaded and the unshaded artifacts contain the same Hadoop classes.
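To confirm that a build really ends up with the same Hadoop class in more
than one jar, a small check like the one below may help. It is only a
sketch: it asks the class loader for every classpath location that provides
a given .class resource, so more than one hit suggests that both a shaded
and an unshaded artifact supply the class. The DuplicateClassFinder name is
hypothetical, and org.apache.hadoop.conf.Configuration is just an example
default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists every classpath entry that provides a given class.
public class DuplicateClassFinder {

    static List<URL> locationsOf(String className, ClassLoader loader) {
        // Class files are looked up as resources, e.g. org/apache/hadoop/conf/Configuration.class
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example default; pass another fully qualified class name as the first argument
        String className = args.length > 0 ? args[0] : "org.apache.hadoop.conf.Configuration";
        List<URL> hits = locationsOf(className, ClassLoader.getSystemClassLoader());
        System.out.println(className + " found in " + hits.size() + " location(s):");
        for (URL hit : hits) {
            System.out.println("  " + hit);
        }
        if (hits.size() > 1) {
            // Only the first copy on the classpath is normally loaded
            System.out.println("Duplicate copies found; which one wins depends on classpath order.");
        }
    }
}
```

Run it with the same classpath as the application (for example the Storm
topology jar plus its dependencies) to see which jars carry the class.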

Additionally, hive-hcatalog-streaming pulls in hive-cli, which in turn
pulls in Hadoop 2.7.4 jars. This doesn't seem intentional.
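One way to spot this kind of version mixing is to look for a single groupId
that resolves to several versions. The sketch below assumes the resolved
dependencies have already been flattened into groupId:artifactId:version
strings (for example, taken from `mvn dependency:tree -Dincludes=org.apache.hadoop`
output with the type and scope fields trimmed). The HadoopVersionCheck class
and the sample coordinates are hypothetical and are not Storm's or Hive's
actual dependency list.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: groups the artifacts of one groupId by resolved version.
public class HadoopVersionCheck {

    // Coordinates are assumed to be "groupId:artifactId:version"; anything else is skipped.
    static Map<String, Set<String>> versionsForGroup(List<String> coordinates, String groupId) {
        Map<String, Set<String>> byVersion = new TreeMap<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            if (parts.length != 3 || !parts[0].equals(groupId)) {
                continue;
            }
            byVersion.computeIfAbsent(parts[2], v -> new TreeSet<>()).add(parts[1]);
        }
        return byVersion;
    }

    public static void main(String[] args) {
        // Hypothetical resolved dependency list mirroring the situation described above
        List<String> resolved = List.of(
                "org.apache.hadoop:hadoop-client-api:3.2.0",
                "org.apache.hadoop:hadoop-common:2.7.4",
                "org.apache.hive:hive-cli:3.1.1");
        Map<String, Set<String>> versions = versionsForGroup(resolved, "org.apache.hadoop");
        // More than one version for the same groupId means the build mixes Hadoop releases
        if (versions.size() > 1) {
            System.out.println("Conflicting org.apache.hadoop versions: " + versions);
        }
    }
}
```

With the sample list, this prints
"Conflicting org.apache.hadoop versions: {2.7.4=[hadoop-common], 3.2.0=[hadoop-client-api]}".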

Are there any plans to migrate Hive to the shaded Hadoop/HBase jars for
Hive 4, or would there be objections to doing so? I think it could help
avoid dependency conflicts when projects depend on Hive and Hadoop/HBase
at the same time.
