Hi Ayush,

thanks for taking the time to investigate!

I followed your recommendation and it seems to work (also for some of
our consumer projects), so thanks a lot!

Regards
Richard


On Saturday, 13 Apr 2024 at 03:35 +0530, Ayush Saxena wrote:
> Hi Richard,
> Thanks for sharing the steps to reproduce. I cloned the Apache Storm
> repo and was able to reproduce the issue: the build was indeed
> failing due to missing classes.
> 
> I spent some time debugging it. I might not be entirely right (I
> have no experience with Storm), but there are two ways to get this
> going.
> 
> First Approach: If we want to use the shaded classes
> 
> 1. I think the artifact to use for the minicluster should be
> `hadoop-client-minicluster`; even Spark uses the same [1]. The one
> you are using is `hadoop-minicluster`, which is itself empty:
> ```
> ayushsaxena@ayushsaxena ~ % jar tf /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar | grep .class
> ayushsaxena@ayushsaxena ~ %
> ```
> 
> It only declares the artifacts to be used by
> `hadoop-client-minicluster`; that jar is the one with the shading.
> Using `hadoop-minicluster` is like adding the Hadoop dependencies to
> the pom transitively, without any shading, which tends to conflict
> with the `hadoop-client-api` and `hadoop-client-runtime` jars, which
> use the shaded classes.
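> 
> A minimal sketch of the swap (the `${hadoop.version}` property name
> is assumed from your pom; the coordinates are the ones Hadoop
> publishes):
> ```
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-minicluster</artifactId>
>     <version>${hadoop.version}</version>
>     <scope>test</scope>
> </dependency>
> ```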
> 
> 2. Once you change `hadoop-minicluster` to
> `hadoop-client-minicluster`, the tests still won't pass. The reason
> is the `storm-autocreds` dependency, which pulls in the hadoop jars
> via `hbase-client` & `hive-exec`, so we need to exclude those as
> well.
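> 
> The exclusions could look roughly like this (a sketch, not the exact
> diff from my commit; the wildcard form excludes every transitive
> org.apache.hadoop artifact, and the version shown is a placeholder):
> ```
> <dependency>
>     <groupId>org.apache.storm</groupId>
>     <artifactId>storm-autocreds</artifactId>
>     <version>${project.version}</version>
>     <exclusions>
>         <exclusion>
>             <groupId>org.apache.hadoop</groupId>
>             <artifactId>*</artifactId>
>         </exclusion>
>     </exclusions>
> </dependency>
> ```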
> 
> 3. I reverted your classpath hack, changed the jar, & excluded the
> dependencies from storm-autocreds, then ran the storm-hdfs tests.
> All the tests that were failing initially now pass, without any code
> change:
> ```
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> ```
> 
> 4. Putting the code diff here might make this mail unreadable, so I
> am sharing a link to the commit that fixed Storm for me [2]. Let me
> know if it has any access issues and I will put the diff in the mail
> itself as text.
> 
> Second Approach: If we don't want to use the shaded classes
> 
> 1. The `hadoop-client-api` & `hadoop-client-runtime` jars use
> shading, which tends to conflict with your non-shaded
> `hadoop-minicluster`. Rather than using those jars, use the
> `hadoop-client` jar.
> 
> 2. I removed your hack & replaced those two jars with the
> `hadoop-client` jar, and the storm-hdfs tests pass.
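> 
> In pom terms the replacement is simply (a sketch; the version
> property name is assumed from your pom):
> ```
> <!-- instead of hadoop-client-api + hadoop-client-runtime -->
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client</artifactId>
>     <version>${hadoop.version}</version>
> </dependency>
> ```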
> 
> 3. I am sharing a link to the commit in my fork [3]. One advantage
> is that you don't have to change your existing jar, nor would you
> need to add those exclusions to the `storm-autocreds` dependency.
> 
> ++ Adding common-dev, in case any fellow developers with more
> experience using the hadoop-client jars can help if things still
> don't work or Storm needs something more. The downstream projects I
> have experience with don't use these jars (which they ideally
> should) :-)
> 
> -Ayush
> 
> 
> [1] https://github.com/apache/spark/blob/master/pom.xml#L1382
> [2]
> https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
> [3] https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8
> 
> 
> On Fri, 12 Apr 2024 at 10:41, Richard Zowalla <r...@apache.org>
> wrote:
> > Hi,
> > 
> > thanks for the fast reply. The PR is here [1].
> > 
> > It works if I exclude the client-api and client-runtime jars from
> > being scanned in surefire, which is a hacky workaround for the
> > actual issue.
> > 
> > The hadoop-common jar is a transitive dependency of the
> > minicluster, which is used for testing.
> > 
> > Debugging the situation shows that HttpServer2 exists in the same
> > package in hadoop-common as well as in hadoop-client-api, but with
> > differences in the methods / classes used, so depending on the
> > classpath order the wrong class is loaded.
> > 
> > Stack traces are in the first GH Actions run here: [1].
> > 
> > A reproducer: check out Storm, go to storm-hdfs, remove the
> > exclusion in [2], and run the tests in that module. They will fail
> > due to a missing Jetty server class (as the HttpServer2 class is
> > loaded from client-api instead of minicluster).
> > 
> > Regards & thanks
> > Richard 
> > 
> > [1] https://github.com/apache/storm/pull/3637
> > [2]
> > https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
> > 
> > On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > > Hi Richard,
> > > I am not able to fully decode the issue here; it would have been
> > > better if you had shared the PR or the failure trace as well.
> > > QQ: Why do you have hadoop-common as an explicit dependency? The
> > > hadoop-common classes should already be in hadoop-client-api.
> > > I quickly checked the 3.4.0 release and I think it does have
> > > them:
> > > 
> > > ```
> > > ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar | grep org/apache/hadoop/fs/FileSystem.class
> > > org/apache/hadoop/fs/FileSystem.class
> > > ```
> > > 
> > > You didn't mention which shaded classes are being reported as
> > > missing... I think Spark uses these client jars; you can use it
> > > as an example and grab pointers from [1] & [2].
> > > 
> > > -Ayush
> > > 
> > > [1] https://github.com/apache/spark/blob/master/pom.xml#L1361
> > > [2] https://issues.apache.org/jira/browse/SPARK-33212
> > > 
> > > On Thu, 11 Apr 2024 at 17:09, Richard Zowalla <r...@apache.org>
> > > wrote:
> > > > 
> > > > Hi all,
> > > > 
> > > > we are using "hadoop-minicluster" in Apache Storm to test our
> > > > HDFS integration.
> > > > 
> > > > Recently, we were cleaning up our dependencies, and I noticed
> > > > that if I add
> > > > 
> > > >          <dependency>
> > > >              <groupId>org.apache.hadoop</groupId>
> > > >              <artifactId>hadoop-client-api</artifactId>
> > > >              <version>${hadoop.version}</version>
> > > >          </dependency>
> > > >          <dependency>
> > > >              <groupId>org.apache.hadoop</groupId>
> > > >              <artifactId>hadoop-client-runtime</artifactId>
> > > >              <version>${hadoop.version}</version>
> > > >          </dependency>
> > > > 
> > > > and have
> > > >          <dependency>
> > > >              <groupId>org.apache.hadoop</groupId>
> > > >              <artifactId>hadoop-minicluster</artifactId>
> > > >              <version>${hadoop.version}</version>
> > > >              <scope>test</scope>
> > > >          </dependency>
> > > > 
> > > > as a test dependency to set up a mini-cluster for testing our
> > > > storm-hdfs integration.
> > > > 
> > > > This fails weirdly because of missing (shaded) classes as well
> > > > as a class ambiguity with HttpServer2.
> > > > 
> > > > The class is present both inside "hadoop-client-api" and
> > > > within "hadoop-common".
> > > > 
> > > > Is this setup wrong or should we try something different here?
> > > > 
> > > > Regards
> > > > Richard
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: user-h...@hadoop.apache.org
> > > 
> > > 
> > 


