Hi Richard,

Thanks for sharing the steps to reproduce the issue. I cloned the Apache Storm repo and was able to reproduce it: the build was indeed failing due to missing classes.
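In pom terms, the first fix I describe below boils down to roughly the following fragment. This is only a sketch, not the exact diff (that is in the commit linked as [2] below); the `${hadoop.version}` / `${project.version}` properties and the wildcard exclusion are illustrative assumptions, so check the linked commit for the real change:

```xml
<!-- Sketch only: use the shaded minicluster artifact instead of the
     empty hadoop-minicluster aggregator pom -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>

<!-- Stop storm-autocreds from pulling unshaded hadoop jars in
     transitively via hbase-client & hive-exec; a wildcard exclusion
     (supported since Maven 3.2.1) is shown here for brevity -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-autocreds</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The second approach described further down is the inverse: keep `hadoop-minicluster` and replace `hadoop-client-api` / `hadoop-client-runtime` with the plain `hadoop-client` jar, so that nothing on the test classpath is shaded.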
I spent some time debugging the issue; I might not be entirely right (I have no experience with Storm), but there are two ways to get this going.

*First Approach: If we want to use the shaded classes*

1. I think the artifact to be used for the minicluster should be `hadoop-client-minicluster`; even Spark uses the same [1]. The one you are using is `hadoop-minicluster`, which on its own is empty:

```
ayushsaxena@ayushsaxena ~ % jar tf /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar | grep .class
ayushsaxena@ayushsaxena ~ %
```

It just declares the artifacts to be used by `hadoop-client-minicluster`, and that jar has the shading. Using `hadoop-minicluster` amounts to adding the hadoop dependencies to the pom transitively, without any shading, which tends to conflict with the `hadoop-client-api` and `hadoop-client-runtime` jars, which use the shaded classes.

2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`, the tests still won't pass. The reason is the `storm-autocreds` dependency, which pulls in the hadoop jars via `hbase-client` & `hive-exec`, so we need to exclude those as well.

3. I reverted your classpath hack, changed the jar, excluded the dependencies from `storm-autocreds`, and ran the storm-hdfs tests. All the tests that were failing initially now pass, without any code change:

```
[INFO] Results:
[INFO]
[INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```

4. Putting the code diff here might make this mail unreadable, so I am sharing the link to the commit which fixed Storm for me at [2]. Let me know if it has any access issues and I will put the diff in the mail itself in text form.

*Second Approach: If we don't want to use the shaded classes*

1. The `hadoop-client-api` & `hadoop-client-runtime` jars use shading, which tends to conflict with your non-shaded `hadoop-minicluster`. Rather than using these jars, use the `hadoop-client` jar.

2. I removed your hack & replaced those two jars with the `hadoop-client` jar, and the storm-hdfs tests pass.

3. I am sharing the link to the commit in my fork at [3]. One advantage is that you don't have to change your existing jar, nor would you need to add those exclusions to the `storm-autocreds` dependency.

++ Adding common-dev, in case any fellow developers with more experience around the hadoop-client jars can help, if things still don't work or Storm needs something more. The downstream projects I have experience with don't use these jars (which they should, ideally) :-)

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1382
[2] https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
[3] https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8

On Fri, 12 Apr 2024 at 10:41, Richard Zowalla <r...@apache.org> wrote:
> Hi,
>
> thanks for the fast reply. The PR is here [1].
>
> It works if I exclude the client-api and client-runtime from being
> scanned in surefire, which is a hacky workaround for the actual issue.
>
> The hadoop-common jar is a transitive dependency of the minicluster, which
> is used for testing.
>
> Debugging the situation shows that HttpServer2 is in the same package in
> hadoop-common as well as in the client-api, but with differences in the
> methods / classes used, so depending on the classpath order the wrong
> class is loaded.
>
> Stacktraces are in the first GH Action run here: [1].
>
> A reproducer would be to check out Storm, go to storm-hdfs, remove the
> exclusion in [2] and run the tests in that module, which will fail due to a
> missing jetty server class (as the HttpServer2 class is loaded from
> client-api instead of minicluster).
>
> Gruß & Thx
> Richard
>
> [1] https://github.com/apache/storm/pull/3637
> [2] https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
>
> On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > Hi Richard,
> > I am not able to decode the issue properly here; it would have been
> > better if you had shared the PR or the failure trace as well.
> > QQ: Why are you having hadoop-common as an explicit dependency? The
> > hadoop-common stuff should be there in hadoop-client-api.
> > I quickly checked on the 3.4.0 release and I think it does have them.
> >
> > ```
> > ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar | grep org/apache/hadoop/fs/FileSystem.class
> > org/apache/hadoop/fs/FileSystem.class
> > ```
> >
> > You didn't mention which shaded classes are being reported as
> > missing... I think Spark uses these client jars, so you can use it as
> > an example; you can grab pointers from here: [1] & [2]
> >
> > -Ayush
> >
> > [1] https://github.com/apache/spark/blob/master/pom.xml#L1361
> > [2] https://issues.apache.org/jira/browse/SPARK-33212
> >
> > On Thu, 11 Apr 2024 at 17:09, Richard Zowalla <r...@apache.org> wrote:
> > >
> > > Hi all,
> > >
> > > we are using "hadoop-minicluster" in Apache Storm to test our hdfs
> > > integration.
> > >
> > > Recently, we were cleaning up our dependencies and I noticed that if I
> > > add
> > >
> > > <dependency>
> > >   <groupId>org.apache.hadoop</groupId>
> > >   <artifactId>hadoop-client-api</artifactId>
> > >   <version>${hadoop.version}</version>
> > > </dependency>
> > > <dependency>
> > >   <groupId>org.apache.hadoop</groupId>
> > >   <artifactId>hadoop-client-runtime</artifactId>
> > >   <version>${hadoop.version}</version>
> > > </dependency>
> > >
> > > and have
> > >
> > > <dependency>
> > >   <groupId>org.apache.hadoop</groupId>
> > >   <artifactId>hadoop-minicluster</artifactId>
> > >   <version>${hadoop.version}</version>
> > >   <scope>test</scope>
> > > </dependency>
> > >
> > > as a test dependency to set up a mini-cluster to test our storm-hdfs
> > > integration, this fails weirdly because of missing (shaded) classes as
> > > well as a class ambiguity with HttpServer2.
> > >
> > > It is present as a class inside of "hadoop-client-api" as well as
> > > within "hadoop-common".
> > >
> > > Is this setup wrong or should we try something different here?
> > >
> > > Gruß
> > > Richard
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: user-h...@hadoop.apache.org
>