Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

Gian Merlino Mon, 08 Aug 2022 21:13:34 -0700

It's always good to deprecate things for some time prior to removing them,
so we don't need to (nor should we) remove Hadoop 2 support right now. My
vote is that in this upcoming release, we should deprecate it. The main
problem in my eyes is the one Abhishek brought up: the dependency
management situation with Hadoop 2 is really messy, and I'm not sure
there's a good way to handle it given the limited classloader isolation.
This situation becomes tougher to manage with each release, and we haven't
had people volunteering to find and build comprehensive solutions. It is
time to move on.


The concern Samarth raised, that people may end up stuck on older Druid
versions because they aren't able to upgrade to Hadoop 3, is valid. I can
see two good solutions to this. First: we can improve native ingest to the
point where people feel broadly comfortable moving Hadoop 2 workloads to
native. The work planned as part of doing ingest via multi-stage
distributed query <https://github.com/apache/druid/issues/12262> is going
to be useful here, by improving the speed and scalability of native ingest.
Second: it would also be great to have something similar that runs on
Spark, for people who have invested in Spark. I suspect that most
people who used Hadoop 2 have moved on to Hadoop 3 or Spark, so supporting
both of those would ease a lot of the potential pain of dropping Hadoop 2
support.

On Spark: I'm not familiar with the current state of the Spark work. Is it
stuck? If so, could something be done to unstick it? I agree with Abhishek
that I wouldn't want to block moving off Hadoop 2 on this. However, it'd be
great if we could get it done before actually removing Hadoop 2 support
from the code base.


On Wed, Aug 3, 2022 at 6:17 AM Abhishek Agarwal <abhishek.agar...@imply.io>
wrote:

> I was thinking that moving from Hadoop 2 to Hadoop 3 will be a
> lower-resistance path than moving from Hadoop to Spark. Even if we get that
> PR merged, it will take a good while for the Spark integration to reach the
> same level of maturity as Hadoop or native ingestion. BTW, I am not making
> an argument against Spark integration. It will certainly be nice to have
> Spark as an option. I just don't want Spark integration to become a blocker
> for us to get off Hadoop.
>
> BTW, are you using Hadoop 2 right now with the latest Druid version? If so,
> did you run into errors similar to the ones I posted in my last email?
>
> On Wed, Jul 27, 2022 at 12:02 AM Samarth Jain <samarth.j...@gmail.com>
> wrote:
>
> > I am sure there are other companies out there who are still on Hadoop 2.x,
> > with migration to Hadoop 3.x being a no-go.
> > If Druid were to drop support for Hadoop 2.x completely, I am afraid it
> > would prevent users from updating to newer versions of Druid, which would
> > be a shame.
> >
> > FWIW, we have found in practice for high-volume use cases that compaction
> > based on Druid's Hadoop-based batch ingestion is a lot more scalable than
> > native compaction.
> >
> > Having said that, as an alternative, if we can merge Julian's Spark-based
> > ingestion PRs <https://github.com/apache/druid/issues/9780> in Druid, that
> > might provide an alternate way for users to get rid of the Hadoop
> > dependency.
> >
> > On Tue, Jul 26, 2022 at 3:19 AM Abhishek Agarwal <abhishek.agar...@imply.io>
> > wrote:
> >
> > > Reviving this conversation again.
> > > @Will - Do you still have concerns about HDFS stability? Hadoop 3 has been
> > > around for some time now and is very stable as far as I know.
> > >
> > > The dependencies coming from Hadoop 2 are also old enough that they cause
> > > dependency scans to fail. E.g., Log4j 1.x dependencies coming from
> > > Hadoop 2 get flagged during these scans. We have also seen issues when
> > > customers try to use Hadoop ingestion with the latest log4j2 library:
> > >
> > > Exception in thread "main" java.lang.NoSuchMethodError:
> > > org.apache.log4j.helpers.OptionConverter.convertLevel(Ljava/lang/String;Lorg/apache/logging/log4j/Level;)Lorg/apache/logging/log4j/Level;
> > >     at org.apache.log4j.config.PropertiesConfiguration.parseLogger(PropertiesConfiguration.java:393)
> > >     at org.apache.log4j.config.PropertiesConfiguration.configureRoot(PropertiesConfiguration.java:326)
> > >     at org.apache.log4j.config.PropertiesConfiguration.doConfigure(PropertiesConfiguration.java:303)
> > >
> > > Instead of fixing these point issues, we would be better served by
> > > moving to Hadoop 3 entirely. Hadoop 3 gets more frequent releases, and
> > > its dependencies are well isolated.
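A `NoSuchMethodError` like the one above most likely means two different versions of the same class are on the classpath: `org.apache.log4j.helpers.OptionConverter` is shipped both by the Log4j 1.x jars that Hadoop 2 pulls in and by the log4j2 `log4j-1.2-api` bridge, and whichever copy is loaded first wins. A quick way to confirm this kind of conflict is to ask the JVM which jar a class actually came from. The helper below is a minimal sketch using only standard JDK APIs (`Class.forName`, `ProtectionDomain.getCodeSource`); `ClassOrigin` and `locate` are hypothetical names, not part of Druid or Hadoop.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

/** Hypothetical diagnostic helper: reports where a class was loaded from. */
final class ClassOrigin {
  private ClassOrigin() {}

  /**
   * Returns the jar or directory the named class was loaded from via the given
   * loader, "bootstrap" if the class has no code source (e.g. core JDK classes),
   * or "not found" if the loader cannot see the class at all.
   */
  static String locate(String className, ClassLoader loader) {
    try {
      // initialize = false: look the class up without running its static initializers.
      Class<?> c = Class.forName(className, false, loader);
      ProtectionDomain domain = c.getProtectionDomain();
      CodeSource source = domain == null ? null : domain.getCodeSource();
      if (source == null || source.getLocation() == null) {
        return "bootstrap";
      }
      return source.getLocation().toString();
    } catch (ClassNotFoundException e) {
      return "not found";
    }
  }
}
```

Called with the failing task's class loader, for example `ClassOrigin.locate("org.apache.log4j.helpers.OptionConverter", loader)`, it should return the path of the jar whose copy won, which points at the dependency to exclude or relocate.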
> > >
> > > On Tue, Oct 12, 2021 at 12:05 PM Karan Kumar <karankumar1...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > > We can also use Maven profiles: keep Hadoop 2 support as the default and
> > > > add a new Maven profile for Hadoop 3. This will allow users to choose the
> > > > profile best suited to their use case.
> > > > Agreed, it will not help with the Hadoop dependency problems, but it does
> > > > enable our users to use Druid with multiple flavors.
> > > > Also, with Hadoop 3, as Clint mentioned, the dependencies come
> > > > pre-shaded, so we significantly reduce our effort in solving the
> > > > dependency problems.
> > > > My PR is in its final stages: I am able to run the entire test suite
> > > > (unit + integration tests) on both the default (i.e., Hadoop 2) and the
> > > > new Hadoop 3 profile.
> > > >
> > > > On 2021/06/09 11:55:31, Will Lauer <wla...@verizonmedia.com.INVALID>
> > > > wrote:
> > > > > Clint,
> > > > >
> > > > > I fully understand what kind of headache dealing with these dependency
> > > > > issues is. We deal with this all the time, and based on conversations
> > > > > I've had with our internal Hadoop development team, they are quite
> > > > > aware of them and just as frustrated by them as you are. I'm certainly
> > > > > in favor of doing something to improve this situation, as long as it
> > > > > doesn't abandon a large section of the user base, which I think
> > > > > DROPPING Hadoop 2 would do.
> > > > >
> > > > > I think there are solutions out there that can help solve the
> > > > > conflicting dependency problem. Refactoring Hadoop support into an
> > > > > independent extension is certainly a start. But I think the dependency
> > > > > problem is bigger than that. There are always going to be conflicts
> > > > > between dependencies in the core system and in extensions as the
> > > > > system gets bigger. We have one right now internally that prevents us
> > > > > from enabling SQL in our instance of Druid due to conflicts between the
> > > > > versions of protobuf used by Calcite and by one of our critical
> > > > > extensions. Long term, I think you are going to need to carefully think
> > > > > through a ClassLoader-based strategy to truly separate the impact of
> > > > > various dependencies.
> > > > >
> > > > > While I'm not seriously suggesting it for Druid, OSGi WOULD solve this
> > > > > problem. It's a system that allows you to explicitly declare what each
> > > > > bundle exposes to the system and what each bundle consumes from the
> > > > > system, allowing multiple conflicting dependencies to coexist without
> > > > > impacting each other. OSGi is the big-hammer approach, but I bet a more
> > > > > appropriate solution would be a simpler custom-ClassLoader-based
> > > > > solution that hid all dependencies in extensions, keeping them from
> > > > > impacting the core, and that only exposed "public" pieces of the core
> > > > > to extensions. If Druid's core could be extended without impacting the
> > > > > various extensions, and the extensions' dependencies could be modified
> > > > > without impacting the core, this would go a long way towards solving
> > > > > the problem that you have described.
> > > > >
> > > > > Will
> > > > >
> > > > > <http://www.verizonmedia.com>
> > > > >
> > > > > Will Lauer
> > > > >
> > > > > Senior Principal Architect, Audience & Advertising Reporting
> > > > > Data Platforms & Systems Engineering
> > > > >
> > > > > M 508 561 6427
> > > > > 1908 S. First St
> > > > > Champaign, IL 61822
> > > > >
> > > > > <http://www.facebook.com/verizonmedia>
> > > > > <http://twitter.com/verizonmedia>
> > > > > <https://www.linkedin.com/company/verizon-media/>
> > > > > <http://www.instagram.com/verizonmedia>
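The simpler custom-ClassLoader approach Will describes (hide each extension's dependencies from the core, and expose only the "public" pieces of the core to extensions) can be sketched roughly as follows. This is a hypothetical `ExtensionClassLoader`, not Druid's actual extension loader; it assumes Java 9+ (for `ClassLoader.getPlatformClassLoader()`) and assumes the core's public API can be identified by package prefixes. JDK classes always come from the platform loader, exported core packages are shared with the core so types match across the boundary, and everything else is loaded only from the extension's own jars.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Set;

/**
 * Hypothetical extension class loader (not Druid's real implementation).
 * Lookup order:
 *   1. JDK classes come from the platform class loader.
 *   2. Classes in explicitly exported core packages come from the core loader,
 *      so the core and the extension share one copy of the public API.
 *   3. Everything else comes only from the extension's own jars, so core
 *      internals and core dependencies are invisible to the extension, and
 *      the extension's dependencies never leak into the core.
 */
class ExtensionClassLoader extends URLClassLoader {
  private final ClassLoader coreLoader;
  private final Set<String> exportedPrefixes;

  ExtensionClassLoader(URL[] extensionJars, ClassLoader coreLoader, Set<String> exportedPrefixes) {
    // The parent is the platform loader (Java 9+), not the core/application loader.
    super(extensionJars, ClassLoader.getPlatformClassLoader());
    this.coreLoader = coreLoader;
    this.exportedPrefixes = exportedPrefixes;
  }

  private boolean isExported(String className) {
    for (String prefix : exportedPrefixes) {
      if (className.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          // 1. JDK classes always win.
          c = getParent().loadClass(name);
        } catch (ClassNotFoundException notAJdkClass) {
          if (isExported(name)) {
            // 2. Shared "public" core API.
            c = coreLoader.loadClass(name);
          } else {
            // 3. Extension-private classes; throws ClassNotFoundException if absent.
            c = findClass(name);
          }
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
```

One design caveat: any type that crosses the boundary (for example an interface the core calls into) has to live in an exported package. Otherwise the core and the extension each end up with their own `Class` object for the same name, and casts between them fail.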
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jun 9, 2021 at 12:47 AM Clint Wylie <cwy...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > @itai, I think that, pending the outcome of this discussion, it makes
> > > > > > sense to have a wider community thread to announce any decisions we
> > > > > > make here. Thanks for bringing that up.
> > > > > >
> > > > > > @rajiv, Minio support seems unrelated to this discussion. It seems like
> > > > > > a reasonable request, but I recommend starting another thread to see
> > > > > > if someone is interested in taking up this effort.
> > > > > >
> > > > > > @jihoon I definitely agree that Hadoop should be refactored to be an
> > > > > > extension longer term. I don't think this upgrade would necessarily
> > > > > > make such a refactor any easier, but it wouldn't make it harder
> > > > > > either. Unfortunately, just moving Hadoop to an extension doesn't
> > > > > > really do anything to help our dependency problem, which is the thing
> > > > > > that has agitated me enough to start this thread and start looking
> > > > > > into solutions.
> > > > > >
> > > > > > @will/@frank I feel like the stranglehold Hadoop has on our
> > > > > > dependencies has become especially painful in the last couple of
> > > > > > years. Most painful to me is that we are stuck using a version of
> > > > > > Apache Calcite from 2019 (six versions behind the latest), because
> > > > > > newer versions require a newer version of Guava. This means we cannot
> > > > > > get any bug fixes and improvements in our SQL parsing layer without
> > > > > > doing something like packaging a shaded version of it ourselves or
> > > > > > solving our Hadoop dependency problem.
> > > > > >
> > > > > > Many other dependencies have also proved problematic with Hadoop in
> > > > > > the past, and since we aren't able to run the Hadoop integration tests
> > > > > > in Travis, there is always a chance that we don't catch these when
> > > > > > they go in. Now that we have turned on dependabot this week
> > > > > > (https://github.com/apache/druid/pull/11079), I imagine we are going
> > > > > > to have to proceed very carefully with it until we are able to resolve
> > > > > > this dependency issue.
> > > > > >
> > > > > > Hadoop 3.3.0 is also the first version to support running on a Java
> > > > > > version newer than Java 8, per
> > > > > > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions,
> > > > > > which is another area we have been working towards: Druid officially
> > > > > > supporting Java 11+ environments.
> > > > > >
> > > > > > I'm sort of at a loss for what else to do besides one of the following:
> > > > > > - switching to these Hadoop 3 shaded jars and dropping 2.x support
> > > > > > - figuring out how to custom-package our own Hadoop 2.x dependencies,
> > > > > >   shaded similarly to the Hadoop 3 client jars, and only supporting
> > > > > >   Hadoop with application classpath isolation
> > > > > >   (mapreduce.job.classloader = true)
> > > > > > - just dropping support for Hadoop completely
> > > > > >
> > > > > > I would much rather devote all effort to making Druid's native batch
> > > > > > ingestion better, to encourage people to migrate to it, than continue
> > > > > > fighting to keep supporting Hadoop, so upgrading and switching to the
> > > > > > shaded client jars at least seemed like a reasonable compromise
> > > > > > compared to dropping it completely. Maybe making custom shaded Hadoop
> > > > > > dependencies in the spirit of the Hadoop 3 shaded jars isn't as hard
> > > > > > as I am imagining, but it does seem like the most work of the
> > > > > > solutions I could think of to potentially resolve this problem.
> > > > > >
> > > > > > Does anyone have any other ideas of how we can isolate our
> > > > > > dependencies from Hadoop? Solutions like shading Guava
> > > > > > (https://github.com/apache/druid/pull/10964) would let Druid itself
> > > > > > use newer Guava, but that doesn't help with conflicts within our
> > > > > > dependencies, which have always seemed to be the larger problem to me.
> > > > > > Moving Hadoop support to an extension doesn't help anything unless we
> > > > > > can ensure that we can run Druid ingestion tasks on Hadoop without
> > > > > > having to match all of the Hadoop cluster's dependencies with some
> > > > > > sort of classloader wizardry.
> > > > > >
> > > > > > Maybe we could consider keeping a 0.22.x release line in Druid that
> > > > > > gets security and minor bug fixes for some period of time, to give
> > > > > > people a longer period to migrate off of Hadoop 2.x? I can't speak for
> > > > > > the rest of the committers, but I would personally be more open to
> > > > > > maintaining such a branch if it meant that, moving forward, we could
> > > > > > at least update all of our dependencies to newer versions, while
> > > > > > providing a transition path that still has some support until users
> > > > > > migrate to Hadoop 3 or native Druid batch ingestion.
> > > > > >
> > > > > > Any other ideas?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 8, 2021 at 7:44 PM frank chen <frankc...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Considering that Druid relies on lots of external components to
> > > > > > > work, I think we should upgrade Druid in a somewhat conservative way.
> > > > > > > Dropping support for Hadoop 2 is not a good idea.
> > > > > > > The ZooKeeper client upgrade in Druid also keeps me from adopting
> > > > > > > 0.22 for a while longer.
> > > > > > >
> > > > > > > Although users could upgrade these dependencies first in order to
> > > > > > > use the latest Druid releases, frankly speaking, these upgrades are
> > > > > > > not so easy in production and usually take a long time, which would
> > > > > > > prevent users from experiencing new features of Druid.
> > > > > > > For Hadoop 3, I have heard of some performance issues, which also
> > > > > > > leave me without the confidence to upgrade.
> > > > > > >
> > > > > > > I think what Jihoon proposes, separating Hadoop 2 from the Druid
> > > > > > > core as an extension, is a good idea.
> > > > > > > Since Hadoop 2 has not reached end of life (EOL), to balance
> > > > > > > compatibility and long-term evolution, maybe we could provide two
> > > > > > > extensions: one for Hadoop 2 and one for Hadoop 3.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 9, 2021 at 4:13 AM Will Lauer <wla...@verizonmedia.com.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Just to follow up on this, our main problem with Hadoop 3 right now
> > > > > > > > has been instability in HDFS, to the extent that we have put on hold
> > > > > > > > any plans to deploy it to our production systems. I would claim that
> > > > > > > > Hadoop 3 isn't mature enough yet to consider migrating Druid to it.
> > > > > > > >
> > > > > > > > Will
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jun 8, 2021 at 2:59 PM Will Lauer <wla...@verizonmedia.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Unfortunately, the migration off of Hadoop 2 is a hard one (maybe
> > > > > > > > > not for Druid, but certainly for big organizations running large
> > > > > > > > > Hadoop 2 workloads). If Druid migrated to Hadoop 3 after 0.22, that
> > > > > > > > > would probably prevent me from taking any new versions of Druid for
> > > > > > > > > at least the remainder of the year and possibly longer.
> > > > > > > > >
> > > > > > > > > Will
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Jun 8, 2021 at 3:08 AM Clint Wylie <cwy...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I've been assisting with some experiments to see how we might want
> > > > > > > > > > to migrate Druid to support Hadoop 3.x, and, more importantly, to
> > > > > > > > > > see if maybe we can finally be free of some of the dependency issues
> > > > > > > > > > it has been causing for as long as I can remember working with
> > > > > > > > > > Druid.
> > > > > > > > > >
> > > > > > > > > > Hadoop 3 introduced shaded client jars
> > > > > > > > > > (https://issues.apache.org/jira/browse/HADOOP-11804), with the
> > > > > > > > > > purpose of allowing applications to talk to the Hadoop cluster
> > > > > > > > > > without drowning in its transitive dependencies. The experimental
> > > > > > > > > > branch that I have been helping with, which uses these new shaded
> > > > > > > > > > client jars, can be seen in this PR
> > > > > > > > > > (https://github.com/apache/druid/pull/11314). It currently works
> > > > > > > > > > with the HDFS integration tests as well as the Hadoop tutorial flow
> > > > > > > > > > in the Druid docs (which is pretty much equivalent to the HDFS
> > > > > > > > > > integration test).
> > > > > > > > > >
> > > > > > > > > > The cloud deep storages still need some further testing, and some
> > > > > > > > > > minor cleanup still needs to be done for the docs and such.
> > > > > > > > > > Additionally, we still need to figure out how to handle the
> > > > > > > > > > Kerberos extension: because it extends some Hadoop classes, it
> > > > > > > > > > isn't able to use the shaded client jars in a straightforward
> > > > > > > > > > manner, so it still has heavy dependencies and hasn't been tested.
> > > > > > > > > > However, the experiment has started to pan out enough that I think
> > > > > > > > > > it is worth starting this discussion, because it does have some
> > > > > > > > > > implications.
> > > > > > > > > >
> > > > > > > > > > Making this change, I think, will allow us to update our
> > > > > > > > > > dependencies with a lot more freedom (I'm looking at you, Guava),
> > > > > > > > > > but the catch is that once we make this change and start updating
> > > > > > > > > > these dependencies, it will become hard, if not impossible, to
> > > > > > > > > > support Hadoop 2.x, since as far as I know there isn't an
> > > > > > > > > > equivalent set of shaded client jars. I am also not certain how far
> > > > > > > > > > back the Hadoop job classpath isolation feature goes
> > > > > > > > > > (mapreduce.job.classloader = true), which I think is required to be
> > > > > > > > > > set on Druid tasks for this shaded setup to work alongside updated
> > > > > > > > > > Druid dependencies.
> > > > > > > > > >
> > > > > > > > > > Is anyone opposed to or worried about dropping Hadoop 2.x support
> > > > > > > > > > after the Druid 0.22 release?
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>
