Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-29 Thread Will Xu
If there were a Spark ingestion option, would you be open to moving away from Hadoop, or are there other factors that might prevent a move? Regards, Will Product@Imply On Mon, Aug 29, 2022 at 8:15 AM Will Lauer wrote: > @Abhishek, I haven't spoken with our Hadoop team recently about Hadoop3 >

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-29 Thread Will Lauer
@Abhishek, I haven't spoken with our Hadoop team recently about Hadoop 3 stability, so I can't say for sure, but I understand the need to migrate and all the dependency headaches involved in NOT migrating. At this point, I expect Druid moving to Hadoop 3 makes sense. I suspect that _we_ won't be

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Paul Rogers
Gian mentioned MSQ. The new MSQ work is exciting and powerful for Druid ingestion. If the data needs cleaning, we would expect users to employ something like Spark to do that task, then emit clean data to Kafka or files, which Druid MSQ can ingest. That is: Dirty data -> Spark -> Kafka/Files

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Maytas Monsereenusorn
Hi Julian, Thank you so much for your contribution on Spark support. As an existing committer, I would like to help get the Spark connector merged into OSS (including PR reviews and any other development work that may be needed). We can move the conversation regarding Spark support into a new

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Julian Jaffe
For Spark support, the connector I wrote remains functional but I haven’t updated the PR for six months or so since it didn’t seem like there was an appetite for review. If that’s changing I could migrate back some more recent changes to the OSS PR. Even with an up-to-date patch though I see

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-09 Thread Abhishek Agarwal
Yes. We should deprecate it first, which is similar to dropping support (no more active development), but we will still ship it for a release or two. In a way, we are already in that mode to a certain extent. Many features are being built with native ingestion as a first-class citizen. E.g.

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-08 Thread Gian Merlino
It's always good to deprecate things for some time prior to removing them, so we don't need to (nor should we) remove Hadoop 2 support right now. My vote is that in this upcoming release, we should deprecate it. The main problem in my eyes is the one Abhishek brought up: the dependency management

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-03 Thread Abhishek Agarwal
I was thinking that moving from Hadoop 2 to Hadoop 3 would be a lower-resistance path than moving from Hadoop to Spark. Even if we get that PR merged, it will take a good amount of time for the Spark integration to reach the same level of maturity as Hadoop or native ingestion. BTW I am not making an argument

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-07-26 Thread Samarth Jain
I am sure there are other companies out there who are still on Hadoop 2.x, with migration to Hadoop 3.x being a no-go. If Druid were to drop support for Hadoop 2.x completely, I am afraid it would prevent users from updating to newer versions of Druid, which would be a shame. FWIW, we have found in

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-07-26 Thread Abhishek Agarwal
Reviving this conversation again. @Will - Do you still have concerns about HDFS stability? Hadoop 3 has been around for some time now and is very stable as far as I know. The dependencies coming from Hadoop 2 are also old enough that they cause dependency scans to fail. E.g. Log4j 1.x

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-10-12 Thread Karan Kumar
Hello, we can also use Maven profiles. We keep Hadoop 2 support by default and add a new Maven profile for Hadoop 3. This will allow users to choose the profile best suited to their use case. Agreed, it will not help with the Hadoop dependency problems, but it does enable our users to use

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-09 Thread Will Lauer
Clint, I fully understand what type of headache dealing with these dependency issues is. We deal with this all the time, and based on conversations I've had with our internal hadoop development team, they are quite aware of them and just as frustrated by them as you are. I'm certainly in favor of

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Clint Wylie
@Itai, pending the outcome of this discussion, I think it makes sense to have a wider community thread to announce any decisions we make here; thanks for bringing that up. @Rajiv, MinIO support seems unrelated to this discussion. It seems like a reasonable request, but I recommend starting

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread frank chen
Considering that Druid relies on many external components, I think we should upgrade Druid in a somewhat conservative way. Dropping support for Hadoop 2 is not a good idea. The upgrade of the ZooKeeper client in Druid has also kept me from adopting 0.22 for a while.

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Will Lauer
Just to follow up on this, our main problem with Hadoop 3 right now has been instability in HDFS, to the extent that we have put on hold any plans to deploy it to our production systems. I would claim Hadoop 3 isn't mature enough yet to consider migrating Druid to it. Will

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Will Lauer
Unfortunately, the migration off of Hadoop 2 is a hard one (maybe not for Druid, but certainly for big organizations running large Hadoop 2 workloads). If Druid migrated to Hadoop 3 after 0.22, that would probably prevent me from taking any new versions of Druid for at least the remainder of the year

Re: [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Jihoon Son
Clint, thank you for starting this thread. I love the idea of dropping support for Hadoop 2.x. The shaded jars will definitely help us upgrade our rusty dependencies. Another problem with hadoop is that the hadoop ingestion lives in the Druid core today, not in a separate extension. Longer term,

Re: [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Rajiv Mordani
Also, how about officially supporting MinIO? I know that support for S3 exists, but it would be good to officially support MinIO as deep storage as well. Rajiv From: Clint Wylie Date: Tuesday, June 8, 2021 at 1:08 AM To: dev@druid.apache.org Subject: [DISCUSS] Hadoop 3, dropping

Re: [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Itai Yaffe
Hey Clint, I think it's definitely a step in the right direction. One thing I would suggest, since there are several deployments using Hadoop (for deep storage and/or ingestion), is to let the wider community know in advance that Hadoop 2.x support is going to be dropped in favor of 3.x

[DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

2021-06-08 Thread Clint Wylie
Hi all, I've been assisting with some experiments to see how we might want to migrate Druid to support Hadoop 3.x, and more importantly, see if maybe we can finally be free of some of the dependency issues it has been causing for as long as I can remember working with Druid. Hadoop 3 introduced