Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Paul Rogers
Gian mentioned MSQ. The new MSQ work is exciting and powerful for Druid ingestion. If the data needs cleaning, we would expect users to employ something like Spark to do that task, then emit clean data to Kafka or files, which Druid MSQ can ingest. That is: Dirty data -> Spark -> Kafka/Files ->

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Maytas Monsereenusorn
Hi Julian, thank you so much for your contribution to Spark support. As an existing committer, I would like to help get the Spark connector merged into OSS (including PR reviews and any other development work that may be needed). We can move the conversation regarding Spark support into a new thre