I struggled to deal with this issue multiple times over the past year, and
thankfully we finally
decided to use the official version of Hive 2.3.x as well (thank you, Yuming,
Alan, and everyone involved).
I think it is already huge progress that we have started to use the
official version of Hive.
I think we should at
Hi, All.
First of all, I want to put this as a policy issue instead of a technical
issue.
Also, this is orthogonal to the `hadoop` version discussion.
The Apache Spark community kept (but did not maintain) the forked Apache Hive
1.2.1 because there were no other options before (see SPARK-20202).
I also agree with Steve and Felix.
Let's have another thread to discuss the Hive issue,
because this thread was originally about the `hadoop` version.
And now we can have a `hive-2.3` profile for both the `hadoop-2.7` and
`hadoop-3.0` versions.
We don't need to mix the two.
Bests,
Dongjoon.
On Mon, Nov 18, 2019,
1000% with Steve: the org.spark-project hive 1.2 will need a solution. It is
old and rather buggy, and it's been *years*.
I think we should decouple the Hive change from everything else if people are
concerned?
From: Steve Loughran
Sent: Sunday, November 17, 2019 9:
Assuming everybody is happy, let me document it as below in a few days:
1. For Python and Java, write a single comment that starts with the JIRA ID and
a short description, e.g. (SPARK-X: test blah blah).
2. For R, use the JIRA ID as a prefix for the test name.
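To make rule 1 concrete, here is a small, hypothetical JUnit example; the suite name, assertion, and JIRA ID are placeholders of mine, not from an actual Spark test. Rule 2 would analogously put the JIRA ID at the front of the testthat test name in R:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical suite illustrating the proposed Java convention:
// a single comment that starts with the JIRA ID plus a short description.
public class ExampleSuite {
  @Test
  public void testTrimSurroundingWhitespace() {
    // SPARK-X: test that surrounding whitespace is trimmed
    assertEquals("abc", "  abc  ".trim());
  }
}
```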
On Mon, Nov 18, 2019 at 11:36 AM, Hyukjin Kwon wrote:
Hey Friends,
I recently created a pull request to add optional support for bucket joins
to V2 DataSources, via a concrete class representing Spark-style hash
partitioning. If anyone has some free time, I'd appreciate a code review. This
also adds a concrete implementation of V2 ClusteredDistribution.
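In case it helps reviewers orient themselves, here is a minimal sketch of what such a class can look like against the Spark 2.4-era `org.apache.spark.sql.sources.v2.reader.partitioning` interfaces. The class name, constructor, and the subset check below are my own illustration under those assumptions, not code from the PR:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.spark.sql.sources.v2.reader.partitioning.ClusteredDistribution;
import org.apache.spark.sql.sources.v2.reader.partitioning.Distribution;
import org.apache.spark.sql.sources.v2.reader.partitioning.Partitioning;

// Illustrative only: a partitioning a V2 source could report so that Spark
// can recognize co-partitioned data and skip the shuffle for a bucket join.
public class HashPartitioning implements Partitioning {
  private final Set<String> hashColumns;  // columns the data is hashed on
  private final int numPartitions;

  public HashPartitioning(String[] hashColumns, int numPartitions) {
    this.hashColumns = new HashSet<>(Arrays.asList(hashColumns));
    this.numPartitions = numPartitions;
  }

  @Override
  public int numPartitions() {
    return numPartitions;
  }

  @Override
  public boolean satisfy(Distribution distribution) {
    // Hashing on `hashColumns` co-locates rows that agree on those columns,
    // so any clustering over a superset of them is also satisfied.
    if (distribution instanceof ClusteredDistribution) {
      String[] clustered = ((ClusteredDistribution) distribution).clusteredColumns;
      return new HashSet<>(Arrays.asList(clustered)).containsAll(hashColumns);
    }
    return false;
  }
}
```

The shape is the point: the source reports a `Partitioning`, and Spark asks it whether the `Distribution` required by the join is satisfied; if it is, the shuffle can be avoided.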