Hello, I am working on Flink and Spark while majoring in Computer Science
in Berlin.
I have an important question. It comes from what I am doing these days,
which is translating Hive queries to Flink.
> When applying [Distribute By] on Hive to the framework, the function
> should be partitionByHash on Flink. This spreads out all the rows,
> distributed by a hash key derived from the Object class in Java.
Hive does not use the Object hashCode for this - the identityHashCode is
inconsistent across JVM instances, so a distribution based on
Object.hashCode() would not be stable or reproducible.
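To illustrate the point, here is a small self-contained Java sketch (the class and method names are mine, not Hive's): System.identityHashCode depends on the particular object instance and JVM run, while a content-based hash, like the simple 31-based byte hash below, depends only on the value and is stable across objects and runs. This is not Hive's exact hashing code, just an illustration of why value-based hashing is needed.

```java
public class HashDemo {
    // Simplified content-based hash over bytes (31-based polynomial).
    // Illustrative of value hashing; not Hive's exact implementation.
    static int contentHash(byte[] bytes) {
        int h = 0;
        for (byte b : bytes) {
            h = 31 * h + b;
        }
        return h;
    }

    public static void main(String[] args) {
        String a = new String("hive");
        String b = new String("hive");
        // Identity hash is tied to the object instance, not its value,
        // so two equal strings usually get different identity hashes:
        System.out.println(System.identityHashCode(a) == System.identityHashCode(b));
        // A content hash depends only on the bytes, so it is stable:
        System.out.println(contentHash(a.getBytes()) == contentHash(b.getBytes())); // true
    }
}
```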
Thanks for your help.
So, if we want the same result from Hive and Spark, or from another
framework, how could we achieve this?
Could you explain in detail?
Regards,
Philip
On Thu, Oct 22, 2015 at 6:25 PM, Gopal Vijayaraghavan wrote:
>
> > When applying [Distribute By] on Hive to the framework, the function
> > should be partitionByHash
> So, if we want the same result from Hive and Spark, or from another
> framework, how could we achieve this?
There's a special backwards-compat slow codepath that gets triggered if
you do
set mapred.reduce.tasks=199; (or any number)
This will produce the exact same hash-code as the Java hashCode().
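If I understand the compat path correctly, the bucket is then chosen the way Hadoop's default HashPartitioner does it: mask the sign bit off the Java hashCode and take it modulo the reducer count. A minimal sketch of that scheme (the class and method names are mine):

```java
public class CompatPartition {
    // Hadoop-style bucket selection: mask the sign bit so negative
    // hashCodes still map to a valid bucket, then modulo the task count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // "hive".hashCode() is 3202928, and 3202928 % 199 == 23
        System.out.println(partitionFor("hive", 199)); // → 23
    }
}
```

Any framework that reproduces both the Java hashCode and this masked-modulo step for the same key should route rows to the same bucket numbers.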
Hello, the same question about DISTRIBUTE BY on Hive.
According to you, Hive does not use the hashCode of the Object class for
DBY (Distribute By).
I tried to understand how ObjectInspectorUtils handles the distribution,
but it involves a lot of the Hive API and I could not follow much of it.
I want to override the partitionByHash function on Flink to work the same
way as DBY on Hive.
> I want to override the partitionByHash function on Flink to work the
> same way as DBY on Hive.
> I am working on implementing a benchmark system for these two systems,
> which could be a contribution to Hive as well.
I would be very disappointed if Flink fails to outperform Hive with a
Distribute By.
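For what it's worth, on the Flink side a custom distribution can be plugged in via DataSet.partitionCustom rather than overriding partitionByHash itself. The sketch below inlines a stand-in for org.apache.flink.api.common.functions.Partitioner so it compiles without a Flink dependency, and the masked-modulo scheme is my assumption about what would match Hive's compat hashing:

```java
// Stand-in for org.apache.flink.api.common.functions.Partitioner, inlined
// here so this sketch compiles on its own without a Flink dependency.
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical partitioner mirroring the masked-modulo bucketing; in a
// real job you would implement Flink's own interface and call
// dataSet.partitionCustom(new HiveStylePartitioner(), keySelector).
public class HiveStylePartitioner implements Partitioner<String> {
    @Override
    public int partition(String key, int numPartitions) {
        // Mask the sign bit so negative hashCodes map to a valid bucket.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // With 199 partitions, matching set mapred.reduce.tasks=199 on Hive:
        System.out.println(new HiveStylePartitioner().partition("flink", 199));
    }
}
```

Because it reuses the Java String hashCode, equal keys land in the same bucket on both sides, which is the property a cross-framework benchmark needs.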