Hello, I am a Computer Science student in Berlin, working with Flink and
Spark.

I have an important question.
It comes from what I am doing these days, which is translating Hive
queries to Flink.

When translating Hive's [Distribute By] to Flink, the corresponding function
should be partitionByHash. It spreads the rows across partitions according
to a hash of the key, which comes from the hashCode of the Object class in
Java. That is why, after running the query many times, the results can be
different. But as you know, a hash function is meant to spread data out
fairly and evenly.
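
To make the hash-partitioning idea above concrete, here is a minimal sketch
in plain Java. It does not use the Flink API: partitionFor, RawKey, and
ValueKey are hypothetical names invented for illustration, and the
"hash modulo partition count" scheme is an assumption about how a hash
partitioner typically assigns rows, not a description of Flink's internals.

```java
import java.util.Objects;

class HashPartitionSketch {

    // Hypothetical key that inherits Object.hashCode(): the hash is identity-based
    // and is not guaranteed to be the same from one JVM run to the next.
    static class RawKey {
        final String name;
        RawKey(String name) { this.name = name; }
    }

    // Hypothetical key with a value-based hashCode(): equal content gives an equal hash.
    static class ValueKey {
        final String name;
        ValueKey(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hash(name); }
    }

    // Assumed scheme: mask the sign bit, then take the hash modulo the partition count.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 4;
        // Two distinct ValueKey instances with equal content share a partition.
        System.out.println(partitionFor(new ValueKey("berlin"), n)
                == partitionFor(new ValueKey("berlin"), n));
        // Two RawKey instances with equal content may or may not share a partition.
        System.out.println(partitionFor(new RawKey("berlin"), n)
                + " vs " + partitionFor(new RawKey("berlin"), n));
    }
}
```

In this sketch, whether a row's partition is reproducible depends on whether
the key type defines a value-based hashCode, which may be relevant to the
behavior described above.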

Here is my question.
What about [Distribute By] on Hive? How does it spread out data fairly?
Does it behave the same way as in the Flink case, that is, can the results
be different after running the query many times?
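
One way to reason about this question is to separate which partition a row
lands in from the order in which rows arrive there. The sketch below is
plain Java, not Hive code: distribute and sortedCopy are hypothetical
helpers, and the sketch assumes a value-based hash taken modulo the
partition count. Under that assumption, shuffling the input changes the
order inside a partition but not its contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DistributeBySketch {

    // Hypothetical helper: put each row in partition (hash mod n), keeping arrival order.
    static List<List<String>> distribute(List<String> rows, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String row : rows) {
            int index = (row.hashCode() & Integer.MAX_VALUE) % numPartitions;
            partitions.get(index).add(row);
        }
        return partitions;
    }

    // Sort every partition so two runs can be compared while ignoring arrival order.
    static List<List<String>> sortedCopy(List<List<String>> partitions) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> partition : partitions) {
            List<String> copy = new ArrayList<>(partition);
            Collections.sort(copy);
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(
                Arrays.asList("berlin", "seoul", "daejeon", "berlin", "paris"));
        List<List<String>> first = distribute(rows, 3);
        Collections.shuffle(rows); // simulate rows arriving in a different order
        List<List<String>> second = distribute(rows, 3);
        // Partition contents match once arrival order is ignored.
        System.out.println(sortedCopy(first).equals(sortedCopy(second)));
    }
}
```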

Thanks,
Philip




-- 

==========================================================

*Hae Joon Lee*


Now, in Germany,

M.S. Candidate, Interested in Distributed System, Iterative Processing

Dept. of Computer Science, Informatik in German, TUB

Technical University of Berlin


In Korea,

M.S. Candidate, Computer Architecture Laboratory

Dept. of Computer Science, KAIST


Rm# 4414 CS Dept. KAIST

373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)


Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea

==========================================================
