Hello, I am a Computer Science student in Berlin working with Flink and Spark.
I have an important question. It comes from what I am doing these days, which is translating Hive queries to Flink. As far as I understand, [Distribute By] on Hive should become partitionByHash on Flink. partitionByHash spreads the rows across partitions according to a hash of the key, and that hash comes from hashCode() of the Object class in Java. I think that is why the results can differ after running the query many times. Still, a hash function is meant to spread the data fairly and evenly.

Here is my question: what about [Distribute By] on Hive? How does it spread the data fairly? Does it behave the same way as the Flink case, so that the results can differ after running the query many times?

Thanks,
Philip

--
==========================================================
*Hae Joon Lee*

Now, in Germany,
M.S. Candidate, Interested in Distributed System, Iterative Processing
Dept. of Computer Science, Informatik in German, TUB
Technical University of Berlin

In Korea,
M.S. Candidate, Computer Architecture Laboratory
Dept. of Computer Science, KAIST
Rm# 4414 CS Dept. KAIST
373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)

Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea
==========================================================
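
P.S. To make the question concrete, here is a minimal sketch in plain Java (not Flink or Hive code) of what I mean by hash partitioning. The class name HashPartitionSketch and the helpers partitionFor and distribute are hypothetical names for illustration only. I am assuming the partition index is the key's hashCode() modulo the number of partitions, which may not be exactly what either framework does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HashPartitionSketch {

    // Hypothetical helper: maps a key to a partition index in [0, numPartitions).
    // floorMod keeps the index non-negative even when hashCode() is negative.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Hypothetical helper: groups the keys into numPartitions buckets.
    static List<List<String>> distribute(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "a", "b");
        // Equal keys always land in the same partition within one run.
        System.out.println(distribute(keys, 2));
    }
}
```

With String keys the assignment is the same on every run, because String.hashCode() is defined by the characters of the string. A key class that does not override hashCode() falls back to Object.hashCode(), which is not guaranteed to stay the same from one JVM run to the next, so in that case the assignment of rows to partitions could change between runs.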