Re: [SPARK ML] Minhash integer overflow

2018-07-06 Thread jiayuanm
Sure. JIRA ticket is here: https://issues.apache.org/jira/browse/SPARK-24754. I'll create the PR. -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail:

[SPARK ML] Minhash integer overflow

2018-07-06 Thread jiayuanm
Hi everyone, I was playing around with LSH/Minhash module from spark ml module. I noticed that hash computation is done with Int (see https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala#L69). Since "a" and "b" are from a uniform