[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212969679 Yes, it's reproducible, as noted in the third comment at https://issues.apache.org/jira/browse/SPARK-13289. I thought this PR would solve the issue.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-212915342 My observation of the current word2vec implementation is that the distances between synonyms grow larger and larger with more iterations, and finally become infinite.
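The failure mode described above can be sketched with a toy example. This is not Spark's actual word2vec.scala code; it is a minimal, hypothetical illustration of how vectors that receive unchecked multiplicative updates drift apart with each iteration until their components overflow to infinity, which is the symptom this PR's title describes.

```python
import math

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def run(iterations, alpha=1.5):
    """Toy stand-in for training: each 'iteration' scales both word
    vectors by alpha with no decay or normalization, so the gap between
    the two 'synonym' vectors grows geometrically."""
    u, v = [1.0, 0.5], [0.9, 0.6]
    for _ in range(iterations):
        u = [alpha * x for x in u]
        v = [alpha * x for x in v]
    return u, v, euclidean(u, v)

# Distances between the two vectors grow with the iteration count, and
# with enough iterations the components themselves overflow to inf.
_, _, d_short = run(10)
_, _, d_long = run(100)
u_overflow, _, _ = run(3000)
```

The point of the sketch is only the shape of the bug: without a decaying learning rate or any bound on vector magnitude, "more iterations" monotonically inflates distances instead of converging.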

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-31 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-204120379 How about keeping the learning-rate-related code unchanged?
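For context on the learning-rate code the comment proposes leaving alone: word2vec-style trainers (following Google's original word2vec.c, which Spark's word2vec.scala adapts) decay the learning rate linearly over the training run and clamp it to a small floor so updates never vanish. The sketch below is an assumption-laden paraphrase of that scheme, not Spark's exact code; the function name and the floor ratio are illustrative.

```python
def decayed_alpha(starting_alpha, words_seen, total_words, floor_ratio=1e-4):
    """Linear learning-rate decay with a floor, in the style of word2vec.c.

    Decays linearly from starting_alpha toward zero as words_seen
    approaches total_words, but never drops below
    starting_alpha * floor_ratio.
    """
    alpha = starting_alpha * (1.0 - words_seen / (total_words + 1))
    return max(alpha, starting_alpha * floor_ratio)
```

The floor matters for the bug under discussion: the decay keeps late-training updates small, while the floor keeps them nonzero, so changing this code interacts directly with how fast vector magnitudes can grow.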

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-29 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202892564 Is this caused by changes made to word2vec.scala after this PR was opened? Maybe those changes conflict with this PR. (This is just my naive guess.)

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-27 Thread PhoenixDai
Github user PhoenixDai commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-202224483 I tested this commit on the "One Billion Words Language Modeling" dataset with 72 partitions and 15 iterations. It works well.