[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-05-01 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-216047890 @srowen my JIRA username is "flysjy", thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as wel

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-28 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-215632760 @srowen The PR with unit tests passed after rebasing on master.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-26 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-214970336 Yes, I am working on it. Will finish tomorrow.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-22 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-213528482 @srowen, I agree with you. It is a good idea to skip the word2vec iteration step and directly initialize the `Word2VecModel` class. Will go with this approach

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-21 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-213232923 That is a good idea about the unit test. I actually first included the unit test code from @MLnick on March 22, using the Lee corpus from Gensim, but later did not include it

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750648 Thanks. Have updated the PR.

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-30 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-203516285 Looks like some of the PySpark unit tests expect to have ++---+ |word| similarity

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-26 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-201971518 @MLnick This bug has been fixed without changing existing interfaces. I have tested it with your test script on the Lee corpus from Gensim. I am not sure whether you

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198852800 Thanks. I have checked that the problem still exists with only the adaptive learning rate change. So, I will fix this bug without changing the existing interface

[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-03-19 Thread flyjy
GitHub user flyjy opened a pull request: https://github.com/apache/spark/pull/11812 [SPARK-13289][MLLIB] Fix infinite distances between word vectors in Word2VecModel ## What changes were proposed in this pull request? This PR fixes the bug that generates infinite distances

[GitHub] spark pull request: [SPARK-9763][SQL] Minimize exposure of interna...

2016-02-13 Thread flyjy
Github user flyjy commented on a diff in the pull request: https://github.com/apache/spark/pull/8056#discussion_r52831393 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala --- @@ -0,0 +1,204 @@ +/* +* Licensed to the

[GitHub] spark pull request: [SPARK-13074][Core] Add JavaSparkContext. getP...

2016-02-08 Thread flyjy
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/10978#issuecomment-181688265 @srowen Thank you very much for your suggestions. The approach you suggested is clearer, but the previous PR is simpler. The existing code has passed the test

[GitHub] spark pull request: [SPARK-13074][Core] Add JavaSparkContext. getP...

2016-01-28 Thread flyjy
GitHub user flyjy opened a pull request: https://github.com/apache/spark/pull/10978 [SPARK-13074][Core] Add JavaSparkContext. getPersistentRDDs method "getPersistentRDDs()" is a useful SparkContext API for getting cached RDDs. However, JavaSparkContext does not hav