[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-19 Thread Ishiihara
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-52702675 @mateiz This was addressed by https://github.com/apache/spark/pull/1932, which is already merged into master and 1.1. In that PR, the model output by each partition is us

2014-08-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-52702570 We merged #1932 instead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have thi

2014-08-19 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-52701541 @Ishiihara Why did you close this? Has it been fixed elsewhere?

2014-08-18 Thread Ishiihara
Github user Ishiihara closed the pull request at: https://github.com/apache/spark/pull/1871

2014-08-11 Thread Ishiihara
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51878228 @mateiz The performance of PrimitiveKeyOpenHashMap is on par with mutable.HashMap. In the one-partition case, PrimitiveKeyOpenHashMap is slightly faster than using big

2014-08-11 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51865485 Just wondering, any noticeable perf difference with this change?

2014-08-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51851620 QA results for PR 1871: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

2014-08-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51846871 QA tests have started for PR 1871. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18335/consoleFull

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51730738 Even better might be Spark's PrimitiveKeyOpenHashMap here. Again, if there are lots of keys.

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51730712 Just FYI, mutable.HashMap can be pretty inefficient in space usage, compared e.g. to java.util.HashMap or to Spark's AppendOnlyMap. In this case it will depend on how many

2014-08-10 Thread Ishiihara
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51724432 @mengxr Some benchmark results. Environment: OS X 10.9, 8 GB memory, 2.5 GHz i5 CPU, 4 threads. startingAlpha = 0.0025, vectorSize = 100. Driver memory 2g s

2014-08-10 Thread Ishiihara
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51720995 @mengxr It is about 1-2 minutes slower with vector size = 100 for different numbers of partitions.

2014-08-10 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51720842 @Ishiihara Did you compare the speed?

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51714618 QA results for PR 1871: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51713377 QA tests have started for PR 1871. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18279/consoleFull

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51712724 QA results for PR 1871: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51712712 QA tests have started for PR 1871. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18277/consoleFull

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51708105 QA results for PR 1871: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

2014-08-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51707392 QA tests have started for PR 1871. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18272/consoleFull

2014-08-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51707054 QA results for PR 1871: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

2014-08-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1871#issuecomment-51706424 QA tests have started for PR 1871. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18267/consoleFull

2014-08-09 Thread Ishiihara
GitHub user Ishiihara opened a pull request: https://github.com/apache/spark/pull/1871 [SPARK-2907] [MLlib] Use mutable.HashMap to represent model in Word2Vec Change list: 1. Used mutable.HashMap to represent syn0Global and syn1Global to reduce shuffle size. 2. Introduced lo
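
The idea in item 1 of the change list can be illustrated with a minimal sketch. It is not the code from the pull request: the object name SparseModelMerge, the helper merge, and the choice of row index as key with an Array[Float] row as value are all assumptions made for illustration. The sketch only shows why a map keyed by row index can shrink what is shuffled: each partition ships only the rows it actually updated, and the maps are combined by summing rows that share a key.

```scala
import scala.collection.mutable

// Hypothetical illustration; names and data layout are assumptions,
// not taken from the Word2Vec implementation in the pull request.
object SparseModelMerge {

  // Merge the sparse update `b` into `a` by summing rows with the same index.
  // Rows present only in `b` are copied so that `b` is never aliased.
  def merge(
      a: mutable.HashMap[Int, Array[Float]],
      b: mutable.HashMap[Int, Array[Float]]): mutable.HashMap[Int, Array[Float]] = {
    for ((row, vec) <- b) {
      a.get(row) match {
        case Some(acc) =>
          var i = 0
          while (i < acc.length) {
            acc(i) += vec(i)
            i += 1
          }
        case None =>
          a(row) = vec.clone()
      }
    }
    a
  }

  def main(args: Array[String]): Unit = {
    // Two partitions, each of which touched only a few rows of the model.
    val p1 = mutable.HashMap(0 -> Array(1.0f, 2.0f), 3 -> Array(0.5f, 0.5f))
    val p2 = mutable.HashMap(3 -> Array(1.0f, 1.0f), 7 -> Array(4.0f, 4.0f))
    val merged = merge(p1, p2)
    println(merged.keys.toSeq.sorted.mkString(","))  // prints 0,3,7
  }
}
```

As noted in the comments above, mutable.HashMap carries a space cost of its own, so whether such a representation pays off depends on how many rows each partition touches.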