[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-95910746 cool, will make the changes along with SPARK-7045 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If you

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-95852537 @MechCoder Sorry for my late comment! I made some minor comments. It would be good if you can submit a follow-up PR to address those issues. Thanks!

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r29032437 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,36 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r29032368 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +508,23 @@ class Word2VecModel private[mllib] ( */ de

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r29032366 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +508,23 @@ class Word2VecModel private[mllib] ( */ de

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-95020437 @jkbradley Could you open a jira for the TODO?

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5467

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94975553 LGTM, merging into master. Thanks very much!

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94973251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94973222 [Test build #30700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30700/consoleFull) for PR 5467 at commit [`dd0b0b2`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94951310 [Test build #30700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30700/consoleFull) for PR 5467 at commit [`dd0b0b2`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94949893 @jkbradley I've fixed up your comment. It makes sense anyway, since the entire model is now iterated over only once.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94948480 [Test build #30689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30689/consoleFull) for PR 5467 at commit [`ffc9240`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94948494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94917585 [Test build #30689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30689/consoleFull) for PR 5467 at commit [`ffc9240`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94917250 @jkbradley fixed, hopefully should be it

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28813287 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28811099 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28805372 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +497,23 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28805366 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94891481 @MechCoder I think that's it. Thanks very much for updating & putting up with the re-do.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28805370 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94759921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94759896 [Test build #30663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30663/consoleFull) for PR 5467 at commit [`6b74c81`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94729223 [Test build #30663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30663/consoleFull) for PR 5467 at commit [`6b74c81`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-21 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94727805 I've pushed some updates.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94642777 Yes, on it. By the way, it would be great if you could give me some advice on this PR: https://github.com/apache/spark/pull/5455. I'm not sure how to proceed.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94642552 Yes, I'm sorry about that. Please do push back if you think my advice is incorrect. How difficult would it be to check out an earlier version from that point, and the

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94630413 Alright, we can move those to another PR. > This PR should still be doable, but you would need to store an Array[Float] instead of the Matrix type. You would al
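As a rough illustration of the "store an Array[Float] instead of the Matrix type" idea quoted above, here is a minimal sketch of flat Float storage with a word-to-row index. The names `FlatStorageSketch`, `FlatVectors`, `vectorOf`, and `toMap` are hypothetical, and the contiguous layout (word i occupies positions i * vectorSize until (i + 1) * vectorSize) is an assumption, not something confirmed by the truncated comment.

```scala
object FlatStorageSketch {
  // Hypothetical container, not Spark's Word2VecModel.
  // Assumed layout: word i's vector occupies values[i * vectorSize, (i + 1) * vectorSize).
  final class FlatVectors(
      val wordIndex: Map[String, Int],
      val values: Array[Float],
      val vectorSize: Int) {

    require(values.length == wordIndex.size * vectorSize,
      "values must hold exactly one vector per indexed word")

    // Copies out the vector for a single word, if the word is known.
    def vectorOf(word: String): Option[Array[Float]] =
      wordIndex.get(word).map { i =>
        values.slice(i * vectorSize, (i + 1) * vectorSize)
      }

    // Rebuilds the word -> vector Map view from the flat storage.
    def toMap: Map[String, Array[Float]] =
      wordIndex.map { case (word, i) =>
        word -> values.slice(i * vectorSize, (i + 1) * vectorSize)
      }
  }
}
```

Keeping Float storage avoids a Float-to-Double conversion of the stored vectors, at the cost of handling the indexing by hand.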

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94582162 It just occurred to me that we're converting from Float to Double. I'm not sure historically why Word2Vec used Float, but I'm worrying now about switching since it wil

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94581278 @MechCoder That looks correct, except you'll need to call this() immediately. I'd write helper methods for constructing wordIndex and wordVectors: ``` priva
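The advice above (call this() immediately, and build wordIndex and wordVectors in helper methods) reflects a Scala rule: an auxiliary constructor must begin with a call to another constructor, so any preprocessing has to happen inside the arguments of that call. A minimal sketch follows, assuming a private primary constructor over flat storage and a public Map-based one. `ConstructorSketch`, `VectorModel`, `buildWordIndex`, and `buildWordVectors` are hypothetical names; the helper code in the original comment is truncated, so this is not a reconstruction of it.

```scala
object ConstructorSketch {
  // Hypothetical model class; the primary constructor is private.
  class VectorModel private (
      val wordIndex: Map[String, Int],
      val wordVectors: Array[Float]) {

    // Public auxiliary constructor. Its first statement must be the this(...)
    // call, so the conversions live in companion helpers used as arguments.
    def this(model: Map[String, Array[Float]]) =
      this(VectorModel.buildWordIndex(model), VectorModel.buildWordVectors(model))
  }

  object VectorModel {
    // Words are sorted so that both helpers agree on the row order.
    private def buildWordIndex(model: Map[String, Array[Float]]): Map[String, Int] =
      model.keys.toSeq.sorted.zipWithIndex.toMap

    // Concatenates the vectors in the same sorted-word order.
    private def buildWordVectors(model: Map[String, Array[Float]]): Array[Float] =
      model.keys.toSeq.sorted.flatMap(word => model(word).toSeq).toArray
  }
}
```

The helpers sit in the companion object because constructor arguments cannot refer to members of the instance being constructed.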

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-20 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94534838 @jkbradley I'm having trouble understanding how to write the code for this. I had this design in mind. class Word2VecModel private[mllib] ( word

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94348677 @MechCoder There are 2 pieces of code which we're talking about: (1) converting the Map to a Matrix and (2) converting the Matrix to a Map. I suppose that either appr

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94233690 On the contrary, it would add more code, since I would have to support both cases.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94225128 @MechCoder You would probably have to do the slicing (until MLlib's BLAS provides more of that functionality). However, I think you could do the slicing using views so

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-18 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94197937 @jkbradley Thinking over it again, I'm not sure if it would offer a great advantage to do so. If you are talking about preventing this slicing (https://github.com/apac

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28632339 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +492,16 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94078344 @MechCoder I agree that supplying a Map is more intuitive. How about we support: * Private constructor: Take Matrix * Public constructor: Take Map

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93969770 @jkbradley I think I have addressed all your comments except the constructor. How about retaining the present Word2VecModel(Map: [String, Array(Float)]) and co

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93966926 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93966910 [Test build #30477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30477/consoleFull) for PR 5467 at commit [`da1642d`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93951771 [Test build #30477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30477/consoleFull) for PR 5467 at commit [`da1642d`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93942492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93942477 [Test build #30464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30464/consoleFull) for PR 5467 at commit [`64575b0`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93914210 [Test build #30464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30464/consoleFull) for PR 5467 at commit [`64575b0`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28573221 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +492,16 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28550207 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +492,16 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28550201 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,20 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-16 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93840169 @MechCoder I just realized that Word2Vec already has a Matrix which could just be passed to Word2VecModel's constructor. That might be easier and let you remove t

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28550199 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,20 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28550205 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +492,16 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28550188 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93570757 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93570742 [Test build #30369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30369/consoleFull) for PR 5467 at commit [`3b0d075`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93545778 [Test build #30369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30369/consoleFull) for PR 5467 at commit [`3b0d075`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-15 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93545241 @jkbradley I've pushed some updates.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-15 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28450065 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93025072 Those updates might require significant changes, so I'll make another pass after updates. Thanks!

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28360427 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +487,17 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28360430 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +487,17 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28360424 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -479,9 +487,17 @@ class Word2VecModel private[mllib] ( */

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28360417 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28360421 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92712446 [Test build #30232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30232/consoleFull) for PR 5467 at commit [`17210c3`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92712486 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92679663 [Test build #30232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30232/consoleFull) for PR 5467 at commit [`17210c3`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-14 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92679362 I've addressed your comments. I did not use the blas calls from linalg.blas initially since I thought there might be some overhead due to preprocessing. This sh

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92618394 [Test build #30217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30217/consoleFull) for PR 5467 at commit [`a7237aa`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92618410 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92594382 [Test build #30217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30217/consoleFull) for PR 5467 at commit [`a7237aa`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-13 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-92519784 Yep, that's pretty much what I had in mind, except that I'd recommend: * using MLlib's local Matrix type (and its BLAS call in mllib.linalg.BLAS) * computing and
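To make the matrix suggestion above concrete: if every word vector is one column of a column-major matrix A, then all dot products against a query x come out of a single matrix-vector product y = A^T x. The sketch below is a plain-Scala stand-in for such a BLAS-level call, not the mllib.linalg.BLAS API itself. `GemvSketch` and `gemvTransposed` are hypothetical names, and the one-vector-per-column layout is an assumption.

```scala
object GemvSketch {
  // Computes y = A^T * x for a column-major matrix A with numRows rows and
  // numCols columns, i.e. y(j) = sum over i of a(j * numRows + i) * x(i).
  // With one word vector per column, y(j) is the dot product of word j with x.
  def gemvTransposed(
      numRows: Int,
      numCols: Int,
      a: Array[Float],
      x: Array[Float]): Array[Float] = {
    require(a.length == numRows * numCols, "matrix size mismatch")
    require(x.length == numRows, "vector size mismatch")
    val y = new Array[Float](numCols)
    var j = 0
    while (j < numCols) {
      val offset = j * numRows
      var sum = 0.0f
      var i = 0
      while (i < numRows) {
        sum += a(offset + i) * x(i)
        i += 1
      }
      y(j) = sum
      j += 1
    }
    y
  }
}
```

A tuned BLAS routine would replace the hand-written loops, but the data layout and the result are the same.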

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-91828064 [Test build #30070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30070/consoleFull) for PR 5467 at commit [`66cf62a`](https://gith

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-91828077 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-91810391 [Test build #30070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30070/consoleFull) for PR 5467 at commit [`66cf62a`](https://githu

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-11 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-91809293 @jkbradley Was this what you had in mind? P.S.: I'd prefer that we finish off the other PR before discussing this one.

[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-11 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/5467 [SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls 1. Use blas calls to find the dot product between two vectors. 2. Prevent re-computing the L2 norm of the given vector for e
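The two optimizations named in the PR description (a dot-product routine, and not recomputing the L2 norm of the given vector) can be sketched for a cosine-similarity search as follows. This is a minimal illustration under assumptions, not the PR's code: `CosineSketch`, `dot`, `norm`, and `topSimilar` are hypothetical names, and the plain-Scala `dot` stands in for a BLAS call.

```scala
object CosineSketch {
  // Plain-Scala dot product, standing in for a BLAS-level dot call.
  def dot(a: Array[Float], b: Array[Float]): Float = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0f
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // L2 norm, expressed through the dot product.
  def norm(a: Array[Float]): Float = math.sqrt(dot(a, a).toDouble).toFloat

  // Returns the `num` words whose vectors have the highest cosine similarity
  // to `query`. The query norm is computed once, outside the loop over words.
  def topSimilar(
      vectors: Map[String, Array[Float]],
      query: Array[Float],
      num: Int): Seq[(String, Float)] = {
    val queryNorm = norm(query)
    vectors.toSeq
      .map { case (word, vec) =>
        val denom = queryNorm * norm(vec)
        val sim = if (denom == 0.0f) 0.0f else dot(query, vec) / denom
        (word, sim)
      }
      .sortWith(_._2 > _._2)
      .take(num)
  }
}
```

The per-word norms are still computed on every call here; caching them alongside the vectors would be a natural next step, though whether the PR does so is not visible in this listing.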