Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-95910746
cool, will make the changes along with SPARK-7045
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If you
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-95852537
@MechCoder Sorry for my late comment! I made some minor comments. It would
be good if you could submit a follow-up PR to address those issues. Thanks!
---
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r29032437
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,36 @@ class Word2Vec extends Serializable with Logging {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r29032368
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +508,23 @@ class Word2VecModel private[mllib] (
*/
de
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r29032366
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +508,23 @@ class Word2VecModel private[mllib] (
*/
de
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-95020437
@jkbradley Could you open a JIRA for the TODO?
---
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/5467
---
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94975553
LGTM, merging into master. Thanks very much!
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94973251
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94973222
[Test build #30700 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30700/consoleFull)
for PR 5467 at commit
[`dd0b0b2`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94951310
[Test build #30700 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30700/consoleFull)
for PR 5467 at commit
[`dd0b0b2`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94949893
@jkbradley I've fixed up your comment. It makes sense anyway, since now
the entire model is iterated over only once.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94948480
[Test build #30689 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30689/consoleFull)
for PR 5467 at commit
[`ffc9240`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94948494
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94917585
[Test build #30689 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30689/consoleFull)
for PR 5467 at commit
[`ffc9240`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94917250
@jkbradley fixed, hopefully that should be it
---
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28813287
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28811099
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28805372
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +497,23 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28805366
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94891481
@MechCoder I think that's it. Thanks very much for updating & putting up
with the re-do.
---
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28805370
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,25 @@ class Word2Vec extends Serializable with Logging {
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94759921
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94759896
[Test build #30663 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30663/consoleFull)
for PR 5467 at commit
[`6b74c81`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94729223
[Test build #30663 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30663/consoleFull)
for PR 5467 at commit
[`6b74c81`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94727805
I've pushed some updates.
---
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94642777
Yes, on it. By the way, it would be great if you could give me some advice
on this PR, https://github.com/apache/spark/pull/5455; I'm not sure how to
proceed.
---
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94642552
Yes, I'm sorry about that. Please do push back if you think my advice is
incorrect. How difficult would it be to check out an earlier version from that
point, and the
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94630413
Alright, we can move those to another PR.
> This PR should still be doable, but you would need to store an
Array[Float] instead of the Matrix type. You would al
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94582162
It just occurred to me that we're converting from Float to Double. I'm not
sure historically why Word2Vec used Float, but I'm worrying now about switching
since it wil
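As a point of reference only (the comment above is truncated, so this does not say what its concern was): a JVM `Float` occupies 4 bytes and a `Double` 8, and MLlib's local `DenseMatrix` stores its values as `Array[Double]`, so switching the word vectors from `Float` to `Double` doubles the raw storage. A minimal sketch of that arithmetic, where `VectorStorage` and `rawBytes` are hypothetical names and JVM array/object headers are ignored:

```scala
// Hypothetical helper: raw element storage for a vocabulary of word vectors.
object VectorStorage {
  // 4 bytes per Float and 8 per Double, taken from the Java boxed-type constants.
  val FloatBytes: Long = java.lang.Float.BYTES.toLong
  val DoubleBytes: Long = java.lang.Double.BYTES.toLong

  /** Bytes for vocabSize * vectorSize elements, ignoring JVM headers and padding. */
  def rawBytes(vocabSize: Long, vectorSize: Long, bytesPerElement: Long): Long =
    vocabSize * vectorSize * bytesPerElement
}
```

For an illustrative 1,000,000-word vocabulary with 100-dimensional vectors (not figures from this thread), that is 400,000,000 bytes as `Float` versus 800,000,000 bytes as `Double`.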
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94581278
@MechCoder That looks correct, except you'll need to call this()
immediately. I'd write helper methods for constructing wordIndex and
wordVectors:
```
priva
```
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94534838
@jkbradley I'm having trouble understanding how to write the code for this.
I had this design in mind.
class Word2VecModel private[mllib] (
word
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94348677
@MechCoder There are 2 pieces of code which we're talking about: (1)
converting the Map to a Matrix and (2) converting the Matrix to a Map. I
suppose that either appr
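A minimal sketch of both conversions, assuming the model is a `Map[String, Array[Float]]` and the "matrix" is one flat, row-major `Array[Float]` (row `i` holds the vector of the word whose index is `i`) plus a `wordIndex` map. `Word2VecLayout`, `toFlat` and `toMap` are hypothetical names for illustration, not Spark API, and a flat `Float` array stands in for MLlib's `Double`-based `Matrix`:

```scala
// Hypothetical helpers: Map[String, Array[Float]] <-> (wordIndex, flat row-major array).
object Word2VecLayout {
  /** (1) Map -> matrix: packs all vectors into one array; word i occupies [i * dim, (i + 1) * dim). */
  def toFlat(model: Map[String, Array[Float]]): (Map[String, Int], Array[Float]) = {
    require(model.nonEmpty, "model must contain at least one word")
    val dim = model.head._2.length
    val words = model.keys.toArray
    val wordIndex = words.zipWithIndex.toMap
    val flat = new Array[Float](words.length * dim)
    words.zipWithIndex.foreach { case (w, i) =>
      val vec = model(w)
      require(vec.length == dim, s"vector for '$w' has length ${vec.length}, expected $dim")
      System.arraycopy(vec, 0, flat, i * dim, dim)
    }
    (wordIndex, flat)
  }

  /** (2) Matrix -> Map: copies each row back out into its own array. */
  def toMap(wordIndex: Map[String, Int], flat: Array[Float]): Map[String, Array[Float]] = {
    val dim = if (wordIndex.isEmpty) 0 else flat.length / wordIndex.size
    wordIndex.map { case (w, i) => w -> flat.slice(i * dim, (i + 1) * dim) }
  }
}
```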
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94233690
On the contrary, it would add more code, since I would have to support both
cases.
---
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94225128
@MechCoder You would probably have to do the slicing (until MLlib's BLAS
provides more of that functionality). However, I think you could do the
slicing using views so
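To illustrate slicing without copying, assuming a hypothetical flat, row-major `Array[Float]` of word vectors: `dotRow` reads one row in place through an offset, while `dotRowView` does the same with a lazy Scala view instead of a copying `slice`. Both names are illustrative; neither is an MLlib BLAS routine.

```scala
// Hypothetical helpers: dot product of a query with one row of a flat row-major matrix.
object RowDot {
  /** Reads row `row` in place via an offset; no intermediate array is allocated. */
  def dotRow(flat: Array[Float], dim: Int, row: Int, query: Array[Float]): Float = {
    require(query.length == dim, "query length must equal the vector dimension")
    val offset = row * dim
    var sum = 0.0f
    var j = 0
    while (j < dim) {
      sum += flat(offset + j) * query(j)
      j += 1
    }
    sum
  }

  /** Same product through a lazy view: avoids copying the row, but iterates generically. */
  def dotRowView(flat: Array[Float], dim: Int, row: Int, query: Array[Float]): Float = {
    require(query.length == dim, "query length must equal the vector dimension")
    flat.view.slice(row * dim, (row + 1) * dim).zip(query).map { case (a, b) => a * b }.sum
  }
}
```

The view avoids the array copy that `flat.slice` would make, but its elements go through generic (boxed) iteration, so the plain offset loop is likely the cheaper of the two in a hot path.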
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94197937
@jkbradley Thinking it over again, I'm not sure it would offer a great
advantage to do so. If you are talking about preventing this slicing
(https://github.com/apac
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28632339
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +492,16 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-94078344
@MechCoder I agree that supplying a Map is more intuitive. How about we
support:
* Private constructor: Take Matrix
* Public constructor: Take Map
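One way this split could look in Scala, sketched with hypothetical names (`SketchWord2VecModel`, `buildWordIndex`, `buildWordVectors`) and a flat `Array[Float]` standing in for the matrix: the primary constructor is private and takes the packed form, and a public auxiliary constructor takes the `Map`. Scala requires an auxiliary constructor to call `this(...)` as its first statement, which is why the packing lives in helpers on the companion object.

```scala
// Hypothetical sketch, not Spark's Word2VecModel.
class SketchWord2VecModel private (
    wordIndex: Map[String, Int],
    wordVectors: Array[Float]) {

  // Public auxiliary constructor: its first statement must be this(...),
  // so the Map is packed by companion-object helpers.
  def this(model: Map[String, Array[Float]]) =
    this(SketchWord2VecModel.buildWordIndex(model),
      SketchWord2VecModel.buildWordVectors(model))

  private val dim: Int = if (wordIndex.isEmpty) 0 else wordVectors.length / wordIndex.size

  /** Copies one word's vector out of the packed array. */
  def vectorOf(word: String): Array[Float] = {
    val i = wordIndex.getOrElse(word,
      throw new IllegalStateException(s"$word not in vocabulary"))
    wordVectors.slice(i * dim, (i + 1) * dim)
  }
}

object SketchWord2VecModel {
  // Assigns each word a row number; the same Map instance always iterates
  // in the same order, so both helpers agree on the assignment.
  private def buildWordIndex(model: Map[String, Array[Float]]): Map[String, Int] =
    model.keys.zipWithIndex.toMap

  // Copies each vector into its row of one flat, row-major array.
  private def buildWordVectors(model: Map[String, Array[Float]]): Array[Float] = {
    val wordIndex = buildWordIndex(model)
    val dim = if (model.isEmpty) 0 else model.head._2.length
    val flat = new Array[Float](model.size * dim)
    wordIndex.foreach { case (word, i) =>
      System.arraycopy(model(word), 0, flat, i * dim, dim)
    }
    flat
  }
}
```

Callers outside the class can then only write `new SketchWord2VecModel(Map(...))`, while code inside the class and its companion can pass the packed form directly.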
---
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93969770
@jkbradley I think I have addressed all your comments except the
constructor.
How about retaining the present Word2VecModel(Map[String, Array[Float]])
and co
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93966926
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93966910
[Test build #30477 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30477/consoleFull)
for PR 5467 at commit
[`da1642d`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93951771
[Test build #30477 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30477/consoleFull)
for PR 5467 at commit
[`da1642d`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93942492
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93942477
[Test build #30464 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30464/consoleFull)
for PR 5467 at commit
[`64575b0`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93914210
[Test build #30464 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30464/consoleFull)
for PR 5467 at commit
[`64575b0`](https://githu
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28573221
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +492,16 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28550207
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +492,16 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28550201
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,20 @@ class Word2Vec extends Serializable with Logging {
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93840169
@MechCoder
I just realized that Word2Vec already has a Matrix which could just be
passed to Word2VecModel's constructor. That might be easier and let you remove
t
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28550199
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,20 @@ class Word2Vec extends Serializable with Logging {
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28550205
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +492,16 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28550188
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93570757
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93570742
[Test build #30369 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30369/consoleFull)
for PR 5467 at commit
[`3b0d075`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93545778
[Test build #30369 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30369/consoleFull)
for PR 5467 at commit
[`3b0d075`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93545241
@jkbradley I've pushed some updates.
---
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28450065
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-93025072
Those updates might require significant changes, so I'll make another pass
after the updates. Thanks!
---
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28360427
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +487,17 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28360430
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +487,17 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28360424
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +487,17 @@ class Word2VecModel private[mllib] (
*/
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28360417
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/5467#discussion_r28360421
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -431,6 +431,14 @@ class Word2Vec extends Serializable with Logging {
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92712446
[Test build #30232 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30232/consoleFull)
for PR 5467 at commit
[`17210c3`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92712486
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92679663
[Test build #30232 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30232/consoleFull)
for PR 5467 at commit
[`17210c3`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92679362
I've addressed your comments. I did not use the BLAS calls from linalg.BLAS
initially since I thought there might be some overhead due to preprocessing.
This sh
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92618394
[Test build #30217 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30217/consoleFull)
for PR 5467 at commit
[`a7237aa`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92618410
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92594382
[Test build #30217 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30217/consoleFull)
for PR 5467 at commit
[`a7237aa`](https://githu
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-92519784
Yep, that's pretty much what I had in mind, except that I'd recommend:
* using MLlib's local Matrix type (and its BLAS call in mllib.linalg.BLAS)
* computing and
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-91828064
[Test build #30070 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30070/consoleFull)
for PR 5467 at commit
[`66cf62a`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-91828077
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-91810391
[Test build #30070 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30070/consoleFull)
for PR 5467 at commit
[`66cf62a`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/5467#issuecomment-91809293
@jkbradley Was this what you had in mind?
P.S.: I'd prefer we finish off the other PR before discussing this.
---
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/5467
[SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls
1. Use blas calls to find the dot product between two vectors.
2. Prevent re-computing the L2 norm of the given vector for e
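A minimal sketch of both ideas, assuming a flat, row-major `Array[Float]` of word vectors (not necessarily how the PR stores them): each word's L2 norm is computed once when the index is built, and a query then needs one dot product per word, which is the matrix-vector product a BLAS `gemv` call would compute. `SynonymIndex` is a hypothetical name, and plain loops stand in for the BLAS call.

```scala
// Hypothetical sketch: cosine-similarity lookup with cached per-word norms.
class SynonymIndex(words: Array[String], flat: Array[Float], dim: Int) {
  require(flat.length == words.length * dim, "flat must hold words.length * dim values")

  // Each word's L2 norm, computed once here rather than on every query.
  private val norms: Array[Double] = Array.tabulate(words.length) { i =>
    var sumSq = 0.0
    var j = 0
    while (j < dim) {
      val v = flat(i * dim + j).toDouble
      sumSq += v * v
      j += 1
    }
    math.sqrt(sumSq)
  }

  /** Top `num` words by cosine similarity to `query`, highest first. */
  def findSynonyms(query: Array[Float], num: Int): Array[(String, Double)] = {
    require(query.length == dim, "query length must equal dim")
    val queryNorm = math.sqrt(query.map(x => x.toDouble * x).sum)
    // One pass over the model: row i's dot product with the query
    // (what a single gemv call would produce for all rows at once).
    val sims = Array.tabulate(words.length) { i =>
      var dot = 0.0
      var j = 0
      while (j < dim) {
        dot += flat(i * dim + j) * query(j).toDouble
        j += 1
      }
      val denom = norms(i) * queryNorm
      if (denom == 0.0) 0.0 else dot / denom
    }
    words.zip(sims).sortWith(_._2 > _._2).take(num)
  }
}
```

If the query is itself a vocabulary word's vector, that word ranks first with similarity at or near 1.0, so a caller looking for synonyms would normally drop it from the result.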