Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/18034#discussion_r124916809
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
@@ -468,7 +469,16 @@ object LocalLDAModel extends Loader[LocalLDAModel
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/18265
Cool, LGTM @jkbradley
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/18265
Thanks @jkbradley! I'm really curious about how this came to your
attention. Did somebody actually encounter this bug? For this bug to come up,
the model being trained would have to be truly
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/18265#discussion_r121277008
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/18265#discussion_r121277000
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/18265#discussion_r121276885
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -355,9 +364,12 @@ object Word2VecModel extends MLReadable[Word2VecModel
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/17673
Thanks for the detailed response @shubhamchopra.
I'd like to clarify my point about whether this should be implemented in
Spark: Spark MLlib is first and foremost a framework for doing ML
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113605788
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113605682
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113605031
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113605000
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/17673
@shubhamchopra, have you run this code on a distributed Spark cluster yet?
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113603661
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113603493
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113603142
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113602908
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113602640
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113602559
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113602341
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113602046
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113601993
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113601799
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113601656
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113601513
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113600716
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113600548
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113600388
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113600264
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113599808
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113599456
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -194,6 +232,285 @@ final class Word2Vec @Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113599047
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -127,6 +150,11 @@ final class Word2Vec @Since("1.4.0") (
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113598930
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -102,6 +107,24 @@ private[feature] trait Word2VecBase extends Params
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r113598788
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -102,6 +107,24 @@ private[feature] trait Word2VecBase extends Params
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/17673
I'm happy to take a look! I'll have some time to dig in deeper tomorrow.
Some of my initial impressions:
* There's a lot going on here; I agree with @hhbyyh that it would be
cleaner to put
GitHub user Krimit opened a pull request:
https://github.com/apache/spark/pull/17263
[SPARK-19922][ML] small speedups to findSynonyms
Currently generating synonyms using a large model (I've tested with 3m
words) is very slow. These efficiencies have sped things up for us by ~17
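The PR description is cut off above, so the exact changes are not shown here. As a rough illustration of the kind of optimization that helps with large vocabularies, the sketch below assumes an in-memory map from words to `Float` vectors. It computes each vector's L2 norm once up front and keeps only the best `num` candidates in a bounded min-heap rather than sorting every word. `SynonymSketch`, `Model`, and `findSynonyms` are hypothetical names, not Spark's API.

```scala
import scala.collection.mutable

object SynonymSketch {
  // Hypothetical in-memory model: word -> dense float vector.
  final case class Model(vectors: Map[String, Array[Float]]) {
    // Precompute every L2 norm once instead of on each query.
    val norms: Map[String, Double] =
      vectors.map { case (w, v) => w -> math.sqrt(v.map(x => x.toDouble * x).sum) }
  }

  private def dot(a: Array[Float], b: Array[Float]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i).toDouble * b(i); i += 1 }
    s
  }

  /** Top `num` words by cosine similarity to `query`, optionally skipping one word. */
  def findSynonyms(model: Model, query: Array[Float], num: Int,
                   exclude: Option[String] = None): Array[(String, Double)] = {
    val qNorm = math.sqrt(query.map(x => x.toDouble * x).sum)
    // Reversed ordering makes this a min-heap on similarity:
    // the head is the weakest of the current top `num` candidates.
    val heap = mutable.PriorityQueue.empty[(String, Double)](
      Ordering.by[(String, Double), Double](_._2).reverse)
    for ((word, vec) <- model.vectors if !exclude.contains(word)) {
      val denom = qNorm * model.norms(word)
      val sim = if (denom == 0.0) 0.0 else dot(query, vec) / denom
      if (heap.size < num) heap.enqueue(word -> sim)
      else if (num > 0 && sim > heap.head._2) { heap.dequeue(); heap.enqueue(word -> sim) }
    }
    // Only `num` elements remain, so this final sort is cheap.
    heap.toArray.sortBy(-_._2)
  }
}
```

With a vocabulary of V words, the selection step costs roughly O(V log num) instead of O(V log V) for a full sort; the dot products themselves still touch every vector.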
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/16811
Updated. I do kind of wish we had access to ``assertJ``, which would make
unordered assertions a cakewalk
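Without assertJ, an order-insensitive check can still be written in plain Scala by comparing how many times each element occurs in each collection. A minimal sketch; `UnorderedAssert.assertSameElements` is a hypothetical helper, not part of Spark's test utilities.

```scala
object UnorderedAssert {
  /** Asserts both collections hold the same elements with the same counts, in any order. */
  def assertSameElements[A](actual: Iterable[A], expected: Iterable[A]): Unit = {
    // Element -> number of occurrences, so duplicates are still checked.
    val a = actual.groupBy(identity).map { case (k, v) => k -> v.size }
    val e = expected.groupBy(identity).map { case (k, v) => k -> v.size }
    assert(a == e, s"expected elements $expected (any order) but got $actual")
  }
}
```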
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/16811#discussion_r104402251
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/16811
Thanks for your comments @jkbradley, updated. I also took the opportunity
to replace the kinda-janky fuzzyEquals in the test with the ``TestingUtils``
implementation
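Tolerance-based comparisons in tests usually come in two flavors, absolute and relative. The sketch below shows both; it is a simplified stand-in written under stated assumptions, not Spark's `TestingUtils` code, whose exact edge-case handling (for example around zero) may differ. `Tolerance` and its methods are hypothetical names.

```scala
object Tolerance {
  /** |x - y| <= eps: suitable when values are near zero. */
  def absoluteEquals(x: Double, y: Double, eps: Double): Boolean =
    math.abs(x - y) <= eps

  /**
   * |x - y| <= eps * min(|x|, |y|): suitable when magnitudes vary widely.
   * Note that a nonzero value never matches exact zero under this rule.
   */
  def relativeEquals(x: Double, y: Double, eps: Double): Boolean =
    x == y || math.abs(x - y) <= eps * math.min(math.abs(x), math.abs(y))
}
```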
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/16811
@jkbradley - updated
Added an explicit new test as requested, although the existing test already
covers it (by virtue of the existing methods calling the new methods)
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/16811#discussion_r99612768
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -232,19 +232,40 @@ class Word2VecModel private[ml] (
@Since("
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/16811
cc @jkbradley
GitHub user Krimit opened a pull request:
https://github.com/apache/spark/pull/16811
[SPARK-17629][ML] methods to return synonyms directly
## What changes were proposed in this pull request?
Provide methods to return synonyms directly, without wrapping them in a
DataFrame
Github user Krimit commented on the issue:
https://github.com/apache/spark/pull/16607
@srowen @jkbradley Updated based on your comments. I used the Spark version to
detect the model version, as suggested by @jkbradley, although I'm happy to continue the
conversation about the best way to handle
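A minimal sketch of version-based format detection, assuming the Spark version that wrote a model is available as a string (for example from its saved metadata) and that models written before an assumed 2.2 cutoff use the old single-datum layout. The `Format` names, `ModelFormat`, and the cutoff are all hypothetical.

```scala
object ModelFormat {
  sealed trait Format
  case object SingleDatum extends Format // hypothetical: whole model in one row
  case object Partitioned extends Format // hypothetical: model spread over many rows/files

  private val VersionPattern = """^(\d+)\.(\d+).*""".r

  /** Parses "major.minor[anything]" into (major, minor). */
  def majorMinor(sparkVersion: String): (Int, Int) = sparkVersion match {
    case VersionPattern(major, minor) => (major.toInt, minor.toInt)
    case _ => throw new IllegalArgumentException(s"Unparseable Spark version: $sparkVersion")
  }

  /** Picks a loader based on the writer's version; the (2, 2) cutoff is an assumption. */
  def formatFor(sparkVersion: String, cutoff: (Int, Int) = (2, 2)): Format = {
    val (major, minor) = majorMinor(sparkVersion)
    if (major < cutoff._1 || (major == cutoff._1 && minor < cutoff._2)) SingleDatum
    else Partitioned
  }
}
```

Keying off the writer's version keeps old saved models loadable without adding a separate format flag to the metadata.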
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/16607#discussion_r96672243
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -320,14 +341,29 @@ object Word2VecModel extends
MLReadable[Word2VecModel
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/16607#discussion_r96328450
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -320,14 +341,29 @@ object Word2VecModel extends
MLReadable[Word2VecModel
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/16607#discussion_r96327575
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -302,16 +303,36 @@ class Word2VecModel private[ml] (
@Since("
Github user Krimit commented on a diff in the pull request:
https://github.com/apache/spark/pull/16607#discussion_r96327379
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -302,16 +303,36 @@ class Word2VecModel private[ml] (
@Since("
GitHub user Krimit opened a pull request:
https://github.com/apache/spark/pull/16607
[SPARK-19247][ML] Save large models
## What changes were proposed in this pull request?
* save word2vec models as distributed files rather than as one large datum.
Backwards compatibility
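One way to decide how many files a large model should be written as is to estimate its serialized size and divide by a per-partition byte budget. This sketch assumes 4 bytes per `Float` component plus a guessed average per-word overhead; both that overhead and any budget passed in are illustrative, not the values Spark uses. `PartitionSizing.numPartitions` is a hypothetical helper.

```scala
object PartitionSizing {
  /**
   * Number of partitions needed so each stays under `maxBytesPerPartition`,
   * given an approximate size of (4 bytes * vectorSize + avgWordBytes) per word.
   */
  def numPartitions(numWords: Long, vectorSize: Int, maxBytesPerPartition: Long,
                    avgWordBytes: Long = 15L): Int = {
    require(numWords >= 0 && vectorSize >= 0 && avgWordBytes >= 0 && maxBytesPerPartition > 0)
    val approxBytes = numWords * (4L * vectorSize + avgWordBytes)
    // Ceiling division, with at least one partition even for an empty model.
    math.max(1L, (approxBytes + maxBytesPerPartition - 1) / maxBytesPerPartition).toInt
  }
}
```

For example, 3 million words with 300-dimensional vectors and a 64 MiB budget comes out to 55 partitions under these assumptions.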