[GitHub] spark pull request #18034: [SPARK-20797][MLLIB]fix LocalLDAModel.save() bug.

2017-06-29 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/18034#discussion_r124916809 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala --- @@ -468,7 +469,16 @@ object LocalLDAModel extends Loader[LocalLDAModel

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-13 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/18265 Cool, LGTM @jkbradley --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-11 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/18265 Thanks @jkbradley! I'm really curious about how this came to your attention. Did somebody actually encounter this bug? For this bug to come up, the model being trained would have to be truly
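As a rough illustration of the kind of overflow the SPARK-21050 title refers to, the sketch below emulates JVM 32-bit `Int` arithmetic in Python for a size estimate of a Word2Vec model. The cost model (4 bytes per float component plus an assumed ~15-byte average word) and the 300-dimension, 3-million-word inputs are assumptions for illustration, not values quoted from the Spark source.

```python
# Hypothetical sketch: a model-size estimate computed in 32-bit signed
# arithmetic (like JVM Int) versus the same estimate without wrapping.
# The cost formula below is an illustrative assumption.

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def to_int32(x: int) -> int:
    """Wrap an arbitrary Python int to 32-bit two's complement, like JVM Int."""
    x &= 0xFFFFFFFF
    return x - 2 ** 32 if x > INT32_MAX else x


def approx_size_int32(num_words: int, vector_size: int, avg_word_bytes: int = 15) -> int:
    # Every intermediate result is wrapped, mimicking Int * Int on the JVM.
    per_word = to_int32(to_int32(4 * vector_size) + avg_word_bytes)
    return to_int32(per_word * num_words)


def approx_size_int64(num_words: int, vector_size: int, avg_word_bytes: int = 15) -> int:
    # Same estimate without wrapping, as a 64-bit Long would give for these magnitudes.
    return (4 * vector_size + avg_word_bytes) * num_words
```

With 3,000,000 words of 300 dimensions, the unwrapped estimate is 3,645,000,000 bytes, but the 32-bit version comes out as -649,967,296, so anything derived from it (such as a partition count) would be meaningless. Widening to a 64-bit type before multiplying avoids the wrap.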

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-11 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/18265#discussion_r121277008 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-11 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/18265#discussion_r121277000 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-11 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/18265#discussion_r121276885 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -355,9 +364,12 @@ object Word2VecModel extends MLReadable[Word2VecModel

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-30 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/17673 Thanks for the detailed response @shubhamchopra. I'd like to clarify my point about whether this should be implemented in Spark: Spark MLlib is first and foremost a framework for doing ML

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113605788 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113605682 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113605031 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113605000 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-26 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/17673 @shubhamchopra have you run this code in a distributed spark cluster yet?

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113603661 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113603493 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113603142 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113602908 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113602640 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113602559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113602341 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113602046 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113601993 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113601799 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113601656 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113601513 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113600716 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113600548 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113600388 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113600264 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113599808 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113599456 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -194,6 +232,285 @@ final class Word2Vec @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113599047 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -127,6 +150,11 @@ final class Word2Vec @Since("1.4.0") (

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113598930 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -102,6 +107,24 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-04-26 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r113598788 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -102,6 +107,24 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-25 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/17673 I'm happy to take a look! I'll have some time to dig in deeper tomorrow. Some of my initial impressions: * There's a lot going on here, I agree with @hhbyyh that it would be cleaner to put

[GitHub] spark pull request #17263: [SPARK-19922][ML] small speedups to findSynonyms

2017-03-11 Thread Krimit
GitHub user Krimit opened a pull request: https://github.com/apache/spark/pull/17263 [SPARK-19922][ML] small speedups to findSynonyms Currently generating synonyms using a large model (I've tested with 3m words) is very slow. These efficiencies have sped things up for us by ~17
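One common way to speed up a brute-force synonym search over a large vocabulary is to compute vector norms once per model rather than per query, and to select the top `num` results with a bounded heap instead of sorting every similarity. The Python sketch below shows that generic technique; it is not the actual change made in the pull request, and `build_norms` and `find_synonyms` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def build_norms(vectors: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Computed once per model rather than once per query.
    return {w: math.sqrt(sum(x * x for x in v)) for w, v in vectors.items()}


def find_synonyms(word: str, num: int,
                  vectors: Dict[str, Sequence[float]],
                  norms: Dict[str, float]) -> List[Tuple[str, float]]:
    query = vectors[word]
    q_norm = norms[word]

    def scored():
        for w, v in vectors.items():
            if w == word:
                continue  # exclude the query word itself
            denom = q_norm * norms[w]
            sim = sum(a * b for a, b in zip(query, v)) / denom if denom else 0.0
            yield (w, sim)

    # Partial selection: O(n log num) instead of sorting all n similarities.
    return heapq.nlargest(num, scored(), key=lambda pair: pair[1])
```

For example, with `vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}`, `find_synonyms("a", 2, vectors, build_norms(vectors))` returns `b` (cosine similarity about 0.994) followed by `c` (0.0).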

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-03-06 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/16811 Updated. I do kind of wish we had access to ``assertJ``, which would make unordered assertions a cakewalk

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-03-06 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r104402251 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-03-05 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/16811 Thanks for your comments @jkbradley, updated. I also took the opportunity to replace the kinda-janky fuzzyEquals in the test with the ``TestingUtils`` implementation
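The unordered, tolerance-aware assertions discussed in these comments can be sketched in Python as follows: compare synonym results by word regardless of order, then check each similarity within a relative tolerance. This uses `math.isclose` as a stand-in; it is not Spark's `TestingUtils`, whose exact tolerance semantics may differ, and `assert_same_synonyms` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def approx_equal(x: float, y: float, rel_tol: float) -> bool:
    # Relative-tolerance comparison; a stand-in for a "relTol"-style helper.
    return math.isclose(x, y, rel_tol=rel_tol)


def assert_same_synonyms(actual: List[Tuple[str, float]],
                         expected: List[Tuple[str, float]],
                         rel_tol: float = 1e-5) -> None:
    # Order-insensitive: match entries by word, then compare similarities.
    assert len(actual) == len(expected), (actual, expected)
    actual_map = dict(actual)
    expected_map = dict(expected)
    assert actual_map.keys() == expected_map.keys(), (actual_map, expected_map)
    for word, sim in expected_map.items():
        assert approx_equal(actual_map[word], sim, rel_tol), (word, actual_map[word], sim)
```

Keying on the word makes the check independent of tie-breaking order, which can differ between implementations when two candidates have nearly equal similarity.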

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-02-15 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/16811 @jkbradley - updated. Added an explicit new test as requested, although the existing test already covers it (by virtue of the existing methods calling the new methods)

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-06 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r99612768 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -232,19 +232,40 @@ class Word2VecModel private[ml] ( @Since("

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-02-05 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/16811 cc @jkbradley

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-05 Thread Krimit
GitHub user Krimit opened a pull request: https://github.com/apache/spark/pull/16811 [SPARK-17629][ML] methods to return synonyms directly ## What changes were proposed in this pull request? provide methods to return synonyms directly, without wrapping them in a dataframe
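The design described here, a core method that returns synonyms directly plus a table-returning method built on top of it, can be sketched in Python as below. The toy similarity table stands in for a trained model, and a list of dicts stands in for a DataFrame; `find_synonyms_array`, `find_synonyms`, and the `word`/`similarity` field names are hypothetical choices for this sketch.

```python
from typing import Dict, List, Tuple

# Toy, made-up similarity table standing in for a trained model.
SIMILARITIES: Dict[str, List[Tuple[str, float]]] = {
    "cat": [("dog", 0.9), ("mouse", 0.7), ("car", 0.1)],
}


def find_synonyms_array(word: str, num: int) -> List[Tuple[str, float]]:
    """Core method: returns (word, similarity) pairs directly, with no table wrapper."""
    if word not in SIMILARITIES:
        raise KeyError(f"{word} not in vocabulary")
    return SIMILARITIES[word][:num]


def find_synonyms(word: str, num: int) -> List[Dict[str, object]]:
    """Wrapper: builds row records from the core result (a stand-in for a DataFrame)."""
    return [{"word": w, "similarity": s} for w, s in find_synonyms_array(word, num)]
```

Because the table-returning method delegates to the direct one, a test of the older method also exercises the new code path, which is the point made later in this thread about existing tests already covering the change.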

[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-01-19 Thread Krimit
Github user Krimit commented on the issue: https://github.com/apache/spark/pull/16607 @srowen @jkbradley updated with comments. I used the spark version to sniff the version as suggested by @jkbradley, although I'm happy to continue the conversation about the best way to handle
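Sniffing the Spark version recorded with a saved model to pick a loader can be sketched as a numeric comparison on `(major, minor)`. Comparing the parsed tuple avoids the string-comparison trap where `"2.10"` sorts before `"2.2"`. The threshold of 2.2 for the new distributed format and the `"legacy"`/`"distributed"` labels are assumptions for this sketch.

```python
import re
from typing import Tuple

# Assumption: first release that writes the distributed (multi-row) format.
DISTRIBUTED_FORMAT_SINCE = (2, 2)


def parse_major_minor(spark_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like "2.1.0" or "2.2.0-SNAPSHOT"."""
    m = re.match(r"(\d+)\.(\d+)", spark_version)
    if m is None:
        raise ValueError(f"Cannot parse Spark version: {spark_version!r}")
    return int(m.group(1)), int(m.group(2))


def choose_loader(spark_version: str) -> str:
    # Models saved by older versions are read with the legacy single-datum path.
    if parse_major_minor(spark_version) < DISTRIBUTED_FORMAT_SINCE:
        return "legacy"
    return "distributed"
```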

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-01-18 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r96672243 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -320,14 +341,29 @@ object Word2VecModel extends MLReadable[Word2VecModel

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-01-16 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r96328450 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -320,14 +341,29 @@ object Word2VecModel extends MLReadable[Word2VecModel

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-01-16 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r96327575 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -302,16 +303,36 @@ class Word2VecModel private[ml] ( @Since("

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-01-16 Thread Krimit
Github user Krimit commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r96327379 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -302,16 +303,36 @@ class Word2VecModel private[ml] ( @Since("

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large models

2017-01-16 Thread Krimit
GitHub user Krimit opened a pull request: https://github.com/apache/spark/pull/16607 [SPARK-19247][ML] Save large models ## What changes were proposed in this pull request? * save word2vec models as distributed files rather than as one large datum. Backwards compatibility
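The idea of saving the model as distributed files rather than one large datum can be sketched as: store one `(word, vector)` row per word, estimate the serialized size, and spread the rows over enough partitions that each stays under a buffer limit. The 4-byte-float / 15-byte-average-word cost model, the 64 MiB buffer used in the example, and the helper names are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

FLOAT_BYTES = 4
AVG_WORD_BYTES = 15  # assumed average encoded word length


def num_partitions(num_words: int, vector_size: int, buffer_bytes: int) -> int:
    # Python ints do not overflow; on the JVM this product needs a Long.
    approx = (FLOAT_BYTES * vector_size + AVG_WORD_BYTES) * num_words
    return approx // buffer_bytes + 1


def split_rows(vectors: Dict[str, Sequence[float]],
               parts: int) -> List[List[Tuple[str, Sequence[float]]]]:
    # One (word, vector) row per word, dealt round-robin into `parts` chunks;
    # each chunk would become one output file instead of a single huge record.
    chunks: List[List[Tuple[str, Sequence[float]]]] = [[] for _ in range(parts)]
    for i, item in enumerate(vectors.items()):
        chunks[i % parts].append(item)
    return chunks
```

Under these assumptions, 3,000,000 words of 300 dimensions (about 3.6 GB) with a 64 MiB buffer gives `num_partitions(...) == 55`, while a small model fits in a single partition.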