Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
@hhbyyh Thanks for your suggestions. Will try to incorporate these in a day
or so.
---
-
To unsubscribe, e-mail: reviews
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r149215757
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r144124146
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143573574
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143572643
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143572667
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143572179
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143572149
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143570173
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143570164
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143569334
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143569449
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143567888
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143567788
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143567807
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143566804
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143529694
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -245,5 +508,28 @@ class Word2VecSuite extends SparkFunSuite
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143529286
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -189,6 +305,136 @@ class Word2VecSuite extends SparkFunSuite
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143528339
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -171,20 +210,46 @@ final class Word2Vec @Since("
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143528173
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -171,20 +210,46 @@ final class Word2Vec @Since("
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143516772
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -106,6 +106,45 @@ private[feature] trait Word2VecBase extends Params
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143516595
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -106,6 +106,45 @@ private[feature] trait Word2VecBase extends Params
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143516496
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -106,6 +106,45 @@ private[feature] trait Word2VecBase extends Params
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143516384
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -106,6 +106,45 @@ private[feature] trait Word2VecBase extends Params
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
Thanks for your comments/suggestions @MLnick and @sethah . Working on
incorporating these.
GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/18123
[SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Sampling
## What changes were proposed in this pull request?
This enhances [CBOW + Negative
Sampling](https://github.com/apache
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
Code-review comments/suggestions so far have been incorporated. Thanks for
looking into the code. Happy to incorporate more suggestions and feedback.
---
If your project is set up
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r115009247
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -36,7 +36,10 @@ import org.apache.spark.util.{Utils, VersionUtils
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
@MLnick I half expected that. No worries. I have incorporated some of your
feedback in the meantime and also added subsampling. Thanks for looking
into the code.
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17519
@lins05 Apologies for the delay in responding and thanks for adding the
docs. LGTM.
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
@Krimit
_Can you provide some information about the practical differences between
CBOW and skip-grams?_
![Model
Architectures](https://cloud.githubusercontent.com/assets/6588487
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
@Krimit @MLnick @hhbyyh I am working on getting your earlier queries
answered.
@Krimit Thanks for looking into the code, I will try to get the code-review
feedback incorporated
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17673
The [original paper](https://arxiv.org/abs/1301.3781) proposed two model
architectures for generating word embeddings: the Continuous Skip-Gram model and
the Continuous Bag-of-Words model. Spark ML
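As a rough illustration of how the two architectures differ, the sketch below builds training examples from one tokenized sentence. CBOW predicts the center word from its surrounding context, while skip-gram predicts each context word from the center word. The function names and the `window` parameter are hypothetical and are not taken from the pull request; this covers only example generation, not training or negative sampling.

```python
def cbow_examples(tokens, window):
    """CBOW: one example per position, (context words, center word)."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        # Words up to `window` positions to the left and right of the center.
        context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, center))
    return examples


def skipgram_examples(tokens, window):
    """Skip-gram: one example per (center word, single context word) pair."""
    examples = []
    for context, center in cbow_examples(tokens, window):
        examples.extend((center, word) for word in context)
    return examples


if __name__ == "__main__":
    sentence = ["spark", "trains", "word", "vectors"]
    print(cbow_examples(sentence, window=1))
    print(skipgram_examples(sentence, window=1))
```

For the same sentence and window, CBOW produces one example per word and skip-gram produces one example per word pair, so skip-gram yields more (and smaller) training examples.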
GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/17673
[SPARK-20372] [ML] Word2Vec Continuous Bag of Words model
## What changes were proposed in this pull request?
This adds a Continuous Bag of Words implementation to Word2Vec
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
Rebased to master
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17325#discussion_r108261401
--- Diff:
core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
---
@@ -481,27 +481,39 @@ class
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17325
Thanks @kayousterhout !
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
retest this please
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17325#discussion_r108241606
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1179,9 +1179,13 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13932#discussion_r108204165
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala ---
@@ -53,6 +53,46 @@ trait BlockReplicationPolicy
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13932#discussion_r108203064
--- Diff:
core/src/test/scala/org/apache/spark/storage/BlockReplicationPolicySuite.scala
---
@@ -68,7 +68,60 @@ class BlockReplicationPolicySuite
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13932#discussion_r108202578
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala ---
@@ -88,26 +129,96 @@ class RandomBlockReplicationPolicy
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/17325#discussion_r108198851
--- Diff:
core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
---
@@ -481,27 +481,39 @@ class
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/17325
Elaborating a little more on how replication happens and what the code
change here does:
Spark executors cache a list of peers that is refreshed every 60s by
default. When replicating
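The caching behavior described above can be sketched as a small time-to-live cache. This is a minimal illustration, not Spark's code: the class name, the `fetch_peers` callback, the `force_refresh` flag, and the injectable `clock` are all assumptions made for the example. Only the 60-second default comes from the comment.

```python
import time


class CachedPeers:
    """Hypothetical peer-list cache that refetches once the TTL has elapsed."""

    def __init__(self, fetch_peers, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_peers      # callback that asks the master for current peers
        self._ttl = ttl_seconds        # 60s default, as stated in the comment
        self._clock = clock            # injectable so tests can control time
        self._peers = []
        self._fetched_at = None

    def get(self, force_refresh=False):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if force_refresh or expired:
            self._peers = list(self._fetch())
            self._fetched_at = now
        # Return a copy so callers cannot mutate the cached list.
        return list(self._peers)
```

Between refreshes the cached list can be stale, so a peer that has already gone away may still be returned until the TTL expires or a refresh is forced.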
GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/17325
[SPARK-19803][CORE][TEST] Proactive replication test failures
## What changes were proposed in this pull request?
Executors cache a list of their peers that is refreshed by default every
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
test this please
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
test this please
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
Rebased to resolve merge conflicts.
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/14412
"spark.storage.exceptionOnPinLeak" based check only works if executors are
created. I put in an assertion check using logic similar to it in the
testProactiveReplication tests.
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r101854267
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -195,17 +198,39 @@ class BlockManagerMasterEndpoint
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r101851236
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -195,17 +198,39 @@ class BlockManagerMasterEndpoint
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r101847693
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1131,14 +1131,43 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r101847672
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1131,14 +1131,43 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r101843543
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -195,17 +198,39 @@ class BlockManagerMasterEndpoint
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r101843045
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1131,14 +1131,47 @@ private[spark] class BlockManager
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
jenkins ok to test
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
No test errors. Looks like the test process was killed midway. Tests added
as a part of this PR took less than 7s, so couldn't have caused the delay.
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r99237148
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -188,24 +189,45 @@ class BlockManagerMasterEndpoint
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r99190414
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
---
@@ -188,24 +189,45 @@ class BlockManagerMasterEndpoint
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r99189219
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1152,20 +1185,25 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r99185105
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1152,20 +1185,25 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r99174290
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1131,14 +1131,47 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r99174354
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1131,14 +1131,47 @@ private[spark] class BlockManager
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
Rebased to master to resolve merge conflict
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13152
Rebased to master to resolve merge conflicts
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13152
Thanks for the suggestions. I have corrected the style check errors and
verified that locally, so hopefully there are no more style errors. I have
also made a couple of modifications per
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13152#discussion_r7684
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13152#discussion_r74650320
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1088,109 +1108,88 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/14412#discussion_r74088323
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala
---
@@ -37,10 +37,11 @@ import org.apache.spark.util.Utils
class
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13152#discussion_r73751665
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1088,109 +1108,88 @@ private[spark] class BlockManager
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13152#discussion_r73593762
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -159,8 +162,27 @@ private[spark] class BlockManager
GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/14412
[SPARK-15355] [CORE] [WIP] Proactive block replication
## What changes were proposed in this pull request?
We are proposing the addition of proactive block replication in case
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13152
The state being managed inside getRandomPeer() is also modified in a couple
of other places, so it won't be a very clean change to move some of it out of
getRandomPeer(). Even if that is done
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13152
The topology info is only queried when the executor is initialized and is
assumed to stay the same throughout the life of the executor. Depending on the
cluster manager being used, I am assuming
Github user shubhamchopra commented on the issue:
https://github.com/apache/spark/pull/13932
Based on feedback from @rxin, added a Basic Strategy that replicates HDFS
behavior as a simpler alternative to the constraint solver. I also ran some
performance tests on the constraint
GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/13932
[SPARK-15354] [CORE] [WIP] Topology aware block replication strategies
## What changes were proposed in this pull request?
Implementations of strategies for resilient block
Github user shubhamchopra commented on the pull request:
https://github.com/apache/spark/pull/13152#issuecomment-220156087
Fixed style issues pointed out by @HyukjinKwon
Github user shubhamchopra commented on a diff in the pull request:
https://github.com/apache/spark/pull/13152#discussion_r63780828
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1079,109 +1103,97 @@ private[spark] class BlockManager
GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/13152
[SPARK-15353] [CORE] Making peer selection for block replication pluggable
## What changes were proposed in this pull request?
This PR makes block replication strategies pluggable
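To illustrate what "pluggable" can mean here, the sketch below separates the choice of replication targets from the code that uses them. It is a language-neutral illustration written in Python, not the interface from the pull request: `ReplicationPolicy`, `RandomPolicy`, `InOrderPolicy`, `prioritize`, and `replicate_to` are hypothetical names.

```python
import random
from abc import ABC, abstractmethod


class ReplicationPolicy(ABC):
    """Hypothetical strategy interface: decides which peers receive replicas."""

    @abstractmethod
    def prioritize(self, peers, num_replicas):
        """Return up to num_replicas peers, in order of preference."""


class RandomPolicy(ReplicationPolicy):
    """Picks distinct peers uniformly at random."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def prioritize(self, peers, num_replicas):
        k = min(num_replicas, len(peers))
        return self._rng.sample(list(peers), k)


class InOrderPolicy(ReplicationPolicy):
    """Trivial alternative that takes peers in the order given."""

    def prioritize(self, peers, num_replicas):
        return list(peers)[:num_replicas]


def replicate_to(peers, num_replicas, policy):
    """Caller depends only on the interface, so policies can be swapped freely."""
    return policy.prioritize(peers, num_replicas)
```

Because `replicate_to` sees only the abstract interface, a topology-aware policy could be substituted later without changing the calling code.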