Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207738831
Yup, LGTM. Merged to master. Thanks all for review and @wangmiao1981 for
the PR.
---
If your project is set up for it, you can reply to this email and have your
reply a
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/12200
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is ena
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207628271
This LGTM
For defaults, if the Param applies to training only (like minDF), then it
makes sense to set the default in the Estimator. But if the Param applies
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207523226
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207523225
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207522950
**[Test build #55359 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55359/consoleFull)**
for PR 12200 at commit
[`e693c60`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207507859
**[Test build #55359 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55359/consoleFull)**
for PR 12200 at commit
[`e693c60`](https://gi
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207266803
@wangmiao1981 I think you're right - we should move the `setDefault` call
for binary back to the trait as the param is shared between the estimator and
model. Thanks!
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58987840
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207082749
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207082751
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207082605
**[Test build #55241 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55241/consoleFull)**
for PR 12200 at commit
[`e1bffd7`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-207072619
**[Test build #55241 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55241/consoleFull)**
for PR 12200 at commit
[`e1bffd7`](https://gi
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58929398
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206999812
@MLnick I will revise the test accordingly. I think after testing the
estimator, I need to turn off the flag of the trained model first. Otherwise,
the binary para
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206724684
@wangmiao1981 the tests are ok - but I do like @BrianCutler's idea of
fitting the vectorizer first, and then building the model from that fitted
vocab - it seems slightl
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58826834
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206523617
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206523611
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206523215
**[Test build #55135 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55135/consoleFull)**
for PR 12200 at commit
[`42fdfee`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206507539
**[Test build #55135 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55135/consoleFull)**
for PR 12200 at commit
[`42fdfee`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206506499
**[Test build #55134 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55134/consoleFull)**
for PR 12200 at commit
[`de81c35`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206506504
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206506503
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206506032
**[Test build #55134 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55134/consoleFull)**
for PR 12200 at commit
[`de81c35`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206502911
**[Test build #55133 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55133/consoleFull)**
for PR 12200 at commit
[`5b35fb9`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206502928
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206502936
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206502365
**[Test build #55133 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55133/consoleFull)**
for PR 12200 at commit
[`5b35fb9`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206501277
**[Test build #55132 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55132/consoleFull)**
for PR 12200 at commit
[`7c89370`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206501293
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206501286
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206500835
**[Test build #55132 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55132/consoleFull)**
for PR 12200 at commit
[`7c89370`](https://gi
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206500157
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
ena
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58754677
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,19 @@ class CountVectorizerSuite extends SparkFunSuit
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58754307
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkF
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58753889
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkFunSuit
Github user BryanCutler commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206493241
@wangmiao1981 the test looks ok to me now, although I'm not too sure why
you added extra 'a's and 'b's, but that's fine.
---
If your project is set up for it, you
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58752457
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -114,7 +114,7 @@ class CountVectorizerSuite extends SparkFun
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58752409
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,19 @@ class CountVectorizerSuite extends SparkFu
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58750775
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkF
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58749870
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkFu
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58748929
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkF
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58746316
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkF
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58744522
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkF
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58744168
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends SparkFu
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58743453
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58742287
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58740250
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams e
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58739580
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -42,7 +42,8 @@ private[feature] trait CountVectorizerParams exte
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58673286
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58672875
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58672971
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -184,7 +208,8 @@ object CountVectorizer extends
DefaultParamsReadable
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58672317
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -42,7 +42,8 @@ private[feature] trait CountVectorizerParams extends
P
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206186423
I'll do it once you update the tests as discussed (so we only test when
necessary) :)
---
If your project is set up for it, you can reply to this email and have your
re
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58662986
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -115,6 +115,27 @@ class CountVectorizerSuite extends SparkFunSuit
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206181665
@MLnick can you trigger the auto test? It seems that I am not in the white
list. I had one JIRA merged to master. Thanks!
Miao
---
If your project is set up
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58662007
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -115,6 +115,27 @@ class CountVectorizerSuite extends SparkF
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58657525
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -115,6 +115,27 @@ class CountVectorizerSuite extends SparkFunSuit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206140085
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your p
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12200
[SPARK-14392][ML]CountVectorizer Estimator should include binary toggle
Param
## What changes were proposed in this pull request?
CountVectorizerModel has a binary toggle param. This
61 matches
Mail list logo