[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452 Revisiting this by rebasing on master. BTW, of the 500+ LOC changes, 200+ LOC are actually test cases. --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #70541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70541/testReport)** for PR 14452 at commit

[GitHub] spark issue #16232: [SPARK-18800][SQL] Correct the assert in UnsafeKVExterna...

2016-12-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16232 ping @davies

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70540/testReport)** for PR 13909 at commit

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70535/ Test PASSed. ---

[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70534/ Test PASSed. ---

[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15666 Merged build finished. Test PASSed.

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16337 Merged build finished. Test PASSed.

[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15666 **[Test build #70534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70534/testReport)** for PR 15666 at commit

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16337 **[Test build #70535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70535/testReport)** for PR 16337 at commit

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13909 Jenkins, retest this please

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15211 I've sent a new update addressing most of the comments. The only exception is about `SetWeightCol` in `LinearSVCModel`. cc @jkbradley.

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70537/ Test FAILed. ---

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test FAILed.

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15211 Merged build finished. Test PASSed.

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70539/ Test PASSed. ---

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15211 **[Test build #70539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70539/testReport)** for PR 15211 at commit

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70537/testReport)** for PR 13909 at commit

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r93733483 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -36,29 +31,31 @@ import

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15212 Merged build finished. Test PASSed.

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15212 **[Test build #70536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70536/testReport)** for PR 15212 at commit

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15212 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70536/ Test PASSed. ---

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r93732800 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15211 **[Test build #70539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70539/testReport)** for PR 15211 at commit

[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame

2016-12-22 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16368 ah, it was merged https://git-wip-us.apache.org/repos/asf?p=spark.git

[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame

2016-12-22 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16368 I kept getting errors with the merge script, so I'm not sure if it went through. We are likely having some sync issue with GitHub?

[GitHub] spark issue #16312: [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into mult...

2016-12-22 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16312 ah, thank you @shivaram. Sorry I couldn't get around to investigating earlier. @yanboliang It looks like that is the design in the trait BaseReadWrite

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16386 > the corrupt column will contain the filename instead of the literal JSON if there is a parsing failure I am worried about changing the behaviour. I understand why it had to be here as

[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame

2016-12-22 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16368 Hmm, looks like this is merged but not reflected on GitHub?

[GitHub] spark pull request #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16387#discussion_r93732158 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -192,12 +193,16 @@ class ExternalAppendOnlyMap[K, V, C](

[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16387 **[Test build #70538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70538/testReport)** for PR 16387 at commit

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93732043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -56,33 +58,100 @@ case class

[GitHub] spark pull request #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn...

2016-12-22 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16387 [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail when forced to spill before calling its iterator ## What changes were proposed in this pull request?

[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame

2016-12-22 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16368 Merging this into master, branch-2.0

[GitHub] spark issue #16312: [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into mult...

2016-12-22 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16312 I looked at this more closely and I think I found the problem, though I'm not sure it's easy to fix. What I traced here is: - When we call sparkR.session.stop and sparkR.session the same JVM

[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-22 Thread yhuai
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15996 ah https://github.com/apache/spark/commit/9a1ad71db44558bb6eb380dc23a1a1abbc2f3e98 failed.

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70537/testReport)** for PR 13909 at commit

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r93731259 --- Diff: python/pyspark/sql/readwriter.py --- @@ -155,21 +155,24 @@ def load(self, path=None, format=None, schema=None, **options): return

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93731229 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,525 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15212 **[Test build #70536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70536/testReport)** for PR 15212 at commit

[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-22 Thread yhuai
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15996 LGTM. Can you update the comment to address my last comment (https://github.com/apache/spark/pull/15996#discussion_r93730700)?

[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-22 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93730700 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala --- @@ -643,6 +644,14 @@ class DataFrameReaderWriterSuite

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16337 **[Test build #70535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70535/testReport)** for PR 16337 at commit

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2016-12-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16337 Retest this please.

[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-22 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r93730314 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -164,6 +164,27 @@ private[spark] object TestUtils {

[GitHub] spark pull request #16384: [BUILD] make-distribution support alternate pytho...

2016-12-22 Thread felixcheung
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/16384

[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-22 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r93729928 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -164,6 +164,27 @@ private[spark] object TestUtils {

[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15666 **[Test build #70534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70534/testReport)** for PR 15666 at commit

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Merged build finished. Test FAILed.

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70531/ Test FAILed. ---

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #70531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70531/testReport)** for PR 16386 at commit

[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15996 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70532/ Test FAILed. ---

[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15996 Merged build finished. Test FAILed.

[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15996 **[Test build #70532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70532/testReport)** for PR 15996 at commit

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2016-12-22 Thread kevinyu98
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/16337 I just ran `build/sbt "test-only org.apache.spark.sql.streaming.StreamSuite"` on my local machine, as well as the whole SQL suite, and it works fine. Can you re-run the test? Thanks

[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16323#discussion_r93726972 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -41,13 +41,13 @@ import

[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #70533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70533/testReport)** for PR 16228 at commit

[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16228 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70533/ Test FAILed. ---

[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16228 Merged build finished. Test FAILed.

[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #70533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70533/testReport)** for PR 16228 at commit

[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16323#discussion_r93726768 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -237,6 +239,38 @@ case class CatalogTable( }

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93726522 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -56,33 +58,100 @@ case class

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726073 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,11 +171,14 @@ object ChiSqSelectorModel extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726194 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -255,19 +288,22 @@ class ChiSqSelector @Since("2.1.0") ()

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93725579 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -92,8 +92,36 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726048 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -111,11 +139,14 @@ private[feature] trait ChiSqSelectorParams

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93725408 --- Diff: docs/mllib-feature-extraction.md --- @@ -227,11 +227,13 @@ both speed and statistical learning behavior.

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726001 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -92,8 +92,36 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93725173 --- Diff: docs/ml-features.md --- @@ -1423,12 +1423,12 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection. It

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726320 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala --- @@ -27,61 +27,240 @@ class ChiSqSelectorSuite extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726203 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -255,19 +288,22 @@ class ChiSqSelector @Since("2.1.0") ()

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93726092 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -245,6 +264,20 @@ class ChiSqSelector @Since("2.1.0") () extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93725098 --- Diff: docs/ml-features.md --- @@ -1423,12 +1423,12 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection. It

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r93725546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -92,8 +92,36 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark issue #16291: [SPARK-18838][CORE] Use separate executor service for ea...

2016-12-22 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16291 I agree with @markhamstra and @vanzin - having the ability to tag listeners into groups (default = spark listener group) and preserving the current synchronized behavior within a group would ensure

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Hello recent JacksonGenerator.scala committers, please take a look. cc/ @rxin @hvanhovell @clockfly @hyukjinkwon @cloud-fan

[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15996 **[Test build #70532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70532/testReport)** for PR 15996 at commit

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #70531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70531/testReport)** for PR 16386 at commit

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-22 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/16386 [SPARK-18352][SQL] Support parsing multiline json files ## What changes were proposed in this pull request? If a new option `wholeFile` is set to `true` the JSON reader will parse

[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16383#discussion_r93725196 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -143,15 +197,96 @@ case class

[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16119 @vijoshi do you mind updating your PR according to the discussion? i.e. simplify the fix and test

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-22 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Not sure if my patch makes the tests unstable. But I can't figure out why. @kayousterhout @mridulm any ideas?

[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16383#discussion_r93724428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -505,19 +511,18 @@ abstract class

[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16383#discussion_r93724370 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -471,23 +471,29 @@ abstract class

[GitHub] spark issue #14627: [SPARK-16975][SQL][FOLLOWUP] Do not duplicately check fi...

2016-12-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14627 @rxin, it does not fix any bug but just gets rid of duplicated logic. I will try to open a separate JIRA in this case in the future to prevent confusion. Thank you.

[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16371 @hvanhovell Got it. Thanks for review.

[GitHub] spark issue #16361: [SPARK-18952] Regex strings not properly escaped in code...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16361 It seems that the grouping key alias is only used for execution (the logical Aggregate node doesn't need grouping expressions to be named), can we just alias them with k1,k2, ... with avoid this

[GitHub] spark issue #16294: [SPARK-18669][SS][DOCS] Update Apache docs for Structure...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16294 Merged build finished. Test PASSed.

[GitHub] spark issue #16294: [SPARK-18669][SS][DOCS] Update Apache docs for Structure...

2016-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70530/ Test PASSed. ---

[GitHub] spark issue #16294: [SPARK-18669][SS][DOCS] Update Apache docs for Structure...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16294 **[Test build #70530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70530/testReport)** for PR 16294 at commit

[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93723071 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala --- @@ -195,12 +195,25 @@ class

[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93723027 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala --- @@ -635,4 +638,13 @@ class DataFrameReaderWriterSuite

[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16371 sounds good.

[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame

2016-12-22 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16368 LGTM. Thanks @felixcheung

[GitHub] spark issue #16294: [SPARK-18669][SS][DOCS] Update Apache docs for Structure...

2016-12-22 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16294 LGTM pending tests

[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93722426 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala --- @@ -195,12 +195,25 @@ class

[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93722334 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -140,153 +140,55 @@ case class

[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93722277 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -363,48 +365,125 @@ final class DataFrameWriter[T] private[sql](ds:

[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...

2016-12-22 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16370 @zsxwing Thanks for your reminder! In some ways, we really can evade this issue, for example by not using `-cp`. But this is a user-side behaviour, and we cannot ensure every user knows and uses

[GitHub] spark issue #16294: [SPARK-18669][SS][DOCS] Update Apache docs for Structure...

2016-12-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16294 **[Test build #70530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70530/testReport)** for PR 16294 at commit
