[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105059294 --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out --- @@ -32,32 +34,40 @@ Usage: to_json(expr[, options]) - Returns a json

[GitHub] spark pull request #17213: [SPARK-19871] [PySpark][SQL] Improve error messag...

2017-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17213#discussion_r105063798 --- Diff: python/pyspark/sql/types.py --- @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType): } -def _verify_type(obj

[GitHub] spark issue #17213: [SPARK-19871] [PySpark][SQL] Improve error message in ve...

2017-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17213 Could we deal with SPARK-19507 here as well, if it looks easy to fix together? Also, I think we should run `./dev/lint-python`. It seems some lines do not comply with PEP 8 here. As a bonus, we

[GitHub] spark issue #17213: [SPARK-19871] [PySpark][SQL] Improve error message in ve...

2017-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17213 cc @dgingrich, who I guess is the reporter of SPARK-19507 - what do you think about this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105101799 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1339,6 +1339,11 @@ test_that("column functions", { expect_equal(collect

[GitHub] spark pull request #17224: [SPARK-19882][SQL] Pivot with null as the pivot v...

2017-03-09 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17224 [SPARK-19882][SQL] Pivot with null as the pivot value throws NPE ## What changes were proposed in this pull request? This PR fixes two problems as below: Actually, there are

[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17226#discussion_r105290321 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala --- @@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest

[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17227 Yea, if the change is not too different, I would rather help review the first PR before taking over.

[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17227 BTW, let's put `[WIP]` in the title if it is.

[GitHub] spark issue #17225: [CORE] Support ZStandard Compression

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17225 Hi @dongjinleekr, how about opening a JIRA and adding it to the title? This does not seem to be a minor change that can go without a JIRA.

[GitHub] spark issue #17213: [SPARK-19871] [PySpark][SQL] Improve error message in ve...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17213 Let me then resolve this JIRA as a duplicate and reopen @dgingrich's one.

[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17227 (Let's proceed with this one per https://github.com/apache/spark/pull/17213#issuecomment-285530248)

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105304318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -88,6 +88,7 @@ private[csv] class CSVOptions

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105306510 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -465,6 +465,44 @@ class CSVSuite extends

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105307099 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -465,6 +465,44 @@ class CSVSuite extends

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105306617 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -465,6 +465,44 @@ class CSVSuite extends

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105307145 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -465,6 +465,44 @@ class CSVSuite extends

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105306543 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -465,6 +465,44 @@ class CSVSuite extends

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105306195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -88,6 +88,7 @@ private[csv] class CSVOptions

[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17226#discussion_r105322090 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -522,7 +522,7 @@ class Analyzer

[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17226#discussion_r105322199 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala --- @@ -216,4 +216,18 @@ class DataFramePivotSuite extends QueryTest

[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17226#discussion_r105322903 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala --- @@ -216,4 +216,18 @@ class DataFramePivotSuite extends QueryTest

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 @aray, I will close mine as soon as I check https://github.com/apache/spark/pull/17226#discussion_r105322903 can be resolved because that was the point I did not fix the optimized path

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 @aray, this is a regression as I described in my PR that is introduced by this optimization. Spark 1.6. ``` +++---+ | a|null| 1

[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17226#discussion_r105325964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -522,7 +522,7 @@ class Analyzer

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 > Spark 2.0+ with PivotFirst gives a NPE when one of the pivot column values is null. The main thing fixed in this PR. I meant to say it is not fully fixed because it does not NPE

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 cc @cloud-fan and @yhuai could you pick up one of them? Let me follow.

[GitHub] spark pull request #17224: [SPARK-19882][SQL] Pivot with null as the distinc...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17224#discussion_r105327492 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -524,15 +529,21 @@ class Analyzer

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 > we're not introducing a regression in this PR by fixing the NPE, the answer given by 1.6 was incorrect under any interpretation Right, if it was a bug, then this PR intro

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105337093 --- Diff: python/pyspark/sql/readwriter.py --- @@ -693,8 +697,8 @@ def text(self, path, compression=None): @since(2.0) def csv

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105337154 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1339,6 +1339,11 @@ test_that("column functions", { expect_equal(collect

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 I see. So, `count` in "**Spark 2.1.0** (and presumably 2.0.x/master)" was unexpectedly introduced by the optimization in SPARK-13749 and this behaviour change between 1.6 and master (

[GitHub] spark pull request #17224: [SPARK-19882][SQL] Pivot with null as the distinc...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/17224

[GitHub] spark issue #17224: [SPARK-19882][SQL] Pivot with null as the distinct pivot...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17224 I am closing this per https://github.com/apache/spark/pull/17226#issuecomment-285597434

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 For me, LGTM

[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17192 @brkyvz and @marmbrus, I think the SQL test and R changes are okay and this is ready for a look. Could you take a look please?

[GitHub] spark issue #17243: [SPARK-19901][Core]Clean up the clunky method signature ...

2017-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17243 I don't think this is particularly better either. We should avoid style changes based on personal taste.

[GitHub] spark issue #17239: Using map function in spark for huge operation

2017-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17239 @nischay21 Could you click "Close pull request" below? This is not the place where we are supposed to report issues. This causes a build failure mark on some branches.

[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R

2017-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17178 @felixcheung, I tried to add an optional parameter, `asArray`. Could you take a look and see if it makes sense?

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17177 Thank you both so much for your effort and time.

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implement...

2017-03-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17255 [SPARK-19918][SQL] Use TextFileFormat in implementation of JsonFileFormat ## What changes were proposed in this pull request? This PR proposes to use text datasource when Json schema

[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation o...

2017-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17255 (let me wait for the tests before cc'ing someone)

[GitHub] spark pull request #17256: [SPARK-19919][SQL] Defer throwing the exception f...

2017-03-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17256 [SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into `DataSource` ## What changes were proposed in this pull request? This PR proposes to defer

[GitHub] spark issue #17256: [SPARK-19919][SQL] Defer throwing the exception for empt...

2017-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17256 retest this please

[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation o...

2017-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17255 cc @cloud-fan, @joshrosen and @NathanHowell could you take a look and see if it makes sense when you have some time?

[GitHub] spark pull request #17256: [WIP][SPARK-19919][SQL] Defer throwing the except...

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17256#discussion_r10045 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -54,10 +54,21 @@ abstract class

[GitHub] spark pull request #17264: [SPARK-19923][SQL] Remove unnecessary type conver...

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17264#discussion_r10670 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala --- @@ -196,6 +198,11 @@ private[orc] class OrcSerializer

[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation ...

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17255 Let me take a look at that. I think we can replace `sample` at least.

[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17178 Thank you @felixcheung. Let me try to handle the comments soon.

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105591511 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -123,24 +124,31 @@ object JsonDataSource

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105596175 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala --- @@ -0,0 +1,50 @@ +/* --- End diff

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105597144 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -17,32 +17,30 @@ package

[GitHub] spark issue #17256: [SPARK-19919][SQL] Defer throwing the exception for empt...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17256 retest this please

[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation ...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17255 It seems all builds failed unexpectedly. Let me restart them.

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105602228 --- Diff: python/pyspark/sql/functions.py --- @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}): @since(2.1) def to_json(col

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105605336 --- Diff: python/pyspark/sql/functions.py --- @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}): @since(2.1) def to_json(col

[GitHub] spark pull request #17178: [SPARK-19828][R] Support array type in from_json ...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17178#discussion_r105626282 --- Diff: R/pkg/R/functions.R --- @@ -2452,11 +2453,18 @@ setMethod("date_format", signature(y = "Column", x = "character&

[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17178 > array might be a bit vague - does that require the column value to be a JSON array, or just multiple JSON objects spanning multiple lines? Yup, it requires that input is a JSON ar

[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17178 How about `as.struct`?

[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105653661 --- Diff: python/pyspark/sql/utils.py --- @@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace): self.stackTrace = stackTrace

[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105654236 --- Diff: python/pyspark/sql/utils.py --- @@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace): self.stackTrace = stackTrace

[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105654542 --- Diff: python/pyspark/sql/utils.py --- @@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace): self.stackTrace = stackTrace

[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105659050 --- Diff: python/pyspark/sql/utils.py --- @@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace): self.stackTrace = stackTrace

[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105664922 --- Diff: python/pyspark/sql/utils.py --- @@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace): self.stackTrace = stackTrace

[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r105723482 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple trans

[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r105723828 --- Diff: R/pkg/R/context.R --- @@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "

[GitHub] spark pull request #17282: [SPARK-19872][PYTHON] Only reserialize BatchedSeri...

2017-03-13 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17282 [SPARK-19872][PYTHON] Only reserialize BatchedSerializers when repartitioning ## What changes were proposed in this pull request? This PR proposes to avoid serialization for non-batched

[GitHub] spark issue #17282: [SPARK-19872][PYTHON] Only reserialize BatchedSerializers...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17282 cc @davies, could you see if it makes sense?

[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16611 @rxin, please let me know if there is anything you are not sure of. I will double check. I am fine with closing too if you are not sure of the implementation for now.

[GitHub] spark issue #13116: [SPARK-15324] [SQL] Add the takeSample function to the D...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13116 Hi @burness, what's the state of this PR?

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105835549 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -40,18 +40,11 @@ private[sql] object

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105835718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -40,18 +40,11 @@ private[sql] object

[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r105839398 --- Diff: R/pkg/R/context.R --- @@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "

[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17178 Let me go for `as.json.object`.

[GitHub] spark pull request #17178: [SPARK-19828][R] Support array type in from_json ...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17178#discussion_r105859422 --- Diff: R/pkg/R/functions.R --- @@ -2452,11 +2453,18 @@ setMethod("date_format", signature(y = "Column", x = "character&

[GitHub] spark issue #17225: [CORE] Support ZStandard Compression

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17225 @dongjinleekr, it seems there is a JIRA - https://issues.apache.org/jira/browse/SPARK-19112. Let's add this in the title of this PR if they are the same.

[GitHub] spark pull request #17178: [SPARK-19828][R] Support array type in from_json ...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17178#discussion_r105871730 --- Diff: R/pkg/R/functions.R --- @@ -2452,11 +2453,18 @@ setMethod("date_format", signature(y = "Column", x = "character&

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17177 I think it is good to add, as it is also described in the univocity library, but I am not too sure if it is worth exposing an option that currently has a small bug. Maybe we could close for now and

[GitHub] spark issue #13116: [SPARK-15324] [SQL] Add the takeSample function to the D...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13116 Ah, that's fine. I just wondered whether this is in progress, or whether we should close this for now.

[GitHub] spark issue #17291: [SPARK-19949][SQL][WIP] unify bad record handling in CSV...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17291 Thank you for cc'ing me @cloud-fan. Yes, I support this idea and my final plan was also the de-duplication. One thing I am worried about, though, is some code paths (for ex

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r106043110 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -40,18 +40,11 @@ private[sql] object

[GitHub] spark issue #17293: [SPARK-19950][SQL] Fix to ignore nullable when df.load()...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17293 Oh @kiszk, I think this deals with essentially the same problem as https://github.com/apache/spark/pull/14124 in a different way. I initially proposed forcing all cases into nullable schema

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17177 @ep1804 @jbax Thank you. I will cc and inform you both when I happen to see a PR bumping up the version to 2.4.0 (or probably I guess I will).

[GitHub] spark issue #17282: [SPARK-19872][PYTHON] Only reseralize with BatchedSerial...

2017-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17282 @viirya, you are right. I overlooked that. Thanks for correcting this.

[GitHub] spark issue #17282: [SPARK-19872][PYTHON] Use the correct deserializer for R...

2017-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17282 @viirya, thank you for taking a look.

[GitHub] spark pull request #15815: [DOCS][SPARK-18365] Documentation is Switched on ...

2016-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15815#discussion_r87111505 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1612,7 +1612,9 @@ class Dataset[T] private[sql

[GitHub] spark issue #15815: [DOCS][SPARK-18365] Documentation is Switched on Sample ...

2016-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15815 How about the ones in Python and R? (If we should change them too, don't forget to change `Note:` to `.. note::` and `Dataset` to ``:class:`DataFrame` `` so that they render nicely for Python).

[GitHub] spark pull request #15813: [SPARK-18362][SQL] Use TextFileFormat in JsonFile...

2016-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15813#discussion_r87121887 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -210,14 +196,21 @@ class CSVFileFormat

[GitHub] spark issue #15836: [SPARK-18391]

2016-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15836 Could you please fix the title to one that summarises what this PR proposes?

[GitHub] spark issue #15835: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15835 Hi @pwoody. So, if I understood this correctly, the original PR only filters out the files to touch ahead of time, but this one also proposes filtering splits via offsets from Parquet's metada

[GitHub] spark issue #15835: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15835 I think we should cc @liancheng as well, who is insightful in this area.

[GitHub] spark issue #15689: [SPARK-9487] Use the same num. worker threads in Scala/P...

2016-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15689 @skanjila It seems something has gone wrong. Could you please rebase this?

[GitHub] spark issue #15835: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15835 Ah, I missed the last comment from the old PR. Okay, we can shape this more nicely. BTW, Spark packs small partitions together for each task, so I guess this would not always introduce a lot of tasks

[GitHub] spark issue #14649: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14649 @andreweduffy, are you fine with #15835? If so, we might be able to close this.

[GitHub] spark pull request #15835: [SPARK-17059][SQL] Allow FileFormat to specify pa...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15835#discussion_r87382199 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -272,6 +275,83 @@ class

[GitHub] spark pull request #15835: [SPARK-17059][SQL] Allow FileFormat to specify pa...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15835#discussion_r87382135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -75,6 +76,17 @@ trait FileFormat

[GitHub] spark pull request #15835: [SPARK-17059][SQL] Allow FileFormat to specify pa...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15835#discussion_r87382447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -272,6 +275,83 @@ class

[GitHub] spark pull request #15841: [SPARK-18399][SQL][DOCUMENTATION] Examples in Spa...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15841#discussion_r87384086 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala --- @@ -60,7 +60,8 @@ object SparkSQLExample

[GitHub] spark issue #15841: [SPARK-18399][SQL][DOCUMENTATION] Examples in SparkSQL/D...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15841 As far as I know, the default file system should be `file:///`. I created a virtual machine with the OS `Ubuntu Server Trusty 14.04 amd64`, from the link https://oss

[GitHub] spark pull request #15841: [SPARK-18399][SQL][DOCUMENTATION] Examples in Spa...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15841#discussion_r87404895 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala --- @@ -60,7 +60,8 @@ object SparkSQLExample
