Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17192#discussion_r105059294
--- Diff:
sql/core/src/test/resources/sql-tests/results/json-functions.sql.out ---
@@ -32,32 +34,40 @@ Usage: to_json(expr[, options]) - Returns a json
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17213#discussion_r105063798
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
}
-def _verify_type(obj
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17213
Could we deal with SPARK-19507 together, if it looks easy to fix up? Also, I
think we should run `./dev/lint-python`. It seems some lines do not comply
with PEP 8 here. As a bonus, we
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17213
cc @dgingrich, who I guess is the reporter of SPARK-19507 - what do you think
about this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17192#discussion_r105101799
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1339,6 +1339,11 @@ test_that("column functions", {
expect_equal(collect
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/17224
[SPARK-19882][SQL] Pivot with null as the pivot value throws NPE
## What changes were proposed in this pull request?
This PR fixes two problems as below:
Actually, there are
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105290321
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17227
Yea, if the change is not too different, I would rather help review the
first PR before taking over.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17227
BTW, let's put `[WIP]` in the title if it is.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17225
Hi @dongjinleekr, how about opening a JIRA and adding it to the title? It
does not seem like a minor change that can go without a JIRA.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17213
Let me then resolve this JIRA as a duplicate and reopen @dgingrich's one.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17227
(Let's proceed with this one per
https://github.com/apache/spark/pull/17213#issuecomment-285530248)
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105304318
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -88,6 +88,7 @@ private[csv] class CSVOptions
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105306510
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -465,6 +465,44 @@ class CSVSuite extends
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105307099
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -465,6 +465,44 @@ class CSVSuite extends
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105306617
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -465,6 +465,44 @@ class CSVSuite extends
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105307145
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -465,6 +465,44 @@ class CSVSuite extends
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105306543
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -465,6 +465,44 @@ class CSVSuite extends
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105306195
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -88,6 +88,7 @@ private[csv] class CSVOptions
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105322090
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -522,7 +522,7 @@ class Analyzer
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105322199
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -216,4 +216,18 @@ class DataFramePivotSuite extends QueryTest
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105322903
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -216,4 +216,18 @@ class DataFramePivotSuite extends QueryTest
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
@aray, I will close mine as soon as I check that
https://github.com/apache/spark/pull/17226#discussion_r105322903 can be
resolved, because that was the point where I did not fix the optimized path
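The null-pivot-value behaviour discussed across these comments can be sketched in plain Python (a toy pivot, not Spark's implementation; `pivot_counts` is a hypothetical name): a `None` pivot value should simply become its own output column rather than raise.

```python
def pivot_counts(rows, group_key, pivot_key):
    """Toy pivot over a list of dicts. A None pivot value becomes its
    own output column instead of blowing up - the analogue of the NPE
    being fixed here. Illustrative only, not Spark code."""
    result = {}
    for row in rows:
        cols = result.setdefault(row[group_key], {})
        # None is used as a plain dict key; nothing is dereferenced.
        cols[row[pivot_key]] = cols.get(row[pivot_key], 0) + 1
    return result

rows = [
    {"k": "a", "p": None},
    {"k": "a", "p": "x"},
    {"k": "b", "p": None},
]
print(pivot_counts(rows, "k", "p"))
# {'a': {None: 1, 'x': 1}, 'b': {None: 1}}
```

In Spark terms this corresponds to `df.groupBy(...).pivot(...)` producing a `null` column rather than an NPE.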
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
@aray, this is a regression as I described in my PR that is introduced by
this optimization.
Spark 1.6.
```
+++---+
| a|null| 1
```
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105325964
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -522,7 +522,7 @@ class Analyzer
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
> Spark 2.0+ with PivotFirst gives a NPE when one of the pivot column
values is null. The main thing fixed in this PR.
I meant to say it is not fully fixed because it does not NPE
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
cc @cloud-fan and @yhuai could you pick up one of them? Let me follow.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17224#discussion_r105327492
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -524,15 +529,21 @@ class Analyzer
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
> we're not introducing a regression in this PR by fixing the NPE, the
answer given by 1.6 was incorrect under any interpretation
Right, if it was a bug, then this PR intro
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17177#discussion_r105337093
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -693,8 +697,8 @@ def text(self, path, compression=None):
@since(2.0)
def csv
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17192#discussion_r105337154
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1339,6 +1339,11 @@ test_that("column functions", {
expect_equal(collect
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
I see. So, `count` in "**Spark 2.1.0** (and presumably 2.0.x/master)" was
unexpectedly introduced by the optimization in SPARK-13749 and this behaviour
change between 1.6 and master (
Github user HyukjinKwon closed the pull request at:
https://github.com/apache/spark/pull/17224
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17224
I am closing this per
https://github.com/apache/spark/pull/17226#issuecomment-285597434
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17226
For me, LGTM
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17192
@brkyvz and @marmbrus, I think it is okay for the SQL tests and the R change,
and it is ready for a look. Could you take a look please?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17243
I don't think this is particularly better either. We should avoid style
changes based on personal taste.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17239
@nischay21 Could you click "Close pull request" below? This is not the
place where we are supposed to report issues. This causes a build failure mark
on some branches.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17178
@felixcheung, I tried to add an optional parameter, `asArray`. Could you
take a look and see if it makes sense?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17177
Thank you both so much for your efforts and time.
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/17255
[SPARK-19918][SQL] Use TextFileFormat in implementation of JsonFileFormat
## What changes were proposed in this pull request?
This PR proposes to use text datasource when Json schema
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17255
(let me wait for the tests before cc'ing someone)
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/17256
[SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV
datasource into `DataSource`
## What changes were proposed in this pull request?
This PR proposes to defer
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17256
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17255
cc @cloud-fan, @joshrosen and @NathanHowell could you take a look and see
if it makes sense when you have some time?
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17256#discussion_r10045
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -54,10 +54,21 @@ abstract class
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17264#discussion_r10670
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -196,6 +198,11 @@ private[orc] class OrcSerializer
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17255
Let me take a look at that. I think we can replace `sample` at least.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17178
Thank you @felixcheung. Let me try to handle the comments soon.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17255#discussion_r105591511
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
---
@@ -123,24 +124,31 @@ object JsonDataSource
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17255#discussion_r105596175
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala
---
@@ -0,0 +1,50 @@
+/*
--- End diff
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17255#discussion_r105597144
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
---
@@ -17,32 +17,30 @@
package
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17256
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17255
It seems all builds failed unexpectedly. Let me restart.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17192#discussion_r105602228
--- Diff: python/pyspark/sql/functions.py ---
@@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
@since(2.1)
def to_json(col
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17192#discussion_r105605336
--- Diff: python/pyspark/sql/functions.py ---
@@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
@since(2.1)
def to_json(col
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17178#discussion_r105626282
--- Diff: R/pkg/R/functions.R ---
@@ -2452,11 +2453,18 @@ setMethod("date_format", signature(y = "Column", x
= "character"
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17178
> array might be a bit vague - does that require the column value to be a
JSON array, or just multiple JSON objects spanning multiple lines?
Yup, it requires that the input is a JSON ar
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17178
How about `as.struct`?
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17267#discussion_r105653661
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
self.stackTrace = stackTrace
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17267#discussion_r105654236
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
self.stackTrace = stackTrace
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17267#discussion_r105654542
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
self.stackTrace = stackTrace
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17267#discussion_r105659050
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
self.stackTrace = stackTrace
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17267#discussion_r105664922
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
self.stackTrace = stackTrace
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15666#discussion_r105723482
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple
trans
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15666#discussion_r105723828
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
invisible(callJMethod(sc, "
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/17282
[SPARK-19872][PYTHON] Only reserialize BatchedSerializers when repartitioning
## What changes were proposed in this pull request?
This PR proposes to avoid serialization for non-batched
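The optimization named in the PR title can be sketched abstractly in plain Python (illustrative only; `flatten_for_shuffle` is a hypothetical helper, not PySpark's actual internals): only batched data needs the extra flatten/re-serialize step before a repartition, so non-batched serializers can skip it.

```python
class Serializer:
    """Stand-in for PySpark's serializer hierarchy; the class names
    mirror PySpark's, but this sketch is simplified."""

class BatchedSerializer(Serializer):
    pass

def flatten_for_shuffle(serializer, data):
    # Hypothetical helper: batched data must be flattened into
    # individual records before the shuffle can repartition them;
    # non-batched data can be shuffled as-is, which is the saving
    # the PR proposes.
    if isinstance(serializer, BatchedSerializer):
        return [record for batch in data for record in batch]
    return data

print(flatten_for_shuffle(BatchedSerializer(), [[1, 2], [3]]))  # [1, 2, 3]
print(flatten_for_shuffle(Serializer(), [1, 2, 3]))             # [1, 2, 3]
```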
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17282
cc @davies, could you see if it makes sense?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16611
@rxin, please let me know if there is anything you are not sure of. I will
double check. I am fine with closing too if you are not sure of the
implementation for now.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/13116
Hi @burness, what's the state of this PR?
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17255#discussion_r105835549
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala
---
@@ -40,18 +40,11 @@ private[sql] object
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17255#discussion_r105835718
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala
---
@@ -40,18 +40,11 @@ private[sql] object
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15666#discussion_r105839398
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
invisible(callJMethod(sc, "
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17178
Let me go for `as.json.object`.
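For context on the option being named: it distinguishes Spark's default line-delimited JSON reading from treating the whole input as a single JSON array. A minimal pure-Python sketch of the two read modes (illustrative only, not SparkR's implementation):

```python
import io
import json

def read_json_lines(f):
    # Default Spark JSON mode: one JSON object per line.
    return [json.loads(line) for line in f if line.strip()]

def read_json_array(f):
    # Whole-input mode under discussion: the file is one JSON array.
    return json.load(f)

line_delimited = io.StringIO('{"a": 1}\n{"a": 2}\n')
whole_array = io.StringIO('[{"a": 1}, {"a": 2}]')
print(read_json_lines(line_delimited))  # [{'a': 1}, {'a': 2}]
print(read_json_array(whole_array))     # [{'a': 1}, {'a': 2}]
```

The same records come back either way; only the on-disk layout differs, which is why the option name needs to signal "the whole input is one JSON value".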
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17178#discussion_r105859422
--- Diff: R/pkg/R/functions.R ---
@@ -2452,11 +2453,18 @@ setMethod("date_format", signature(y = "Column", x
= "character"
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17225
@dongjinleekr, it seems there is a JIRA -
https://issues.apache.org/jira/browse/SPARK-19112. Let's add this in the title
of this PR if they are the same.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17178#discussion_r105871730
--- Diff: R/pkg/R/functions.R ---
@@ -2452,11 +2453,18 @@ setMethod("date_format", signature(y = "Column", x
= "character"
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17177
I think it is good to add, as it is also described in the univocity library,
but I am not too sure if it is worth exposing an option that currently has a
little bug. Maybe we could close for now and
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/13116
Ah, that's fine. I just wondered whether this is in progress, or whether we
should maybe close this for now.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17291
Thank you for cc'ing me @cloud-fan. Yes, I support this idea, and my final
plan was also the de-duplication.
One thing I am worried about, though, is that some code paths (for ex
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17255#discussion_r106043110
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala
---
@@ -40,18 +40,11 @@ private[sql] object
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17293
Oh @kiszk, I think this deals with essentially the same problem as
https://github.com/apache/spark/pull/14124, in a different way. I initially
proposed forcing all cases into nullable schema
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17177
@ep1804 @jbax Thank you. I will cc and inform you both when I happen to see
a PR bumping up the version to 2.4.0 (or probably I guess I will).
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17282
@viirya you are right. I overlooked. Thanks for correcting this.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17282
@viirya, thank you for taking a look.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15815#discussion_r87111505
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1612,7 +1612,9 @@ class Dataset[T] private[sql
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15815
How about the ones in Python and R? (If we should change them too, don't
forget to convert `Note:` to `.. note::` and `Dataset` to ``:class:`DataFrame` ``
so they render prettily for Python).
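For reference, the markup conversion mentioned in that comment is Sphinx reStructuredText as used in PySpark docstrings; a minimal before/after sketch (the deprecation wording here is hypothetical, only the directive and role syntax is the point):

```rst
Before::

    Note: Deprecated in 2.0.0, use union instead.

After::

    .. note:: Deprecated in 2.0.0, use :func:`union` on a
       :class:`DataFrame` instead.
```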
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15813#discussion_r87121887
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
---
@@ -210,14 +196,21 @@ class CSVFileFormat
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15836
Could you please fix the title to one that summarises what this PR
proposes?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15835
Hi @pwoody. So, if I understood this correctly, the original PR only
filters out the files to touch ahead of time, but this one also proposes to
filter splits via offsets from Parquet's metada
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15835
I think we should cc @liancheng as well who is insightful in this area.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15689
@skanjila It seems something gone wrong. Could you please rebase this?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15835
Ah, I missed the last comment on the old PR. Okay, we can shape this up
nicer. BTW, Spark collects small partitions into each task, so I guess this
would not always introduce a lot of tasks
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14649
@andreweduffy, are you fine with #15835? If so, we might be able to close
this.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15835#discussion_r87382199
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -272,6 +275,83 @@ class
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15835#discussion_r87382135
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala
---
@@ -75,6 +76,17 @@ trait FileFormat
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15835#discussion_r87382447
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -272,6 +275,83 @@ class
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15841#discussion_r87384086
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala ---
@@ -60,7 +60,8 @@ object SparkSQLExample
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15841
As far as I know, the default file system should be `file:///`.
I created a virtual machine with the OS `Ubuntu Server Trusty 14.04 amd64`,
from the link
https://oss
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15841#discussion_r87404895
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala ---
@@ -60,7 +60,8 @@ object SparkSQLExample