Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20648
I think we should at least update the documentation to describe this behavior
of the CSV reader.
---
-
To unsubscribe, e-mail: reviews-unsubscr
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
_To me_, I have been roughly thinking that we should match it to R's
`read.csv` and document this explicitly. I believe it is a good reference that
our CSV reader has resembled so far.
BTW, I
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
> Yup, +1 for starting this by disallowing, but to my knowledge R's
read.csv allows the length of tokens to be shorter than its schema, putting
nulls (or NA) into missing fields, as a valid cas
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
@HyukjinKwon @cloud-fan Thanks for the comments! Yes, I agree we need to
keep the CSV behavior. I will check how much we can clean up given that.
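For readers following along, the CSV behavior being kept (a row with fewer tokens than the schema has fields gets nulls in the missing fields) can be sketched in plain Python without Spark. This is only an illustration under stated assumptions: `SCHEMA`, `parse_row` and `read_csv_text` are hypothetical names, not Spark's parser, and rejecting rows with too many tokens is a choice made for this sketch rather than something stated in this thread.

```python
import csv
import io

# Hypothetical schema for illustration; not taken from the PR.
SCHEMA = ["id", "name", "city"]


def parse_row(tokens, schema=SCHEMA):
    """Map CSV tokens onto the schema, padding missing trailing fields with None."""
    if len(tokens) > len(schema):
        # Assumption of this sketch: too many tokens is treated as malformed.
        raise ValueError(
            f"expected at most {len(schema)} tokens, got {len(tokens)}"
        )
    padded = list(tokens) + [None] * (len(schema) - len(tokens))
    return dict(zip(schema, padded))


def read_csv_text(text, schema=SCHEMA):
    """Parse CSV text into a list of rows keyed by schema field names."""
    reader = csv.reader(io.StringIO(text))
    # Blank lines come back as empty token lists and are skipped.
    return [parse_row(tokens, schema) for tokens in reader if tokens]


# The second line has only two tokens, so "city" is filled with None.
rows = read_csv_text("1,Ann,Seoul\n2,Bob\n")
```

Here the short second row is accepted as a valid row rather than reported as corrupt, which is the case the comments above describe for R's `read.csv`.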
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20648
> allows the length of tokens to be shorter than its schema, putting nulls
(or NA) into missing fields
Actually I also recall that this is a valid case for CSV, and I remember that
we did this
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
+1 for disallowing it anyway if it was Wenchen's opinion too. Please go
ahead. Will help double check anyway.
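To make concrete what disallowing partial results would mean for a malformed record, here is a minimal plain-Python sketch, not Spark code. It assumes a permissive-style parse in which any failure yields a row whose data fields are all null, with the raw text kept in a corrupt-record column. The column name `_corrupt_record` mirrors Spark's default for `columnNameOfCorruptRecord`; `SCHEMA`, `convert` and `parse_permissive` are hypothetical names, and the integer-only schema is an assumption made to keep the example short.

```python
import json

# Hypothetical schema (all integer fields) and corrupt-record column name.
SCHEMA = ["a", "b"]
CORRUPT_COLUMN = "_corrupt_record"


def convert(record, schema):
    """Convert a decoded JSON object to a row; raise if a field is not an integer."""
    if not isinstance(record, dict):
        raise ValueError("top-level JSON value is not an object")
    row = {}
    for field in schema:
        value = record.get(field)
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int)
        ):
            raise ValueError(f"field {field!r} is not an integer: {value!r}")
        row[field] = value
    return row


def parse_permissive(line, schema=SCHEMA):
    """Parse one JSON line; on any failure return an all-null row plus the raw text."""
    try:
        row = convert(json.loads(line), schema)
        row[CORRUPT_COLUMN] = None
        return row
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # No partial result: fields converted before the failure are discarded.
        row = {field: None for field in schema}
        row[CORRUPT_COLUMN] = line
        return row


good = parse_permissive('{"a": 1, "b": 2}')
bad_field = parse_permissive('{"a": 1, "b": "x"}')
truncated = parse_permissive('{"a": 1')
```

In `bad_field`, the value of `a` parsed successfully but is still reported as null, which is the difference between this behavior and returning partial results.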
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
Yup, +1 for starting this by disallowing, but to my knowledge R's
read.csv allows the length of tokens to be shorter than its schema, putting
nulls (or NA) into missing fields, as a valid c
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
@HyukjinKwon According to the documentation of `DataFrameReader.csv`, the
behavior of the CSV reader isn't consistent with what is documented.
```
`PERMISSIVE` : sets other fields to `null` when it meets a corr
```
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
Yup, it's unsupported in JSON but CSV supports it. Do you mean to disallow
it in CSV too, or simply clean up the JSON code path?
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
I'll close this PR and create another PR to refactor the JSON parser and
related code. Thanks @cloud-fan and @HyukjinKwon.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
According to an offline discussion with @cloud-fan, partial results are not
supported at all now. We should refactor the code to make that clear and
reduce confusion.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
How about we start this by disallowing partial results entirely,
documenting the behaviour and matching the behaviour to R's `read.csv(...)` in
case of CSV (in terms of which case is an error
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20648
I think we do intend to return partial results, but there is no strict
definition of them, and there seems to be no public documentation, so it's
kind of a new feature.
Since this is a n
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
Yes, thanks @HyukjinKwon for checking the behavior. If we look at the code
of the JSON parser, we find many places indicating that partial results were
expected to be available.
For exampl
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
I was just double-checking the current status for both CSV and JSON:
It seems CSV fills in the partial results with an exception (which is caught
by permissive mode with the corrupt record
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
From the code, it looks like there was an intention to return partial results
when failing to parse the documents. This patch produces the partial results.
But this should be considered a behavior change, a
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional comma
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87606/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87606 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87606/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/6
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87606 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87606/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/66
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1001/
Tes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test PASSed.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
retest this please.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
`FileBasedDataSourceSuite` is still flaky.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87603/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87603 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87603/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/6
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87603 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87603/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/66
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/999/
Test
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
retest this please.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87600/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87600 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87600/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/6
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/997/
Test
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87600 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87600/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/66
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test PASSed.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
retest this please.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87586/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87586 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87586/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/6
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/987/
Test
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87586 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87586/testReport)**
for PR 20648 at commit
[`667dcd5`](https://github.com/apache/spark/commit/66
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
Will check this one by tomorrow.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87581/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87581 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87581/testReport)**
for PR 20648 at commit
[`3d7d041`](https://github.com/apache/spark/commit/3
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20648
**[Test build #87581 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87581/testReport)**
for PR 20648 at commit
[`3d7d041`](https://github.com/apache/spark/commit/3d
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/983/
Test
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20648
Merged build finished. Test PASSed.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
cc @HyukjinKwon Can you check whether this behavior makes sense to you?