Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #68477 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68477/consoleFull)**
for PR 14124 at commit
[`d240c0d`](https://github.com/apache/spark/commit/d
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #68474 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68474/consoleFull)**
for PR 14124 at commit
[`7306937`](https://github.com/apache/spark/commit/7
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Actually, nvm. I think handling this in `DataFrameReader.schema` will cover
most of the general cases.
---
If your project is set up for it, you can reply to this email and have your
reply appe
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Oh wait, @cloud-fan, it seems that Parquet files, at least, could be
written with non-nullable fields. So, reading them without a user-specified schema
might also cause the inconsistency between
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Thanks @cloud-fan, sure, that sounds great.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/14124
Sorry for the delay. After thinking about it again, I think it doesn't make sense
to allow users to specify the nullability when reading a data source. How about
we turn schema to nullable in `DataFrame
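
A minimal sketch of what "turning a schema to nullable" could look like, assuming Spark SQL is on the classpath. The object `NullableSchema` and the helper `forceNullable` are hypothetical names used only for illustration; this is not the change made in the PR.

```scala
import org.apache.spark.sql.types._

object NullableSchema {
  // Hypothetical helper: recursively relax every nullability flag in a data type.
  def forceNullable(dt: DataType): DataType = dt match {
    case StructType(fields) =>
      // Mark each field nullable and recurse into nested types.
      StructType(fields.map { f =>
        f.copy(dataType = forceNullable(f.dataType), nullable = true)
      })
    case ArrayType(elementType, _) =>
      ArrayType(forceNullable(elementType), containsNull = true)
    case MapType(keyType, valueType, _) =>
      MapType(forceNullable(keyType), forceNullable(valueType), valueContainsNull = true)
    case other =>
      // Atomic types carry no nullability flag of their own.
      other
  }
}
```

Applied to a user-given `StructType`, the result has the same field names and types, with every `nullable` flag set to `true`.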
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67029/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #67029 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67029/consoleFull)**
for PR 14124 at commit
[`3f153a3`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #67029 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67029/consoleFull)**
for PR 14124 at commit
[`3f153a3`](https://github.com/apache/spark/commit/3
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65747/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #65747 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)**
for PR 14124 at commit
[`0bc06c6`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #65747 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)**
for PR 14124 at commit
[`0bc06c6`](https://github.com/apache/spark/commit/0
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65360/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #65360 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65360/consoleFull)**
for PR 14124 at commit
[`f6be52b`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #65360 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65360/consoleFull)**
for PR 14124 at commit
[`f6be52b`](https://github.com/apache/spark/commit/f
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64631/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #64631 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64631/consoleFull)**
for PR 14124 at commit
[`ffacb55`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #64631 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64631/consoleFull)**
for PR 14124 at commit
[`ffacb55`](https://github.com/apache/spark/commit/f
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64066/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #64066 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64066/consoleFull)**
for PR 14124 at commit
[`079aae2`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #64066 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64066/consoleFull)**
for PR 14124 at commit
[`079aae2`](https://github.com/apache/spark/commit/0
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
@cloud-fan If nullability should not be ignored, then I can fix this PR to
make them consistently respect it (and of course I will try to identify
the related problems). In this case, I wi
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
BTW, actually, this is not only about user-given schema.
Currently, datasources based on `FileFormat` always read data into a DataFrame
while ignoring the nullability in the schema (for both user
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Thanks for feedback @cloud-fan !
If the user-given schema is wrong, it is handled differently by each
datasource.
- For JSON and CSV
it is kind of permissive gen
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/14124
What will happen if the given schema is wrong? It seems weird that we allow
users to provide schema while reading the data, but without validating it.
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/14124
@cloud-fan
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
gentle ping @marmbrus
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Could you take a look please @marmbrus ?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
@viirya Thanks for your comment! Actually, that's what I want
feedback on from @marmbrus.
It seems that forcing the whole schema to nullable already happens when you
read/write da
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/14124
@HyukjinKwon Your patch solves this inconsistency by forcing the whole schema
to be nullable. However, it looks like the Parquet case is for compatibility; is
this the same for JSON?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
I am not sure whether we are allowed to read JSON (via the `json(jsonRDD:
RDD[String])` API) with a schema whose fields have `nullable` set to `false`.
If it is meant to be not allowed, this issue wil
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14124
@HyukjinKwon No matter whether this PR is merged or not, I still think we
should fix the above issue. Silent conversion does not look good to me.
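
One way to surface such mismatches instead of converting silently is sketched below, assuming Spark SQL is on the classpath. The object `NullabilityCheck` and the helper `violations` are hypothetical names for illustration; they only inspect top-level fields and are not part of Spark or of this PR.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

object NullabilityCheck {
  // Hypothetical helper: names of top-level fields that the schema declares
  // non-nullable but that hold null in the given row.
  def violations(schema: StructType, row: Row): Seq[String] =
    schema.fields.zipWithIndex.collect {
      case (field, i) if !field.nullable && row.isNullAt(i) => field.name
    }.toSeq
}
```

A caller could raise an error whenever `violations` returns a non-empty sequence, rather than substituting a default value for the null.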
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Oh, I see, before this patch
```
+---+
| a|
+---+
| 1|
| 0|
+---+
```
after this patch
```
++
| a|
++
| 1
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Ah, yes, it seems like a bug to me. I thought it throws an exception in that
case. Does this PR introduce the problem? (Just curious and to be sure).
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14124
```
val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
val schema = StructType(StructField("a", IntegerType, nullable = false) ::
Nil)
val df = spark.read.schem
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62054/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #62054 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)**
for PR 14124 at commit
[`3980681`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #62054 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)**
for PR 14124 at commit
[`3980681`](https://github.com/apache/spark/commit/3
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62053/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #62053 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)**
for PR 14124 at commit
[`adae8de`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #62053 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)**
for PR 14124 at commit
[`adae8de`](https://github.com/apache/spark/commit/a
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62049/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14124
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #62049 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)**
for PR 14124 at commit
[`a917678`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14124
**[Test build #62049 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)**
for PR 14124 at commit
[`a917678`](https://github.com/apache/spark/commit/a
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14124
Hi @gatorsmile and @marmbrus, I saw the discussion and found that you are
related to this one. Could you please review this?