Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14118
Merged to master/2.0
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65466/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #65466 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65466/consoleFull)**
for PR 14118 at commit
[`365cbfb`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #65466 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65466/consoleFull)**
for PR 14118 at commit
[`365cbfb`](https://github.com/apache/spark/commit/3
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14118
@lw-lin could you address @srowen's comments? Otherwise this is good to go.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65343/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #65343 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)**
for PR 14118 at commit
[`d5357f9`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #65343 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)**
for PR 14118 at commit
[`d5357f9`](https://github.com/apache/spark/commit/d
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
@HyukjinKwon thanks for the information!
@srowen yea I still think this is good to go.
---
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
Jenkins retest this please
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
I support this PR. But just to make sure, I'd like to provide a reference.
At least the `na.strings` option of `read.csv` in R seems to behave as
proposed here:
```r
bt <- "A,B,C,D
```
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14118
@lw-lin just checking that you think this is still good to go? @HyukjinKwon
do you have an opinion on the current state?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65042/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #65042 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65042/consoleFull)**
for PR 14118 at commit
[`d5357f9`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #65042 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65042/consoleFull)**
for PR 14118 at commit
[`d5357f9`](https://github.com/apache/spark/commit/d
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14118
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64576/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #64576 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)**
for PR 14118 at commit
[`d5357f9`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #64576 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)**
for PR 14118 at commit
[`d5357f9`](https://github.com/apache/spark/commit/d
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
Jenkins retest this please
---
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
> What if I am writing explicitly an empty string out? Does it become just
1,,2?
Yes. It becomes `1,,2` in 2.0, and the same `1,,2` with this patch -- no
behavior changes.
> Can you
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
I believe we can change the default value of `nullValue` to
`'\u'.toString` in order to express any value is not `null`. I remember
this matches with no empty string nor any other string alth
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
@rxin Let me explain why I thought this looks good to me, in case it is
helpful.
Yes, but we should set `nullValue` for writing `null`. So, I think, setting
`""` for `nullVa
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14118
What if I am writing explicitly an empty string out? Does it become just
1,,2?
---
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
@rxin yes all empty (e.g. zero sized string) values become null values once
they are read back.
E.g. given `test.csv`:
```
1,,3,
```
`spark.read.csv("test.csv").show()` produc
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64040/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #64040 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64040/consoleFull)**
for PR 14118 at commit
[`74b4dd8`](https://github.com/apache/spark/commit/
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14118
Also LGTM other than that major question.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14118
With this change, do all empty (e.g. zero sized string) values become null
values once they are read back?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #64040 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64040/consoleFull)**
for PR 14118 at commit
[`74b4dd8`](https://github.com/apache/spark/commit/7
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
This change looks good to me - I don't see any reason why `null` should not
be read for `Boolean`, `TimestampType`, `DateType` and `StringType`, which is
inconsistent with other types.
---
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14118
CC @HyukjinKwon -- WDYT?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63713/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #63713 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63713/consoleFull)**
for PR 14118 at commit
[`f58e33d`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #63713 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63713/consoleFull)**
for PR 14118 at commit
[`f58e33d`](https://github.com/apache/spark/commit/f
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
Jenkins retest this please
---
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
This is ready for review.
To summarize, this patch casts user-specified `nullValue`s to `null`s for
all supported types including the string type:
- this fixes the problem where null date
Github user devmanhinton commented on the issue:
https://github.com/apache/spark/pull/14118
Just a +1: I would at least like the option to have `""` autocast to `null`
when read in from CSV. This is helpful for me in production, given that UDFs
skip function application when the input is `null` but not
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63493/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #63493 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63493/consoleFull)**
for PR 14118 at commit
[`f58e33d`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #63493 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63493/consoleFull)**
for PR 14118 at commit
[`f58e33d`](https://github.com/apache/spark/commit/f
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
BTW, this problem exists in the external CSV data source as well. The root
cause of https://github.com/databricks/spark-csv/issues/370 is this issue and
also if my understanding is correct, the
Github user djk121 commented on the issue:
https://github.com/apache/spark/pull/14118
I'm doing this:
val dataframe = sparkSession.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("nullValue", "null")
  .schema
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14118
You can specify "com.databricks.spark.csv" as the source.
On Fri, Aug 5, 2016 at 11:58 PM, djk121 wrote:
> Is there a way to fall back to the old databricks csv library in spark 2
Github user djk121 commented on the issue:
https://github.com/apache/spark/pull/14118
Is there a way to fall back to the old databricks csv library in spark 2.0
to work around this? Round-tripping worked there with .option("nullValue",
"null"), but I don't see a way to get round-tripp
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
This change looks reasonable to me.
---
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
@falaki could you take a look at the latest update: [[bf01cea] StringType
should also respect
`nullValue`](https://github.com/apache/spark/pull/14118/commits/bf01cea8273f00386ceef6459f8b8fe2c169e12a)
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14118
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63260/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #63260 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63260/consoleFull)**
for PR 14118 at commit
[`bf01cea`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14118
**[Test build #63260 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63260/consoleFull)**
for PR 14118 at commit
[`bf01cea`](https://github.com/apache/spark/commit/b
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/14118
@lw-lin thanks a lot for the clear summary.
After seeing some use cases, I think it is better to apply nullValue to all
types, including `StringType`. `treatEmptyValuesAsNulls` seems a special ca
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
Here are some findings as I dug a little:
1. Since https://github.com/databricks/spark-csv/pull/102 (Jul, 2015), we
would cast `""` as `null` for all types other than strings. For strings, `""
Github user deanchen commented on the issue:
https://github.com/apache/spark/pull/14118
It would be great to get a resolution to this. We're running into issues in
production attempting to parse CSVs with nullable dates. I personally prefer
option b for our use case.
---
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/14118
I think @HyukjinKwon has made a good point: it's kind of strange that null
strings can be written out but cannot be read back as nulls.
So for `StringType`:
nulls writ
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14118
IMHO, handling `StringType` at least lets users handle `null`s in a round
trip of writing and reading. CSV writes `null` according to `nullValue`
[here](https://github.com/apache/spark/blob/38cf8
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14118
Thanks for the information. I'm still confused. From an end-user
perspective, do we need to handle StringType there?
---