Hi Sean, German and others,

Setting the “nullValue” option (for parsing CSV at least) seems to be an exercise in futility.

When parsing the file, com.univocity.parsers.common.input.AbstractCharInputReader#getString contains the following logic:

    String out;
    if (len <= 0) {
        out =
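To illustrate why the option feels futile, here is a minimal sketch of that decision in plain Java. This is a simplified stand-in, not the actual univocity code; the method signature and variable names are assumptions made for illustration:

```java
// Simplified stand-in for the getString() decision: when the field
// length is zero or negative, the parser substitutes the configured
// nullValue, so an empty CSV field never survives as-is.
public class NullValueSketch {
    static String getString(String field, String nullValue) {
        int len = (field == null) ? 0 : field.length();
        String out;
        if (len <= 0) {
            out = nullValue; // empty field -> whatever nullValue is set to
        } else {
            out = field;
        }
        return out;
    }
}
```

In other words, an empty field always becomes the configured nullValue; there is no setting of nullValue that leaves the empty string untouched.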
Would *df.na.fill("")* do the trick?
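For what it's worth, in the Java API that would look something like the following sketch, applied after the read shown further down the thread (catalogData as in the original message):

```java
// Replace null values in all string columns with the empty string
// after loading, rather than fighting the parser's nullValue option.
Dataset<Row> cleaned = catalogData.na().fill("");
```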
On Fri, Jul 31, 2020 at 8:43 AM Sean Owen wrote:
Try setting nullValue to anything besides the empty string. Because its
default is the empty string, empty strings become null by default.
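Concretely, that would mean something like this sketch; "\\N" here is just an arbitrary sentinel and any marker that cannot occur in the data would do:

```java
// Pick a null marker that never appears in the file, so genuinely
// empty fields are no longer matched by nullValue on read.
.option("nullValue", "\\N")
```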
On Fri, Jul 31, 2020 at 3:20 AM Stephen Coy
wrote:
That does not work.

This is Spark 3.0 by the way.

I have been looking at the Spark unit tests, and there do not seem to be any that load a CSV text file and verify that an empty string maps to an empty string, which I think is supposed to be the default behaviour, because the “nullValue”
Hey,

I understand that the empty values in your CSV are "" ; if so, try this option:

*.option("emptyValue", "\"\"")*

Hope it helps
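In context, that suggestion would slot into the read shown further down the thread, roughly like this sketch (variable names as in the original message):

```java
// The suggested emptyValue option added to the original read, so a
// quoted "" in the file is treated as an empty string rather than null.
Dataset<Row> catalogData = sparkSession
    .read()
    .option("sep", "\t")
    .option("header", "true")
    .option("emptyValue", "\"\"")
    .csv(args[0])
    .cache();
```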
On Thu, 30 Jul 2020 at 08:49, Stephen Coy
wrote:
Hi there,

I’m trying to import a tab delimited file with:

    Dataset<Row> catalogData = sparkSession
        .read()
        .option("sep", "\t")
        .option("header", "true")
        .csv(args[0])
        .cache();

This works great, except for the fact that any column that is empty is given the value null, when I need these to be empty strings.