subject:"Tab delimited csv import and empty columns"

Re: Tab delimited csv import and empty columns

2020-08-05 Thread Stephen Coy

Hi Sean, German and others,

Setting the “nullValue” option (for parsing CSV at least) seems to be an exercise in futility.

When parsing the file, com.univocity.parsers.common.input.AbstractCharInputReader#getString contains the following logic:

    String out;
    if (len <= 0) {
        out =

Re: Tab delimited csv import and empty columns

2020-07-31 Thread Vladimir Ryzhov

Would *df.na.fill("")* do the trick?

On Fri, Jul 31, 2020 at 8:43 AM Sean Owen wrote:
> Try setting nullValue to anything besides the empty string. Because its
> default is the empty string, empty strings become null by default.
>
> On Fri, Jul 31, 2020 at 3:20 AM Stephen Coy wrote:
>
>>
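Vladimir's suggestion works after the read rather than inside the parser: whatever the CSV reader produced, every remaining null is swapped for an empty string. In Spark's Java API that call would be spelled `catalogData.na().fill("")`. The sketch below models only the idea in plain Java so it can run without Spark; `NullFill` and `fillNulls` are hypothetical names for illustration and are not part of Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a plain-Java model of null filling, not Spark's API.
final class NullFill {

    // Returns a copy of the rows with every null cell replaced by the filler.
    // The input rows are left unchanged.
    static List<String[]> fillNulls(List<String[]> rows, String filler) {
        List<String[]> filled = new ArrayList<>();
        for (String[] row : rows) {
            String[] copy = Arrays.copyOf(row, row.length);
            for (int i = 0; i < copy.length; i++) {
                if (copy[i] == null) {
                    copy[i] = filler;
                }
            }
            filled.add(copy);
        }
        return filled;
    }

    public static void main(String[] args) {
        // One parsed row whose middle column came back as null.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"sku-1", null, "blue"});

        for (String[] row : fillNulls(rows, "")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

One trade-off of filling after the fact is that it cannot tell a column that was empty in the file from one that was null for any other reason; both end up as the empty string.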

Re: Tab delimited csv import and empty columns

2020-07-31 Thread Sean Owen

Try setting nullValue to anything besides the empty string. Because its default is the empty string, empty strings become null by default.

On Fri, Jul 31, 2020 at 3:20 AM Stephen Coy wrote:
> That does not work.
>
> This is Spark 3.0 by the way.
>
> I have been looking at the Spark unit tests

Re: Tab delimited csv import and empty columns

2020-07-31 Thread Stephen Coy

That does not work.

This is Spark 3.0 by the way.

I have been looking at the Spark unit tests and there do not seem to be any that load a CSV text file and verify that an empty string maps to an empty string, which I think is supposed to be the default behaviour because the “nullValue”

Re: Tab delimited csv import and empty columns

2020-07-30 Thread German Schiavon Matteo

Hey,

I understand that the empty values in your CSV are "". If so, try this option:

*.option("emptyValue", "\"\"")*

Hope it helps.

On Thu, 30 Jul 2020 at 08:49, Stephen Coy wrote:
> Hi there,
>
> I’m trying to import a tab delimited file with:
>
> Dataset<Row> catalogData = sparkSession
>

Tab delimited csv import and empty columns

2020-07-30 Thread Stephen Coy

Hi there,

I’m trying to import a tab delimited file with:

    Dataset<Row> catalogData = sparkSession
        .read()
        .option("sep", "\t")
        .option("header", "true")
        .csv(args[0])
        .cache();

This works great, except for the fact that any column that is empty is given the value null, when I need these
