subject:"Databricks fails to read the csv file with blank line at the file header"

Re: Databricks fails to read the csv file with blank line at the file header

2016-03-28 Thread Mich Talebzadeh

Hi, I am using Spark 1.6.1 with Hive 2. I agree this may be a case to be resolved. I just happened to work around it. That first blank line causes val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header",

Re: Databricks fails to read the csv file with blank line at the file header

2016-03-28 Thread Hyukjin Kwon

Could I ask which version you are using? It looks like the cause is the empty line right after the header (because that case is not being checked in tests). However, empty lines before the header or inside the data are being tested.

Re: Databricks fails to read the csv file with blank line at the file header

2016-03-28 Thread Ashok Kumar

Hello Mich, if you can accommodate, could you please share your approach to steps 1-3 above? Best regards. On Sunday, 27 March 2016, 14:53, Mich Talebzadeh wrote: Pretty simple as usual it is a combination of ETL and ELT. Basically csv files are loaded into staging

Re: Databricks fails to read the csv file with blank line at the file header

2016-03-27 Thread Mich Talebzadeh

Pretty simple; as usual it is a combination of ETL and ELT. Basically, csv files are loaded into a staging directory on the host and compressed before being pushed into HDFS.
1. ETL --> Get rid of the header blank line on the csv files
2. ETL --> Compress the csv files
3. ETL --> Put the compressed CVF
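Steps 1 and 2 above could be sketched roughly as follows, using only the JDK and the Scala standard library on the staging host. This is an illustration, not the poster's actual script: the object name `CsvStaging`, the method `cleanAndCompress`, the choice of gzip and the UTF-8 encoding are all assumptions, and the sketch drops every blank line rather than only the one at the header.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.zip.GZIPOutputStream
import scala.io.Source

// Hypothetical helper: names, gzip and UTF-8 are assumptions, not from the thread.
object CsvStaging {
  /** Copy `in` to `out` as gzip, skipping lines that contain only whitespace. */
  def cleanAndCompress(in: Path, out: Path): Unit = {
    val source = Source.fromFile(in.toFile, "UTF-8")
    try {
      val writer = new BufferedWriter(new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8))
      try {
        source.getLines()
          .filterNot(_.trim.isEmpty)   // step 1: get rid of blank lines
          .foreach { line =>
            writer.write(line)         // step 2: written through the gzip stream
            writer.write("\n")
          }
      } finally writer.close()         // closing also finishes the gzip trailer
    } finally source.close()
  }
}
```

A line-based filter like this would also remove blank lines inside quoted multi-line fields, so it only suits files where every record sits on a single line.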

Re: Databricks fails to read the csv file with blank line at the file header

2016-03-26 Thread Koert Kuipers

To me this is expected behavior that I would not want fixed, but if you look at the recent commits for spark-csv, there is one that deals with this... On Mar 26, 2016 21:25, "Mich Talebzadeh" wrote: > > Hi, > > I have a standard csv file (saved as csv in HDFS) that has first

Databricks fails to read the csv file with blank line at the file header

2016-03-26 Thread Mich Talebzadeh

Hi, I have a standard csv file (saved as csv in HDFS) that has a blank first line at the header, as follows:
[blank line]
Date, Type, Description, Value, Balance, Account Name, Account Number
[blank line]
22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN
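To make the file layout concrete, here is a minimal Scala sketch of dropping the blank lines so that the header becomes the first line a parser sees. It is not spark-csv code. The object `BlankLineFilter` and its methods are hypothetical names, and the data row uses placeholder values because the row in the post is cut off.

```scala
// Hypothetical helper, not part of spark-csv or Spark.
object BlankLineFilter {
  /** True when a line is empty or holds only whitespace. */
  def isBlank(line: String): Boolean = line.trim.isEmpty

  /** Remove blank lines, keeping the remaining lines in their original order. */
  def dropBlankLines(lines: Seq[String]): Seq[String] = lines.filterNot(isBlank)

  def main(args: Array[String]): Unit = {
    // Same shape as the file in the post; the data row is a placeholder.
    val raw = Seq(
      "",
      "Date, Type, Description, Value, Balance, Account Name, Account Number",
      "",
      "22/03/2011,SBT,placeholder description,200.00,200.00,placeholder name,00000000"
    )
    // After filtering, the header is the first element and the data row the second.
    dropBlankLines(raw).foreach(println)
  }
}
```

The same predicate could in principle be applied to the raw text lines before they reach a CSV parser, provided no record spans more than one line.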

6 matches