Hi,

I have a standard CSV file (stored in HDFS) whose first line, above the
header, is blank, as follows:

[blank line]
Date, Type, Description, Value, Balance, Account Name, Account Number
[blank line]
22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN AE","'638585-60125663",

When I read this file with the following standard code:

val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").
  option("header", "true").
  load("hdfs://rhes564:9000/data/stg/accounts/ac/")

it fails with:

java.util.NoSuchElementException
        at java.util.ArrayList$Itr.next(ArrayList.java:794)

If I manually delete the first blank line, it works:

val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").
  option("header", "true").
  load("hdfs://rhes564:9000/data/stg/accounts/ac/")

df: org.apache.spark.sql.DataFrame = [Date: string,  Type: string,  Description: string,  Value: double,  Balance: double,  Account Name: string,  Account Number: string]

I can easily write a shell script to remove the blank line, but I was
wondering whether the Databricks spark-csv package has an option to skip a
leading blank line in a CSV file?

P.S. If the file is saved as a DOS text file (CRLF line endings), the
problem goes away.
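A minimal sketch of the pre-filtering workaround, in plain Scala with no Spark dependency: drop every blank line so that the first remaining line is the header. It assumes no quoted field spans a line that is entirely blank; `StripBlankLines`, `isBlank` and `strip` are hypothetical names used only for illustration, not part of spark-csv.

```scala
// Sketch only: drops blank lines from raw CSV text so that the first
// remaining line is the header. StripBlankLines is a hypothetical name.
object StripBlankLines {
  // A line counts as blank if it holds only whitespace; String.trim removes
  // every character <= U+0020, which includes a stray '\r' from CRLF files.
  def isBlank(line: String): Boolean = line.trim.isEmpty

  // Keep the non-blank lines in their original order.
  // Assumes no quoted CSV field spans an entirely blank line.
  def strip(lines: Seq[String]): Seq[String] = lines.filterNot(isBlank)

  def main(args: Array[String]): Unit = {
    val raw = Seq(
      "",
      "Date, Type, Description, Value, Balance, Account Name, Account Number",
      "",
      "22/03/2011,SBT,\"'FUNDS TRANSFER , FROM A/C 1790999\",200.00,200.00,\"'BROWN AE\",\"'638585-60125663\","
    )
    val cleaned = strip(raw)
    println(cleaned.head)   // prints the header line
    println(cleaned.length) // prints 2
  }
}
```

In Spark, the same predicate could be applied as `sc.textFile(path).filter(line => line.trim.nonEmpty)` before the lines are handed to a CSV parser that accepts lines rather than a path (for example `spark.read.option("header", "true").csv(ds)` on a `Dataset[String]` in Spark 2.2 and later).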

Thanks

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
