Hi,
I am using Spark 1.6.1 with Hive 2.
I agree this may be a case to be resolved. I just happened to work around
it. That first blank line causes
val df =
sqlContext.read.format("com.databricks.spark.csv").option("inferSchema",
"true").option("header",
Could I ask which version you are using?
It looks like the cause is the empty line right after the header (because that
case is not being checked in the tests).
However, empty lines before the header or inside the data are being
tested.
Hello Mich
If you can accommodate it, could you please share your approach to steps 1-3 above?
Best regards
On Sunday, 27 March 2016, 14:53, Mich Talebzadeh
wrote:
Pretty simple. As usual, it is a combination of ETL and ELT.
Basically, csv files are loaded into a staging directory on the host and
compressed before being pushed into HDFS:
1. ETL --> Get rid of the header blank line on the csv files
2. ETL --> Compress the csv files
3. ETL --> Put the compressed CSV
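
For illustration only, steps 1 and 2 above could be sketched in Scala roughly as follows. This is not the script actually used in this thread (that was not posted); the object name CsvStaging, the helper names, the choice of gzip, and the assumption of UTF-8 input are all hypothetical. Step 3, the push into HDFS, is left out.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.zip.GZIPOutputStream
import scala.io.Source

// Hypothetical helper, not taken from the thread.
object CsvStaging {

  // Step 1: drop every line that is empty or contains only whitespace.
  // This removes the blank line before the header as well as the one after it.
  def dropBlankLines(lines: Iterator[String]): Iterator[String] =
    lines.filter(_.trim.nonEmpty)

  // Steps 1 and 2 together: read `in` (assumed UTF-8), drop blank lines,
  // and write the remaining lines gzip-compressed to `out`.
  def cleanAndCompress(in: Path, out: Path): Unit = {
    val source = Source.fromFile(in.toFile, "UTF-8")
    try {
      val writer = new BufferedWriter(
        new OutputStreamWriter(
          new GZIPOutputStream(Files.newOutputStream(out)),
          StandardCharsets.UTF_8))
      try {
        dropBlankLines(source.getLines()).foreach { line =>
          writer.write(line)
          writer.write("\n")
        }
      } finally {
        writer.close() // also finishes the gzip stream
      }
    } finally {
      source.close()
    }
  }
}
```

With the blank lines gone before the file reaches HDFS, the header becomes the first line of the file, which is the layout the csv reader expects.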
To me this is expected behavior that I would not want fixed, but if you
look at the recent commits for spark-csv, it has one that deals with this...
On Mar 26, 2016 21:25, "Mich Talebzadeh" wrote:
Hi,
I have a standard csv file (saved as csv in HDFS) whose first line is
blank, ahead of the header,
as follows:
[blank line]
Date, Type, Description, Value, Balance, Account Name, Account Number
[blank line]
22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN