You can use the StructType/StructField approach or the inferSchema 
approach.


Sent from my T-Mobile 4G LTE Device

-------- Original message --------
From: "write2sivakumar@gmail" <write2sivaku...@gmail.com> 
Date: 12/30/16  10:08 PM  (GMT-05:00) 
To: Raymond Xie <xie3208...@gmail.com>, user@spark.apache.org 
Subject: Re: How to load a big csv to dataframe in Spark 1.6 


    
Hi Raymond,
Is your problem passing those 100 field names to the .toDF() method?


Sent from my Samsung device

-------- Original message --------
From: Raymond Xie <xie3208...@gmail.com> 
Date: 31/12/2016  10:46  (GMT+08:00) 
To: user@spark.apache.org 
Subject: How to load a big csv to dataframe in Spark 1.6 

Hello,
I see that a CSV is usually loaded into a DataFrame this way:
sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
               .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])

Employee_df.show()

However, in my case the csv has 100+ fields, which means the toDF() 
call will be very lengthy.
Can anyone tell me a practical method to load the data?
Thank you very much.
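One way to avoid typing 100+ names is to take them from the file's own header row. A sketch, with the header shown inline for illustration (in practice it would be the first line of the file) and the Spark step shown as comments since it needs a running context:

```python
# Derive the column list from the CSV's header row instead of
# hard-coding 100+ names. (Header string is illustrative.)
header_line = "Employee_ID,Employee_name,Department"
columns = header_line.split(",")

# With the RDD from the question, the header row itself can supply
# the names, e.g.:
# header = Employee_rdd.first()
# Employee_df = Employee_rdd.filter(lambda row: row != header).toDF(header)
```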

Raymond




