Yes, I believe there should be a better way to handle my case.
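In Spark 1.6, one commonly used "better way" is the external spark-csv package, which reads the column names from the header row and can infer column types; a minimal sketch, assuming the package has been added to the classpath (e.g. started with pyspark --packages com.databricks:spark-csv_2.10:1.5.0) and using a placeholder file path:

    # Sketch only: requires the spark-csv package on the classpath.
    # header='true' takes column names from the file's first line;
    # inferSchema='true' guesses column types instead of leaving
    # every column as a string.
    df = sqlContext.read.format("com.databricks.spark.csv") \
        .option("header", "true") \
        .option("inferSchema", "true") \
        .load("Employee.csv")
    df.show()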


On Dec 30, 2016 at 10:09 PM, "write2sivakumar@gmail" <write2sivaku...@gmail.com> wrote:

Hi Raymond,

Is your problem having to pass those 100 field names to the .toDF() method?





-------- Original message --------
From: Raymond Xie <xie3208...@gmail.com>
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6

Hello,

I see that a CSV is usually loaded into a DataFrame like this:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Read the file as plain text and split each line on commas
Employee_rdd = sc.textFile("\..\Employee.csv") \
               .map(lambda line: line.split(","))

# Name each column explicitly when converting to a DataFrame
Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])

Employee_df.show()

However, in my case the CSV has 100+ fields, which means the toDF() call
would be very lengthy.

Can anyone tell me a practical method to load the data?
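One practical route that avoids typing out the names is to derive them from the file itself; a minimal sketch, assuming the first line of the CSV is a header row containing the column names (file path and variable names here are placeholders):

    # Read the raw lines once, take the first line as the header,
    # and use its fields as the DataFrame's column names.
    raw_rdd = sc.textFile("Employee.csv")
    header = raw_rdd.first()
    columns = header.split(",")

    # Drop the header line, split the remaining rows into fields,
    # and pass the derived column-name list to toDF().
    data_rdd = raw_rdd.filter(lambda line: line != header) \
                      .map(lambda line: line.split(","))

    Employee_df = data_rdd.toDF(columns)
    Employee_df.show()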

Thank you very much.


Raymond
