Thanks Felix, I will try it tomorrow.

~~~ sent from my cell phone, sorry for any typos

On Dec 30, 2016 at 10:08 PM, "Felix Cheung" <felixcheun...@hotmail.com> wrote:

> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
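> A minimal sketch of spark-csv usage in Spark 1.6, assuming the file has a
> header row (the package coordinates below are for Scala 2.10):
>
> # start the shell with the package on the classpath, e.g.:
> #   pyspark --packages com.databricks:spark-csv_2.10:1.5.0
> from pyspark.sql import SQLContext
>
> sqlContext = SQLContext(sc)
> df = sqlContext.read \
>     .format("com.databricks.spark.csv") \
>     .option("header", "true") \
>     .option("inferSchema", "true") \
>     .load("Employee.csv")
> df.printSchema()   # column names come from the header, types are inferred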
>
> ------------------------------
> *From:* Raymond Xie <xie3208...@gmail.com>
> *Sent:* Friday, December 30, 2016 6:46:11 PM
> *To:* user@spark.apache.org
> *Subject:* How to load a big csv to dataframe in Spark 1.6
>
> Hello,
>
> I see that a csv is usually loaded into a dataframe this way:
>
> from pyspark.sql import SQLContext
>
> sqlContext = SQLContext(sc)
>
> Employee_rdd = sc.textFile("../Employee.csv") \
>                  .map(lambda line: line.split(","))
>
> Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])
>
> Employee_df.show()
>
> However, in my case the csv has 100+ fields, so spelling out every column
> name in toDF() would be very tedious.
>
> Can anyone tell me a practical method to load the data?
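>
> (One workaround sketch without any extra package, assuming the first line
> of the csv is a header row naming the columns:)
>
> from pyspark.sql import SQLContext
>
> sqlContext = SQLContext(sc)
> raw = sc.textFile("Employee.csv")
> header = raw.first()                        # e.g. "Employee_ID,Employee_name,..."
> cols = header.split(",")                    # reuse the header row as column names
> data = raw.filter(lambda line: line != header) \
>           .map(lambda line: line.split(","))
> Employee_df = data.toDF(cols)               # no need to type out 100+ names
> Employee_df.show()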
>
> Thank you very much.
>
>
> *Raymond*
>
>
