It is space-separated data, as shown below.
And what are your thoughts on the second issue?

Thank you.

At 2015-08-10 15:20:39, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

Isn't it space-separated data? It is neither comma (,) separated nor pipe (|) separated.

Thanks
Best Regards

On Mon, Aug 10, 2015 at 12:06 PM, Netwaver <wanglong_...@163.com> wrote:

Hi Spark experts,

I am using Spark 1.4.1 and trying the Spark SQL/DataFrame API with a text file in the format below:

id gender height
1 M 180
2 F 167
... ...

But I have run into the issues described below:

1. In my test program, I specify the schema programmatically, but when I use "|" as the separator in the schema string, the code runs into the exception below when executed on the cluster (Standalone). When I use "," as the separator, everything works fine.

2. In the code, I use the DataFrame.agg() function with the same column name for different statistics functions (max, min, avg):

    val peopleDF = sqlCtx.createDataFrame(rowRDD, schema)
    peopleDF.filter(peopleDF("gender").equalTo("M")).agg(Map("height" -> "avg", "height" -> "max", "height" -> "min")).show()

I find that only the last function's result is shown (as below). Does this work as designed in Spark?

Hopefully I have described the issue clearly. Please feel free to correct me if I have done something wrong. Thanks a lot.
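
Both issues may come from plain Scala behaviour rather than from Spark itself. The sketch below is only an illustration under assumptions: the schema string "id|gender|height" and the object and value names are hypothetical, since the original post does not show how the schema string is built or split. It assumes the schema string is split with String.split, which treats its argument as a regular expression, and it shows that a Scala Map keeps only one value per key.

```scala
// Plain Scala, no Spark required. The object and value names are hypothetical.
object SchemaAndAggDemo {
  // Assumed schema string; the original post does not show it.
  val schemaString = "id|gender|height"

  // String.split takes a regex, and "|" is the alternation operator,
  // so this does not split on the literal pipe character.
  val regexSplit: List[String] = schemaString.split("|").toList

  // Escaping the pipe, or using the Char overload, splits on the literal character.
  val escapedSplit: List[String] = schemaString.split("\\|").toList
  val charSplit: List[String] = schemaString.split('|').toList

  // A Map keeps one value per key, so the repeated "height" keys collapse
  // to the last pair before DataFrame.agg() ever receives the Map.
  val aggSpec: Map[String, String] =
    Map("height" -> "avg", "height" -> "max", "height" -> "min")

  def main(args: Array[String]): Unit = {
    println(regexSplit)   // not the three expected field names
    println(escapedSplit) // the three field names
    println(aggSpec)      // a single entry for "height"
  }
}
```

If that is what is happening, the Map-based agg() is behaving as the Map itself dictates. Passing Column expressions instead, for example peopleDF.agg(avg("height"), max("height"), min("height")) after import org.apache.spark.sql.functions._, should return all three statistics; this has not been checked against the cluster setup described above.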