I can't say anything specific about the separator, since it's not clear how you create the schema from schemaString.
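One common cause is worth ruling out. This assumes the schema string is turned into field names with `String.split`, as in the programmatic-schema example of the Spark SQL programming guide. The argument of `split` is a regular expression, and `|` is the regex alternation operator, so `split("|")` splits between every character instead of on the pipe. The schema string below is hypothetical, and the sketch is plain Scala with no Spark involved.

```scala
// Hypothetical schema string using "|" as the separator.
val schemaString = "id|gender|height"

// String.split takes a regex; a bare "|" is an empty alternation that
// matches between every character, so this does NOT split on the pipe.
val broken: Array[String] = schemaString.split("|")

// Escaping the pipe makes it a literal character in the regex.
val escaped: Array[String] = schemaString.split("\\|")

// Scala's StringOps.split(Char) quotes the character for you.
val byChar: Array[String] = schemaString.split('|')

println(broken.mkString("[", ", ", "]"))
println(escaped.mkString("[", ", ", "]")) // [id, gender, height]
println(byChar.mkString("[", ", ", "]"))  // [id, gender, height]
```

If the schema ends up with one field per character while each row still has three values, the mismatch would only surface when rows are actually processed, which fits an exception appearing at execution time on the cluster.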
Regarding the second issue: that's expected. agg() receives a Map there, and a Map cannot hold more than one value for the same key, which is why you see only the last ("min") value. Here is the Scaladoc for the Column-based agg() function; you can try it this way:

  /**
   * Aggregates on the entire [[DataFrame]] without groups.
   * {{{
   *   // df.agg(...) is a shorthand for df.groupBy().agg(...)
   *   df.agg(max($"age"), avg($"salary"))
   *   df.groupBy().agg(max($"age"), avg($"salary"))
   * }}}
   * @group dfops
   */

On 10 Aug 2015, at 09:36, Netwaver <wanglong_...@163.com> wrote:

> Hi Spark experts,
>     I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text file in below format
>
>     id  gender  height
>     1   M       180
>     2   F       167
>     ... ...
>
>     But I meet issues as described below:
>     1. In my test program, I specify the schema programmatically, but when I use "|" as the separator in schema string, the code run into below exception when being executed on the cluster(Standalone)
>
>     When I use "," as the separator, everything works fine.
>     2. In the code, when I use DataFrame.agg() function with same column name is used for different statistics functions(max,min,avg)
>
>     val peopleDF = sqlCtx.createDataFrame(rowRDD, schema)
>     peopleDF.filter(peopleDF("gender").equalTo("M")).agg(Map("height" -> "avg","height" -> "max","height" -> "min")).show()
>
>     I just find only the last function's computation result is shown(as below), Does this work as design in Spark?
>
>     Hopefully I have described the "issue" clearly, and please feel free to correct me if have done something wrong, thanks a lot.

Eugene Morozov
fathers...@list.ru
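To illustrate the Map point from the answer to the second issue, here is a plain-Scala sketch (no Spark needed) using the pairs from the question. A `Map` literal with a repeated key keeps only the last value, whereas a sequence of pairs keeps every entry.

```scala
// Three pairs sharing the key "height": a Map keeps one value per key,
// so each later pair overwrites the earlier one.
val asMap: Map[String, String] =
  Map("height" -> "avg", "height" -> "max", "height" -> "min")
println(asMap) // Map(height -> min)

// A sequence of pairs keeps every entry, duplicates included.
val asPairs: Seq[(String, String)] =
  Seq("height" -> "avg", "height" -> "max", "height" -> "min")
println(asPairs.map { case (col, fn) => s"$fn($col)" }.mkString(", "))
// avg(height), max(height), min(height)
```

On the Spark side, DataFrame.agg also has overloads that do not go through a Map: the varargs-of-pairs form, `peopleDF.agg("height" -> "avg", "height" -> "max", "height" -> "min")`, and the Column form from the Scaladoc, `peopleDF.agg(avg("height"), max("height"), min("height"))` after `import org.apache.spark.sql.functions._`. Both should be available in 1.4.x, but check the DataFrame API docs for your exact version.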