Hi Aaron, unionAll is a workaround ...
* unionAll preserves duplicates, whereas union does not.
* In SQL, UNION and UNION ALL produce the same kind of output, i.e. another relation you can keep querying, whereas here they produce different RDD types.
* I'd like to understand the existing union contract issue. Is this a class hierarchy discussion for SchemaRDD, UnionRDD, etc.?

Thanks,

On Sun, Mar 30, 2014 at 11:08 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Looks like there is a "unionAll" function on SchemaRDD which will do what
> you want. The contract of RDD#union is unfortunately too general to allow
> it to return a SchemaRDD without downcasting.
>
>
> On Sun, Mar 30, 2014 at 7:56 AM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying SparkSQL based on the example in the docs ...
>>
>> ....
>>
>> val people = sc.textFile("/data/spark/examples/src/main/resources/people.txt")
>>   .map(_.split(","))
>>   .map(p => Person(p(0), p(1).trim.toInt))
>>
>> val olderThanTeans = people.where('age > 19)
>> val youngerThanTeans = people.where('age < 13)
>> val nonTeans = youngerThanTeans.union(olderThanTeans)
>>
>> I can do an orderBy('age) on the first two (which are SchemaRDDs) but not on
>> the third: nonTeans is a UnionRDD, which does not support orderBy. This
>> seems different from SQL, where the union of two query results is itself
>> a relation with the same functionality ...
>>
>> It's not clear why the union of two SchemaRDDs does not produce a SchemaRDD ...
>>
>> Thanks,
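To make Aaron's point about the union contract concrete, here is a minimal plain-Scala sketch. `Rows` and `SchemaRows` are hypothetical stand-ins for RDD and SchemaRDD, not Spark's actual classes, and the sample people are made up. The idea it illustrates: a `union` whose parameter is any `Rows[T]` can only promise a `Rows[T]` back, because the other side may carry no schema; a method whose parameter is the schema-aware type (like SchemaRDD's `unionAll`) can return that type, so `orderBy` stays available. It also shows SQL UNION as UNION ALL plus duplicate removal.

```scala
// Hypothetical stand-in for RDD; NOT Spark's class.
class Rows[T](val data: Seq[T]) {
  // Like RDD#union: accepts ANY Rows[T], so the result can only be typed as Rows[T].
  def union(other: Rows[T]): Rows[T] = new Rows(data ++ other.data)
}

// Hypothetical stand-in for SchemaRDD; NOT Spark's class.
class SchemaRows[T](rows: Seq[T], val columns: Seq[String]) extends Rows[T](rows) {
  // Narrower parameter type: both sides carry a schema, so the result can too.
  // Keeps duplicates, like SQL UNION ALL.
  def unionAll(other: SchemaRows[T]): SchemaRows[T] = {
    require(columns == other.columns, "schemas must match")
    new SchemaRows(data ++ other.data, columns)
  }

  // Only defined on the schema-aware type, mirroring orderBy on SchemaRDD.
  def orderBy[K](key: T => K)(implicit ord: Ordering[K]): SchemaRows[T] =
    new SchemaRows(data.sortBy(key), columns)

  // SQL UNION semantics = UNION ALL followed by duplicate removal.
  def distinct: SchemaRows[T] = new SchemaRows(data.distinct, columns)
}

case class Person(name: String, age: Int)

object UnionDemo {
  // Made-up sample rows; "Cy" appears on both sides to show duplicate handling.
  val cols = Seq("name", "age")
  val a = new SchemaRows(Seq(Person("Ann", 30), Person("Cy", 8)), cols)
  val b = new SchemaRows(Seq(Person("Cy", 8), Person("Dee", 45)), cols)

  // The general, inherited union: static type is Rows[Person].
  val general: Rows[Person] = a.union(b)
  // general.orderBy(_.age)  // does not compile: orderBy is not a member of Rows[Person]

  // The schema-preserving union keeps orderBy available.
  val all: SchemaRows[Person] = a.unionAll(b).orderBy(_.age)
  val deduped: SchemaRows[Person] = a.unionAll(b).distinct.orderBy(_.age)

  def main(args: Array[String]): Unit = {
    println(all.data)
    println(deduped.data)
  }
}
```

The design choice is in the parameter type rather than the return type: overriding `union` to return `SchemaRows` would be unsound when called with a plain `Rows`, which is why a separately named method with a narrower argument is the non-downcasting option.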