That explains it. Thanks Reynold.

Justin
On Mon, Apr 13, 2015 at 11:26 PM, Reynold Xin <r...@databricks.com> wrote:

> I think what happened was applying the narrowest possible type. Type
> widening is required, and as a result, the narrowest common type between
> a string and an int is string.
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L144
>
>
> On Tue, Apr 7, 2015 at 5:00 PM, Justin Yip <yipjus...@prediction.io> wrote:
>
>> Hello,
>>
>> I am experimenting with DataFrame. I tried to construct two DataFrames
>> with:
>>
>> 1. case class A(a: Int, b: String)
>> scala> adf.printSchema()
>> root
>>  |-- a: integer (nullable = false)
>>  |-- b: string (nullable = true)
>>
>> 2. case class B(a: String, c: Int)
>> scala> bdf.printSchema()
>> root
>>  |-- a: string (nullable = true)
>>  |-- c: integer (nullable = false)
>>
>> Then I unioned these two DataFrames with the unionAll function, and I
>> got the following schema. It is kind of a mixture of A and B.
>>
>> scala> val udf = adf.unionAll(bdf)
>> scala> udf.printSchema()
>> root
>>  |-- a: string (nullable = false)
>>  |-- b: string (nullable = true)
>>
>> The unionAll documentation says it behaves like the SQL UNION ALL
>> operation. However, unioning incompatible types is not well defined for
>> SQL. Is there any expected behavior for unioning incompatible data
>> frames?
>>
>> Thanks.
>>
>> Justin
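For readers hitting the same surprise, the behavior Reynold describes can be sketched in a spark-shell session. This is a minimal sketch assuming a Spark 1.x shell where `sc` and `sqlContext.implicits` (for `toDF`) are available; the names `adf`/`bdf` follow the original thread, and the schema shown is the one reported above, not a freshly verified transcript.

```scala
// Assumes a Spark 1.x spark-shell:
import sqlContext.implicits._

case class A(a: Int, b: String)
case class B(a: String, c: Int)

val adf = sc.parallelize(Seq(A(1, "x"), A(2, "y"))).toDF()
val bdf = sc.parallelize(Seq(B("3", 4), B("5", 6))).toDF()

// unionAll resolves columns by POSITION, not by name: column 1 of the
// result combines A.a (int) with B.a (string), and column 2 combines
// A.b (string) with B.c (int). In both cases type widening picks the
// narrowest common type, which is string, and the column names come
// from the left-hand DataFrame -- hence the "mixture of A and B".
val udf = adf.unionAll(bdf)
udf.printSchema()
// Schema as reported in the thread:
// root
//  |-- a: string (nullable = false)
//  |-- b: string (nullable = true)
```

So nothing is undefined here so much as underdocumented: the union is positional, and Catalyst's `HiveTypeCoercion` widens each positional pair to a common type rather than failing on the mismatch. If you want a by-name, type-checked union, select the columns into a common order and type explicitly before calling unionAll.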