Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-12 Thread Dirceu Semighini Filho
Hi Matthew, thanks for answering this. I've also tried a simple case class and it works fine. I'm using this case class structure, which is failing:

    import java.text.SimpleDateFormat
    import java.util.Calendar
    import scala.annotation.tailrec
    trait TabbedToString { _: Product => override

Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-09 Thread Matthew cao
Hi, I have tried a simple test like this:

    case class A(id: Long)
    val sample = spark.range(0,10).as[A]
    sample.createOrReplaceTempView("sample")
    val df = spark.emptyDataset[A]
    val df1 = spark.sql("select * from sample").as[A]
    df.union(df1)

It runs OK. And for nullability I thought that issue has

Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-08 Thread Dirceu Semighini Filho
OK, great. Well, I haven't provided a good example of what I'm doing. Let's assume that my case class is:

    case class A(tons of fields, with sub classes)
    val df = sqlContext.sql("select * from a").as[A]
    val df2 = spark.emptyDataset[A]
    df.union(df2)

This code will throw the exception. Is this
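One way to see where a nullability mismatch can come from in this scenario: `spark.emptyDataset[A]` takes its schema from the encoder for `A`, and that encoder marks primitive fields such as `Long` or `Double` as non-nullable, while `.as[A]` over a SQL query should keep the query's own schema, whose columns are often nullable. The sketch below assumes Spark 2.x `spark-sql` on the classpath; `Inner`, `A` and `EncoderSchemaDemo` are hypothetical names, not the case classes from the thread.

```scala
import org.apache.spark.sql.Encoders

// Hypothetical case classes standing in for the much larger one in the thread.
case class Inner(score: Double, label: String)
case class A(id: Long, name: String, inner: Inner)

object EncoderSchemaDemo {
  // The schema Spark derives from the case class alone; spark.emptyDataset[A] uses this one.
  // Primitive fields (Long, Double) come out with nullable = false,
  // reference types (String, nested case classes) with nullable = true.
  val encoderSchema = Encoders.product[A].schema

  def main(args: Array[String]): Unit = {
    // Prints one line per field, including its nullable flag.
    println(encoderSchema.treeString)
  }
}
```

Comparing this output with `df.schema.treeString` for the SQL-backed Dataset should show whether the two sides differ only in `nullable`.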

Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-08 Thread Burak Yavuz
Yes, unfortunately. This should actually be fixed: the union's schema should take the less restrictive (nullable) side of the two DataFrames.

On Mon, May 8, 2017 at 12:46 PM, Dirceu Semighini Filho <dirceu.semigh...@gmail.com> wrote:
> HI Burak,
> By nullability you mean that if I have the exactly the same
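Until the union itself picks the less restrictive schema, one possible workaround is to make every field nullable on both sides before calling `union`. This is a minimal sketch, assuming Spark 2.x; `RelaxNullability`, `asNullable` and `relax` are hypothetical helper names, and rebuilding the DataFrame through `df.rdd` cuts the original query plan, so treat it as a stopgap rather than a fix.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object RelaxNullability {
  // Recursively mark every struct field, array element and map value as nullable.
  def asNullable(dt: DataType): DataType = dt match {
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(dataType = asNullable(f.dataType), nullable = true)))
    case ArrayType(elem, _) =>
      ArrayType(asNullable(elem), containsNull = true)
    case MapType(k, v, _) =>
      MapType(asNullable(k), asNullable(v), valueContainsNull = true)
    case other => other // atomic types carry no nullability of their own
  }

  // Rebuild a DataFrame over the same rows, but with the relaxed schema.
  def relax(spark: SparkSession, df: DataFrame): DataFrame =
    spark.createDataFrame(df.rdd, asNullable(df.schema).asInstanceOf[StructType])
}
```

Applied to both sides of the thread's example, this would look like `relax(spark, df.toDF).union(relax(spark, df2.toDF)).as[A]`.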

Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-08 Thread Dirceu Semighini Filho
Hi Burak, by nullability, do you mean that if I have exactly the same schema, but one side supports null and the other doesn't, this exception (in Dataset union) will be thrown?

2017-05-08 16:41 GMT-03:00 Burak Yavuz:
> I also want to add that generally these may be caused by

Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-08 Thread Burak Yavuz
I also want to add that, generally, these may be caused by nullability (the `nullable` flag of each field in the schema).

On Mon, May 8, 2017 at 12:25 PM, Shixiong(Ryan) Zhu wrote:
> This is because RDD.union doesn't check the schema, so you won't see the
> problem unless you run RDD and hit
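To check whether two schemas really differ only in nullability, a small helper can walk both trees and report every field path whose `nullable` flag (or `containsNull` / `valueContainsNull` for arrays and maps) differs. A minimal sketch, assuming Spark 2.x's `org.apache.spark.sql.types`; `NullabilityDiff` and its methods are hypothetical.

```scala
import org.apache.spark.sql.types._

object NullabilityDiff {
  // Flatten a data type into (path, nullable) pairs, descending into structs, arrays and maps.
  private def flatten(dt: DataType, path: String, nullable: Boolean): Seq[(String, Boolean)] = {
    val here = if (path.isEmpty) Seq.empty else Seq(path -> nullable)
    val children = dt match {
      case StructType(fields) =>
        fields.toSeq.flatMap { f =>
          flatten(f.dataType, if (path.isEmpty) f.name else s"$path.${f.name}", f.nullable)
        }
      case ArrayType(elem, containsNull)    => flatten(elem, s"$path[]", containsNull)
      case MapType(_, v, valueContainsNull) => flatten(v, s"$path{}", valueContainsNull)
      case _                                => Seq.empty
    }
    here ++ children
  }

  // Paths present in both schemas whose nullability flag differs.
  def diff(left: StructType, right: StructType): Seq[String] = {
    val l = flatten(left, "", nullable = false).toMap
    val r = flatten(right, "", nullable = false).toMap
    l.keys.toSeq.sorted.collect {
      case p if r.get(p).exists(_ != l(p)) =>
        s"$p: left nullable=${l(p)}, right nullable=${r(p)}"
    }
  }
}
```

Calling `NullabilityDiff.diff(ds.schema, ds1.schema).foreach(println)` should list the offending fields, if any.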

Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-08 Thread Shixiong(Ryan) Zhu
This is because RDD.union doesn't check the schema, so you won't see the problem unless you run the RDD job and hit the incompatible column. With an RDD, you may not see any error at all if you never use the incompatible column. Dataset.union requires compatible schemas. You can print ds.schema and
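Since `Dataset.union` matches columns by position and checks their types during analysis, while `RDD.union` only concatenates partitions, a rough pre-check can compare the two schemas column by column while ignoring nullability. This is a hypothetical helper (`UnionCompat`), assuming Spark 2.x, and it is stricter than Spark's own analyzer, which can widen some types (for example `int` to `bigint`) instead of rejecting them.

```scala
import org.apache.spark.sql.types._

object UnionCompat {
  // True if two types match structurally when every nullability flag is ignored.
  def sameIgnoringNullability(a: DataType, b: DataType): Boolean = (a, b) match {
    case (StructType(fa), StructType(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && sameIgnoringNullability(x.dataType, y.dataType)
      }
    case (ArrayType(ea, _), ArrayType(eb, _)) => sameIgnoringNullability(ea, eb)
    case (MapType(ka, va, _), MapType(kb, vb, _)) =>
      sameIgnoringNullability(ka, kb) && sameIgnoringNullability(va, vb)
    case _ => a == b
  }

  // Dataset.union pairs top-level columns by position, so compare them pairwise.
  def mismatches(left: StructType, right: StructType): Seq[String] =
    if (left.length != right.length)
      Seq(s"column count differs: ${left.length} vs ${right.length}")
    else
      left.fields.toSeq.zip(right.fields.toSeq).zipWithIndex.collect {
        case ((l, r), i) if !sameIgnoringNullability(l.dataType, r.dataType) =>
          s"column $i (${l.name} / ${r.name}): ${l.dataType.simpleString} vs ${r.dataType.simpleString}"
      }
}
```

An empty result from `UnionCompat.mismatches(ds.schema, ds1.schema)` would point back to nullability as the remaining difference.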

Re: Why does dataset.union fails but dataset.rdd.union execute correctly?

2017-05-08 Thread Bruce Packer
> On May 8, 2017, at 11:07 AM, Dirceu Semighini Filho
> wrote:
>
> Hello,
> I've a very complex case class structure, with a lot of fields.
> When I try to union two datasets of this class, it doesn't work with the
> following error :
> ds.union(ds1)
> Exception in

Why does dataset.union fail but dataset.rdd.union execute correctly?

2017-05-08 Thread Dirceu Semighini Filho
Hello, I have a very complex case class structure with a lot of fields. When I try to union two Datasets of this class, it fails with the following error:

    ds.union(ds1)
    Exception in thread "main" org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the