[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 @michalsenkyr please create 2 more tickets for the optimization you mentioned in https://github.com/apache/spark/pull/16240#issuecomment-266318016 and the nested custom collection problem. --- If

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wi

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70943/ Test PASSed. ---

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Merged build finished. Test PASSed.

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70943/testReport)** for PR 16240 at commit [`68810c4`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70943/testReport)** for PR 16240 at commit [`68810c4`](https://github.com/apache/spark/commit/68

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 ok to test

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 For future reference: https://github.com/apache/spark/blob/master/dev/mima (script to run mima)

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Not sure how to run MiMa tests locally so I tried my best to figure out what was necessary. Hope this fixes it. The downside of the fix is that I had to restore the original methods in `SQL

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 you need to fix mima: ``` [error] * method newDoubleSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Merged build finished. Test FAILed.

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70859/testReport)** for PR 16240 at commit [`efd0801`](https://github.com/apache/spark/commit/e

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70859/ Test FAILed. ---

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70859/testReport)** for PR 16240 at commit [`efd0801`](https://github.com/apache/spark/commit/ef

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 LGTM, please create 2 more tickets for the optimization you mentioned in https://github.com/apache/spark/pull/16240#issuecomment-266318016 and the nested custom collection.

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 retest this please

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-22 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 I actually read that but IDEA complained when I tried to place the `Product` encoder into a separate trait. So I opted for specificity. However, I tried it again right now and even though ID

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 How about we assign priority to implicit rules like http://stackoverflow.com/questions/1886953/is-there-a-way-to-control-which-implicit-conversion-will-be-the-default-used ? I think we sh
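The implicit-priority idea suggested here can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `Enc`, `LowPriorityEncs`, `Encs`, `Point` and `encName` are hypothetical names invented for the example, and `Enc` is only a stand-in for an encoder type class. The sketch relies on the Scala rule that an implicit defined in a derived object is preferred over one inherited from a parent trait, so a type matching both bounds no longer causes an ambiguity error.

```scala
// Hypothetical stand-in for an encoder type class; this is not Spark's Encoder.
class Enc[T](val name: String)

// Implicits inherited from a parent trait have lower priority during resolution.
trait LowPriorityEncs {
  implicit def productEnc[T <: Product]: Enc[T] = new Enc[T]("product")
}

// Implicits defined directly in the derived object win when both candidates apply.
object Encs extends LowPriorityEncs {
  implicit def seqEnc[T <: Seq[_]]: Enc[T] = new Enc[T]("seq")
}

// Hypothetical case class: a Product that is not a Seq.
case class Point(x: Int, y: Int)

object ImplicitPriorityDemo {
  import Encs._

  // Reports which implicit instance the compiler selected for T.
  def encName[T](implicit e: Enc[T]): String = e.name

  // A List may satisfy both bounds (on Scala versions where List is a Product);
  // the higher-priority seqEnc is chosen instead of a compile error.
  val forList: String = encName[List[Int]]

  // Point only satisfies the Product bound, so the low-priority instance is used.
  val forPoint: String = encName[Point]
}
```

Placing the `Product` instance in the parent trait mirrors the direction mentioned elsewhere in the thread (moving the `Product` encoder into a separate trait); whether the merged change is organized exactly this way is not something this sketch claims.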

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-20 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 None of them. The compilation will fail. That is why I had to provide those additional implicits. ``` scala> class Test[T] defined class Test scala> implicit def test1[

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 The overall strategy LGTM. > I had to alter and add new implicit encoders into SQLImplicits. The new encoders are for Seq with Product combination (essentially only List) to disambiguate

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-19 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 /cc @cloud-fan

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-11 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Possible optimization: Instead of conversions using `to`, we can use `Builder`s. This way we could get rid of the conversion overhead. This would require adding a new codegen method that would
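To illustrate the difference between converting after the fact and building the target collection directly, here is a small sketch in plain Scala. It is not the generated code discussed in the comment; `BuilderSketch`, `buildList` and `buildVector` are hypothetical names, and the input is assumed to be a simple `Array[Int]` standing in for deserialized values.

```scala
import scala.collection.mutable

object BuilderSketch {
  // Appends each element straight into the target collection's Builder,
  // so no intermediate collection has to be created and then converted.
  def buildList(values: Array[Int]): List[Int] = {
    val builder: mutable.Builder[Int, List[Int]] = List.newBuilder[Int]
    builder.sizeHint(values.length) // optional hint; builders may ignore it
    var i = 0
    while (i < values.length) {
      builder += values(i)
      i += 1
    }
    builder.result()
  }

  // The same loop works for any collection whose companion offers newBuilder.
  def buildVector(values: Array[Int]): Vector[Int] = {
    val builder = Vector.newBuilder[Int]
    values.foreach(builder += _)
    builder.result()
  }
}
```

For example, `BuilderSketch.buildList(Array(1, 2, 3))` yields `List(1, 2, 3)` without first materializing another sequence and calling a conversion on it.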

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-10 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Added support for arbitrary sequences. Now also Queues, ArrayBuffers and such can be used in datasets (all are serialized into ArrayType). I had to alter and add new implicit e

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #3488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3488/consoleFull)** for PR 16240 at commit [`8c15b47`](https://github.com/apache/spark/commit

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #3488 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3488/consoleFull)** for PR 16240 at commit [`8c15b47`](https://github.com/apache/spark/commit/

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-09 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 I would like to add that the conversion is specific to `List[_]`. I can add support for arbitrary sequence types through the use of `CanBuildFrom` if it is desirable. We can also suppo

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Can one of the admins verify this patch?