Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
We've identified the cause of the change in behavior. It is related to the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key and its related functionality were absent from our previous build. The default setting in the current build was causing Spark to attempt to scan all table
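For readers following along, the inference mode can be pinned explicitly rather than relying on the default. A sketch of a `spark-defaults.conf` entry, assuming the valid modes in this build are `INFER_AND_SAVE` (default), `INFER_ONLY`, and `NEVER_INFER`, with `NEVER_INFER` skipping the schema-inference scan:

```properties
# spark-defaults.conf (sketch; NEVER_INFER avoids the full schema-inference scan)
spark.sql.hive.caseSensitiveInferenceMode  NEVER_INFER
```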

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
I want to caution that in testing a build from this morning's branch-2.1 we found that Hive partition pruning was not working. We found that Spark SQL was fetching all Hive table partitions for a very simple query whereas in a build from several weeks ago it was fetching only the required
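For readers unfamiliar with the term, partition pruning means pushing the query's partition predicate down so only the matching Hive partitions are fetched from the metastore. A toy Python illustration of the expected behavior (not Spark code; all names here are invented for the example):

```python
# Toy model: each Hive partition is identified by its partition-column values.
partitions = [{"date": "2017-04-%02d" % d} for d in range(1, 31)]

def prune(partitions, predicate):
    """Return only the partitions matching the predicate (a pruned scan)."""
    return [p for p in partitions if predicate(p)]

# A query like `WHERE date = '2017-04-20'` should touch one partition,
# not all 30 -- the reported regression fetched the whole list instead.
pruned = prune(partitions, lambda p: p["date"] == "2017-04-20")
print(len(partitions), len(pruned))  # 30 1
```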

Re: New Optimizer Hint

2017-04-20 Thread Reynold Xin
Doesn't common subexpression elimination address this issue as well?

On Thu, Apr 20, 2017 at 6:40 AM Herman van Hövell tot Westerflier <hvanhov...@databricks.com> wrote:
> Hi Michael,
> This sounds like a good idea. Can you open a JIRA to track this?
> My initial feedback on your proposal
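Reynold's suggestion, sketched in plain Python: common subexpression elimination evaluates a repeated expensive expression once and reuses the result, which would remove the duplicated-UDF cost without a hint (toy code; the names are invented):

```python
calls = 0

def expensive(x):
    """Stand-in for a costly UDF; counts how often it runs."""
    global calls
    calls += 1
    return x * 10

# Naive plan: the subexpression expensive(3) appears twice and runs twice.
naive = expensive(3) + expensive(3)

# With CSE: evaluate the common subexpression once, reuse the result.
calls = 0
common = expensive(3)
cse = common + common

print(naive, cse, calls)  # 60 60 1
```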

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Nicholas Chammas
Steve, I think you're a good person to ask about this. Is the below any cause for concern? Or did I perhaps test this incorrectly? Nick

On Tue, Apr 18, 2017 at 11:50 PM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I had trouble starting up a shell with the AWS package loaded

Re: New Optimizer Hint

2017-04-20 Thread Herman van Hövell tot Westerflier
Hi Michael,

This sounds like a good idea. Can you open a JIRA to track this?

My initial feedback on your proposal would be that you might want to express the NO_COLLAPSE hint at the expression level and not at the plan level. HTH

On Thu, Apr 20, 2017 at 3:31 PM, Michael Styles

New Optimizer Hint

2017-04-20 Thread Michael Styles
Hello, I am in the process of putting together a PR that introduces a new hint called NO_COLLAPSE. This hint is essentially identical to Oracle's NO_MERGE hint. Let me first give an example of why I am proposing this.

df1 = spark.createDataFrame([(1, "abc")], ["id", "user_agent"])
df2 =
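To make the motivation concrete: Spark's project-collapsing optimization merges adjacent projections, so an expression defined in one projection and referenced twice in the next gets inlined and evaluated twice. A toy Python model of the two plan shapes (invented names, not Spark internals):

```python
calls = 0

def parse_ua(ua):
    """Stand-in for an expensive user-agent-parsing UDF."""
    global calls
    calls += 1
    return ua.upper()

row = {"id": 1, "user_agent": "abc"}

# Collapsed plan: the projection that defined the parsed value was merged
# away, so every downstream reference re-evaluates the UDF.
collapsed = {"browser": parse_ua(row["user_agent"])[:1],
             "rest": parse_ua(row["user_agent"])[1:]}

# NO_COLLAPSE-style plan: keep the intermediate projection; evaluate once.
calls_before = calls
parsed = parse_ua(row["user_agent"])
kept = {"browser": parsed[:1], "rest": parsed[1:]}

print(calls)  # 3 total: 2 for the collapsed plan, 1 for the kept one
```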

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Denny Lee
+1 (non-binding)

On Wed, Apr 19, 2017 at 9:23 PM Dong Joon Hyun wrote:
> +1
> I tested RC3 on CentOS 7.3.1611/OpenJDK 1.8.0_121/R 3.3.3
> with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`
> At the end of R test, I saw `Had CRAN check