Re: New Optimizer Hint

2017-05-01 Thread Josh Rosen
The issue of UDFs which return structs being evaluated many times when accessing the returned struct's fields sounds like https://issues.apache.org/jira/browse/SPARK-17728; that issue mentions a trick of using *array* and *explode* to prevent project collapsing. On Thu, Apr 20, 2017 at 8:55 AM
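
To make the repeated-evaluation problem described above concrete, here is a minimal toy model in plain Python. It is not Spark code and does not use any Spark API: `CountingUdf`, `parse`, `collapsed_plan`, `uncollapsed_plan`, and the sample rows are all hypothetical names made up for illustration. It only sketches, under those assumptions, why inlining a struct-returning UDF into every field access multiplies the number of calls, while keeping the inner projection separate evaluates it once per row.

```python
from typing import Callable, Dict, List


class CountingUdf:
    """Wraps a function and counts how many times it is invoked."""

    def __init__(self, fn: Callable[[str], Dict[str, str]]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, value: str) -> Dict[str, str]:
        self.calls += 1
        return self.fn(value)


def parse(user_agent: str) -> Dict[str, str]:
    # Hypothetical stand-in for a UDF that returns a struct:
    # splits "name/version" into two fields.
    name, _, version = user_agent.partition("/")
    return {"name": name, "version": version}


def collapsed_plan(rows: List[str], udf: CountingUdf) -> List[Dict[str, str]]:
    # Models the collapsed case: each field access carries its own
    # copy of the UDF call, so the UDF runs once per field per row.
    return [{"name": udf(r)["name"], "version": udf(r)["version"]} for r in rows]


def uncollapsed_plan(rows: List[str], udf: CountingUdf) -> List[Dict[str, str]]:
    # Models the uncollapsed case: the inner projection materializes the
    # struct once per row, and the outer projection only reads its fields.
    inner = [udf(r) for r in rows]
    return [{"name": s["name"], "version": s["version"]} for s in inner]


rows = ["abc/1.0", "xyz/2.0"]  # made-up sample data
collapsed_udf = CountingUdf(parse)
uncollapsed_udf = CountingUdf(parse)

collapsed_result = collapsed_plan(rows, collapsed_udf)
uncollapsed_result = uncollapsed_plan(rows, uncollapsed_udf)

print(collapsed_udf.calls, uncollapsed_udf.calls)
```

Both functions return the same rows; only the number of UDF invocations differs, which is the cost the thread is concerned with when the UDF is expensive.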

Re: New Optimizer Hint

2017-04-20 Thread Reynold Xin
Doesn't common subexpression elimination address this issue as well? On Thu, Apr 20, 2017 at 6:40 AM Herman van Hövell tot Westerflier <hvanhov...@databricks.com> wrote: > Hi Michael, > > This sounds like a good idea. Can you open a JIRA to track this? > > My initial feedback on your proposal

Re: New Optimizer Hint

2017-04-20 Thread Herman van Hövell tot Westerflier
Hi Michael, This sounds like a good idea. Can you open a JIRA to track this? My initial feedback on your proposal would be that you might want to express the no_collapse at the expression level and not at the plan level. HTH On Thu, Apr 20, 2017 at 3:31 PM, Michael Styles

New Optimizer Hint

2017-04-20 Thread Michael Styles
Hello, I am in the process of putting together a PR that introduces a new hint called NO_COLLAPSE. This hint is essentially identical to Oracle's NO_MERGE hint. Let me first give an example of why I am proposing this. df1 = sc.sql.createDataFrame([(1, "abc")], ["id", "user_agent"]) df2 =