Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
thanks for the review, merging to master/2.1
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feat
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15807
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68436/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15807
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
e
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15807
**[Test build #68436 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68436/consoleFull)**
for PR 15807 at commit
[`3315d10`](https://github.com/apache/spark/commit/
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
@cloud-fan OK. I am looking at that issue.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
e
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
As the subexpression elimination problem may be hard to fix, I only adds
regression test in this PR, we can remove the `ReferenceToExpressions` later.
---
If your project is set up for it, you ca
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
As the subexpression elimination problem may be hard to fix, I only adds
regression test in this PR, we can remove the `ReferenceToExpressions` later.
---
If your project is set up for it, you ca
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15807
**[Test build #68436 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68436/consoleFull)**
for PR 15807 at commit
[`3315d10`](https://github.com/apache/spark/commit/3
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
@cloud-fan Looks good for now. I will take a look and give it a try
tomorrow.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
+1 on @kiszk 's idea, the next problem is, the sub expr eval method may
need local variables instead of input row `i`, but we can inline it at every
place which need the result of subexpression, e
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/15807
For wholestage codegen, I think that a life time of sub-expressions is
within an iteration for a row. Thus, `isInitialized` and `subExpr1` should be
initialized at the beginning of each iteration. For
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
even we modify it to hold the results of subexpressions in member
variables, the above code example should not work under wholestage codegen.
The above code example is similar to non wholesta
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
why whole stage codegen can't use member variables to keep the result of
subexpression?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as w
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
> isn't the result of subexpression kept in member variables?
For non-wholestage codegen, yes. For wholestage codegen, no.
---
If your project is set up for it, you can reply to this email a
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
isn't the result of subexpression kept in member variables? What I am
talking about is something like:
```
private boolean isInitialized = false;
private Int subExpr1 = 0;
priv
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
E.g.,
if (isNull(subexpr)) {
...
} else {
AssertNotNull(subexpr) // subexpr2, first used.
SomeExpr(AssertNotNull(subexpr)) // SomeExpr
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
> we can't access the subexpression outside later
I don't quite understand it, can you give an example?
---
If your project is set up for it, you can reply to this email and have your
rep
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
@cloud-fan Then once the first expression to use the subexpression is in a
if/else branch, we can't access the subexpression outside later. Evaluate it
again?
---
If your project is set up for it,
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
can we just evaluate subexpression like a scala lazy val?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not h
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/15807
@viirya @cloud-fan It looks reasonable to me that to skip subexpression
elimination for the expressions wrapped in condition expressions such as `if`.
This is because we have only a place at top level
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
@cloud-fan @kiszk I would propose to skip subexpression elimination for the
expressions wrapped in condition expressions such as `If`. What do you think?
---
If your project is set up for it, you ca
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
If we only evaluate the subexpressions before they are used. Wouldn't it
cause more than once evaluation?
E.g.,
if (isNull(subexpr)) {
...
} else {
As
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
`subexpr2` is a special case as it is an `AssertNotNull` which is a
`NonSQLExpression` and can throw an exception at runtime. If I understand
correctly, a SQL expression should not throw an exception
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/15807
In general, in addtion to bad performance, it may lead to an incorrect
result if `subexpr2` is always evaluated. When `subexpr` is null, to evaluate
`subexpr2` may throw an exception.
A compiler
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
@cloud-fan Yeah, looks like it is possibly the case. My first thought is
seems not hard to solve this. I will look at this tomorrow.
---
If your project is set up for it, you can reply to this email
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
@viirya sorry I forgot one line code in the example. When
`AsertNotNull(subexpr)` is also a subexpression, we will always evaluate it.
---
If your project is set up for it, you can reply to this
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15807
Not sure if this example is a real one or not. But looks like `subexpr` is
the subexpression, not `assertNotNull(subexpr)`. Why ``assertNotNull(subexpr)`
will be always evaluated?
---
If your proje
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
There is probably a bug of common subexpression elimination: we will
evalute all subexpressions at the very beginning, no matter the results of
subexpressions will be used or not. A counter exampl
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15807
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
e
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15807
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68334/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15807
**[Test build #68334 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68334/consoleFull)**
for PR 15807 at commit
[`0103bb4`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15807
Build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15807
**[Test build #68331 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68331/consoleFull)**
for PR 15807 at commit
[`3917630`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15807
**[Test build #68334 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68334/consoleFull)**
for PR 15807 at commit
[`0103bb4`](https://github.com/apache/spark/commit/0
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/15807
Sorry I haven't noticed that https://github.com/apache/spark/pull/15693 was
merged. Then this PR becomes a cleanup, not a bug fix. But I'd like to keep the
regression test as it's from another JIR
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15807
**[Test build #68331 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68331/consoleFull)**
for PR 15807 at commit
[`3917630`](https://github.com/apache/spark/commit/3
36 matches
Mail list logo