Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/19518
This PR was addressed by #19811, closing this one.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19518
@bdrillard can you close this pr?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19518
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
Good new is [the
issue](https://github.com/janino-compiler/janino/issues/33) in janino has been
quickly fixed. Bad new is no official date to release the next version now.
---
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
> I think ldc is 2 bytes and ldc_w is 3 bytes?
You are right, thanks, updated.
---
-
To unsubscribe, e-mail:
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
> 4 or 5 bytes: others (ldc or ldc_w)
I think ldc is 2 bytes and ldc_w is 3 bytes?
---
-
To unsubscribe, e-mail:
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
In #19811, first, I will take a hybrid approach of outer (flat) global
variable and arrays. The threshold for # of global variables would be
configurable.
I will give high priority to
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
I created and ran another synthetic benchmark program for comparing flat
global variables, inner global variables, and array using janinoc for target
Java file.
The performance is not much
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
In int case, the followings are the number of java byte code length for a
value
1 byte: -1, 0, 1, 2, 3, 4, 5 (iconst_?)
2 byte: -128 ~ -2, 6 ~ 127 (bipush)
3 bytes: -32768 ~ -129, 128
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
The PR looks good.
Although it can solve the constant pool pressure, however, I have a
question. Does it mean every time a constant falling in short range is used in
codes, we increase 1
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
Here is [a PR](https://github.com/janino-compiler/janino/pull/34) to fix a
problem regarding `sipush`.
---
-
To unsubscribe,
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
Next, I analyzed usage of constant pool entries and java bytecode ops using
`janinoc`. The summary is as follows:
```
array[4] : 6 + 0 * n entries, 6-8 java bytecode ops / read access
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
First of all, I have to share sad news with you. `janino` does not use
`sipush` for values from 128 to 32767. Current `janino` 3.0.7 uses `iconst...`,
`bipush`, or `ldc`. `javac` uses `sipush` for
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
For the strategy, I'd like to give priority to primitive values to be in
flat global variables. We also need to decide the priority between primitive
types, according to which type has largest
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19518
@viirya in general there is no benefit. There will be a benefit if we
manage to declare them where they are used, but I am not sure this is feasible.
In this way, they do not add any entry to the
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
@mgaido91 Thanks. I looked at the constant pool you posted. It's clear.
Any benefit to declare the variables in the inner classes? Looks like they
still occupy constant pool entries?
---
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19518
@viirya if you take a look at the example I posted, you can see that we are
not saving either `NameAndType` or `Fieldref`, thus think the only solution to
save constant pool entries we have found
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
> When we use an inner class approach, we still require constant pool entry
for accessing instance variables (e.g. `this.inner001.globalVar5) in one
class.
@kiszk But we still can save
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
Btw, can we config the maximum number of global variables?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
The hybrid approach sounds reasonable to me. Any special strategy to use to
decide which fields are global variables and which are in array?
---
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@mgaido91 Thank you very much. I did not know such a difference. I will
validate with `janinoc` 3.0.0.
---
-
To unsubscribe,
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19518
@kiszk I meant that `janinoc` creates a slightly different constant pool
from `javac`. I am not sure about performances, but the number of constant pool
entries is definitely different. For
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19518
I like the latest @kiszk hybrid idea in terms of performance and
readability. Also, this is a corner case, so I don't want affect most regular
small queries.
---
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@mgaido91 Thank you for your questions.
1. I am using `javac` as shown. I am sorry that I cannot understand what
you are pointing out. In this benchmark, what are differences between `javac`
and
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
Based on performance results and usage of constant pool entry, I would like
to use hybrid approach with flat global variable and array.
For example, first 500 variables are stored into flat
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
I created and ran another synthetic benchmark program for comparing flat
global variables, inner global variables, and array. In summary, the
followings are performance results (**small number is
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
can we use `32767` as array size upper bound? e.g.
```
class Foo {
int[] globalVars1 = new int[32767];
int[] globalVars2 = new int[32767];
int[] globalVars3 = new
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19518
@kiszk thanks for your great analysis. May I have just a couple of
additional questions?
1 - In all your tests, which compiler are you using? Because I see that
you are linking to the Oracle
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
| what do you mean by this? using array can't reduce constant pool size at
all?
Not at all. However, when array index for an access is greater than 32768,
the access requires constant pool entry.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
> Each access to an array element requires one constant pool entry.
what do you mean by this? using array can't reduce constant pool size at
all?
---
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@cloud-fan you are right, I am updating benchmark program and results.
I realized that one issue of the array approach: **limitation of constant
pool entries due to array access index**
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
You are comparing array vs member variables, can we compare array vs inner
class member variable? And too many classes will have overhead on the
classloader, we should test some extreme cases
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19518
I'd prefer inner class approach.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@bdrillard @cloud-fan @maropu
I created and run a benchmark program. I think that to use an array for a
compaction is slower than to use scalar instance variables. In the following my
case, 20%
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
@bdrillard thanks!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/19518
Thanks for giving this the attention to shepard it on through. I haven't
had the time to do the additional coding work necessary to properly benchmark
it in the last few weeks. @kiszk, if there
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
OK, I will create a new PR
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
let's create a new PR
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
ping @cloud-fan
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@cloud-fan Is it better to use this PR? Or, create a new PR?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19518
yea, ok @kiszk I'll review your work.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@cloud-fan I want to take this over if possible
cc @maropu
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
@kiszk @maropu any of you wanna take this over? This patch becomes
important as we now split codes more aggressively.
---
-
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19518
ping @bdrillard
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19518
@bdrillard since my PR and other get merged now there are some conflicts,
may you please fix them? Thanks.
---
-
To
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/19518
@kiszk Ah, thanks for the link back to that discussion. I'll make
modifications to the trials for better data.
---
-
To
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
@bdrillard I remember that we had the similar discussion about
benchmarking. Could you see [this
discussion](https://github.com/apache/spark/pull/16648#discussion_r118043056)?
---
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/19518
@kiszk You are correct that the current implementation compacts all mutable
state (where the state does not have to be _explicitly_ inlined).
To your last question, I'd attempted some
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19518
Thank you for creating a PR for the latest Spark.
I think that it is great to reduce # of constant pool entries. I have one
high level comment.
IIUC, this PR **always** perform mutable
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19518
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
50 matches
Mail list logo