[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-12-19 Thread bdrillard
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 This PR was addressed by #19811, closing this one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-12-19 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard can you close this pr? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19518 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-26 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 Good new is [the issue](https://github.com/janino-compiler/janino/issues/33) in janino has been quickly fixed. Bad new is no official date to release the next version now. ---

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-25 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 > I think ldc is 2 bytes and ldc_w is 3 bytes? You are right, thanks, updated. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 > 4 or 5 bytes: others (ldc or ldc_w) I think ldc is 2 bytes and ldc_w is 3 bytes? --- - To unsubscribe, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 In #19811, first, I will take a hybrid approach of outer (flat) global variable and arrays. The threshold for # of global variables would be configurable. I will give high priority to

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 I created and ran another synthetic benchmark program for comparing flat global variables, inner global variables, and array using janinoc for target Java file. The performance is not much

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 In int case, the followings are the number of java byte code length for a value 1 byte: -1, 0, 1, 2, 3, 4, 5 (iconst_?) 2 byte: -128 ~ -2, 6 ~ 127 (bipush) 3 bytes: -32768 ~ -129, 128

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 The PR looks good. Although it can solve the constant pool pressure, however, I have a question. Does it mean every time a constant falling in short range is used in codes, we increase 1

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 Here is [a PR](https://github.com/janino-compiler/janino/pull/34) to fix a problem regarding `sipush`. --- - To unsubscribe,

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 Next, I analyzed usage of constant pool entries and java bytecode ops using `janinoc`. The summary is as follows: ``` array[4] : 6 + 0 * n entries, 6-8 java bytecode ops / read access

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 First of all, I have to share sad news with you. `janino` does not use `sipush` for values from 128 to 32767. Current `janino` 3.0.7 uses `iconst...`, `bipush`, or `ldc`. `javac` uses `sipush` for

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 For the strategy, I'd like to give priority to primitive values to be in flat global variables. We also need to decide the priority between primitive types, according to which type has largest

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518 @viirya in general there is no benefit. There will be a benefit if we manage to declare them where they are used, but I am not sure this is feasible. In this way, they do not add any entry to the

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 @mgaido91 Thanks. I looked at the constant pool you posted. It's clear. Any benefit to declare the variables in the inner classes? Looks like they still occupy constant pool entries? ---

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518 @viirya if you take a look at the example I posted, you can see that we are not saving either `NameAndType` or `Fieldref`, thus think the only solution to save constant pool entries we have found

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 > When we use an inner class approach, we still require constant pool entry for accessing instance variables (e.g. `this.inner001.globalVar5) in one class. @kiszk But we still can save

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 Btw, can we config the maximum number of global variables? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 The hybrid approach sounds reasonable to me. Any special strategy to use to decide which fields are global variables and which are in array? ---

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @mgaido91 Thank you very much. I did not know such a difference. I will validate with `janinoc` 3.0.0. --- - To unsubscribe,

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk I meant that `janinoc` creates a slightly different constant pool from `javac`. I am not sure about performances, but the number of constant pool entries is definitely different. For

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19518 I like the latest @kiszk hybrid idea in terms of performance and readability. Also, this is a corner case, so I don't want affect most regular small queries. ---

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @mgaido91 Thank you for your questions. 1. I am using `javac` as shown. I am sorry that I cannot understand what you are pointing out. In this benchmark, what are differences between `javac` and

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 Based on performance results and usage of constant pool entry, I would like to use hybrid approach with flat global variable and array. For example, first 500 variables are stored into flat

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 I created and ran another synthetic benchmark program for comparing flat global variables, inner global variables, and array. In summary, the followings are performance results (**small number is

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 can we use `32767` as array size upper bound? e.g. ``` class Foo { int[] globalVars1 = new int[32767]; int[] globalVars2 = new int[32767]; int[] globalVars3 = new

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk thanks for your great analysis. May I have just a couple of additional questions? 1 - In all your tests, which compiler are you using? Because I see that you are linking to the Oracle

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 | what do you mean by this? using array can't reduce constant pool size at all? Not at all. However, when array index for an access is greater than 32768, the access requires constant pool entry.

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 > Each access to an array element requires one constant pool entry. what do you mean by this? using array can't reduce constant pool size at all? ---

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @cloud-fan you are right, I am updating benchmark program and results. I realized that one issue of the array approach: **limitation of constant pool entries due to array access index**

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 You are comparing array vs member variables, can we compare array vs inner class member variable? And too many classes will have overhead on the classloader, we should test some extreme cases

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518 I'd prefer inner class approach. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard @cloud-fan @maropu I created and run a benchmark program. I think that to use an array for a compaction is slower than to use scalar instance variables. In the following my case, 20%

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread bdrillard
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 Thanks for giving this the attention to shepard it on through. I haven't had the time to do the additional coding work necessary to properly benchmark it in the last few weeks. @kiszk, if there

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 OK, I will create a new PR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 let's create a new PR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 ping @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-21 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @cloud-fan Is it better to use this PR? Or, create a new PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-21 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19518 yea, ok @kiszk I'll review your work. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-21 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @cloud-fan I want to take this over if possible cc @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk @maropu any of you wanna take this over? This patch becomes important as we now split codes more aggressively. --- -

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 ping @bdrillard --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-10 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard since my PR and other get merged now there are some conflicts, may you please fix them? Thanks. --- - To

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-10-19 Thread bdrillard
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk Ah, thanks for the link back to that discussion. I'll make modifications to the trials for better data. --- - To

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-10-19 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard I remember that we had the similar discussion about benchmarking. Could you see [this discussion](https://github.com/apache/spark/pull/16648#discussion_r118043056)? ---

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-10-19 Thread bdrillard
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk You are correct that the current implementation compacts all mutable state (where the state does not have to be _explicitly_ inlined). To your last question, I'd attempted some

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-10-19 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 Thank you for creating a PR for the latest Spark. I think that it is great to reduce # of constant pool entries. I have one high level comment. IIUC, this PR **always** perform mutable

[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19518 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional