Re: RFR: 8148937: (str) Adapt StringJoiner for Compact Strings

Сергей Цыпанов Mon, 15 Mar 2021 06:56:45 -0700

On Mon, 15 Mar 2021 12:36:24 GMT, Claes Redestad <redes...@openjdk.org> wrote:


>> Hi @cl4es,
>>> Some of these changes conflict with #2334, which suggest removing the 
>>> `coder` and `isLatin1` methods from `String`.
>> 
>> I've checked out Aleksey's branch and applied my changes onto it, the only 
>> thing that I changed to make it work is replacing
>> public boolean isLatin1(String str) {
>>     return str.isLatin1();
>> }
>> with
>> public boolean isLatin1(String str) {
>>     return str.coder == String.LATIN1;
>> }
>> The rest of the code was left intact. `jdk:tier1` is OK after the change.
>>> As a more general point I think it would be good to explore options that 
>>> does not increase leakage of the implementation detail that `Strings` are 
>>> latin1- or utf16-encoded outside of java.lang.
>> 
>> Apart from `JavaLangAccess` the only thing that comes to my mind is 
>> reflection, but it will destroy all the improvement. Otherwise I cannot 
>> figure out any other way to access somehow package-private latin/non-latin 
>> functionality of `j.l.String` in `java.util` package. I wonder, whether I'm 
>> missing any other opportunities?
>
> A less intrusive alternative would be to use a `StringBuilder`, see changes 
> in this branch: 
> https://github.com/openjdk/jdk/compare/master...cl4es:stringjoin_improvement?expand=1
>  (I adapted your StringJoinerBenchmark to work with the ascii-only build 
> constraint)
> 
> This underperforms compared to your patch since StringBuilder.toString needs 
> to do a copy, but improves over the baseline:
> Benchmark                                                            (count)  
> (length)  (mode)  Mode  Cnt      Score      Error   Units
> StringJoinerBenchmark.stringJoiner                                       100  
>       64   latin  avgt    5   5420.701 ± 1433.485   ns/op
> StringJoinerBenchmark.stringJoiner:·gc.alloc.rate.norm                   100  
>       64   latin  avgt    5  20640.428 ±    0.130    B/op
> Patch:
> Benchmark                                                            (count)  
> (length)  (mode)  Mode  Cnt      Score      Error   Units
> StringJoinerBenchmark.stringJoiner                                       100  
>       64   latin  avgt    5   4271.401 ±  677.560   ns/op
> StringJoinerBenchmark.stringJoiner:·gc.alloc.rate.norm                   100  
>       64   latin  avgt    5  14136.294 ±    0.095    B/op
> 
> The comparative benefit is that we'd avoid punching more holes into String 
> implementation details for now. Not ruling that out indefinitely, but I think 
> it needs a stronger motivation than to improve StringJoiner alone.

I was thinking about `StringBuilder` at the very beginning but then decided to 
have no bounds checks and reallocations. Indeed from maintainability point of 
view your solution is much more attractive. I have one minor comment about 
`append()`: I think we should do more chaining, e.g. 
StringBuilder sb = new StringBuilder(len);
sb.append(elts[0]);
should be
StringBuilder sb = new StringBuilder(len).append(elts[0]);
etc. to have less bytecode.

E.g. if we take 
public String sb() {
  StringBuilder sb = new StringBuilder();
  
  sb.append("a");
  sb.append("b");
  
  return sb.toString();
}
`sb.append()` gets compiled into
L1
 LINENUMBER 23 L1
 ALOAD 1
 LDC "a"
 INVOKEVIRTUAL java/lang/StringBuilder.append 
(Ljava/lang/String;)Ljava/lang/StringBuilder;
 POP
L2
 LINENUMBER 24 L2
 ALOAD 1
 LDC "b"
 INVOKEVIRTUAL java/lang/StringBuilder.append 
(Ljava/lang/String;)Ljava/lang/StringBuilder;
 POP
 ```
 and in case we have
 ```
 sb.append("a").append("b");
 ```
 the amount of byte code is reduced:
 ```
 L1
 LINENUMBER 23 L1
 ALOAD 1
 LDC "a"
 INVOKEVIRTUAL java/lang/StringBuilder.append 
(Ljava/lang/String;)Ljava/lang/StringBuilder;
 LDC "b"
 INVOKEVIRTUAL java/lang/StringBuilder.append 
(Ljava/lang/String;)Ljava/lang/StringBuilder;
 POP
 ```
 
 Also I'd change
if (addLen == 0) {
    compactElts();
    return size == 0 ? "" : elts[0];
}
to
if (size == 0) {
    if (addLen == 0) {
        return "";
    }
    return prefix + suffix;
}
The second variant is more understandable to me and likely to be slightly 
faster.

And finally, should I close this PR and you'll create a new one from your 
branch, or I need to copy your changes here?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2627

Re: RFR: 8148937: (str) Adapt StringJoiner for Compact Strings

Reply via email to