On Mon, 15 Mar 2021 12:36:24 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> Hi @cl4es, >>> Some of these changes conflict with #2334, which suggest removing the >>> `coder` and `isLatin1` methods from `String`. >> >> I've checked out Aleksey's branch and applied my changes onto it, the only >> thing that I changed to make it work is replacing >> public boolean isLatin1(String str) { >> return str.isLatin1(); >> } >> with >> public boolean isLatin1(String str) { >> return str.coder == String.LATIN1; >> } >> The rest of the code was left intact. `jdk:tier1` is OK after the change. >>> As a more general point I think it would be good to explore options that >>> does not increase leakage of the implementation detail that `Strings` are >>> latin1- or utf16-encoded outside of java.lang. >> >> Apart from `JavaLangAccess` the only thing that comes to my mind is >> reflection, but it will destroy all the improvement. Otherwise I cannot >> figure out any other way to access somehow package-private latin/non-latin >> functionality of `j.l.String` in `java.util` package. I wonder, whether I'm >> missing any other opportunities? > > A less intrusive alternative would be to use a `StringBuilder`, see changes > in this branch: > https://github.com/openjdk/jdk/compare/master...cl4es:stringjoin_improvement?expand=1 > (I adapted your StringJoinerBenchmark to work with the ascii-only build > constraint) > > This underperforms compared to your patch since StringBuilder.toString needs > to do a copy, but improves over the baseline: > Benchmark (count) > (length) (mode) Mode Cnt Score Error Units > StringJoinerBenchmark.stringJoiner 100 > 64 latin avgt 5 5420.701 ± 1433.485 ns/op > StringJoinerBenchmark.stringJoiner:·gc.alloc.rate.norm 100 > 64 latin avgt 5 20640.428 ± 0.130 B/op > Patch: > Benchmark (count) > (length) (mode) Mode Cnt Score Error Units > StringJoinerBenchmark.stringJoiner 100 > 64 latin avgt 5 4271.401 ± 677.560 ns/op > StringJoinerBenchmark.stringJoiner:·gc.alloc.rate.norm 100 > 64 latin avgt 5 14136.294 ± 0.095 B/op > > The comparative benefit is that we'd avoid punching more holes into String > implementation details for now. Not ruling that out indefinitely, but I think > it needs a stronger motivation than to improve StringJoiner alone. I was thinking about `StringBuilder` at the very beginning but then decided to have no bounds checks and reallocations. Indeed from maintainability point of view your solution is much more attractive. I have one minor comment about `append()`: I think we should do more chaining, e.g. StringBuilder sb = new StringBuilder(len); sb.append(elts[0]); should be StringBuilder sb = new StringBuilder(len).append(elts[0]); etc. to have less bytecode. E.g. if we take public String sb() { StringBuilder sb = new StringBuilder(); sb.append("a"); sb.append("b"); return sb.toString(); } `sb.append()` gets compiled into L1 LINENUMBER 23 L1 ALOAD 1 LDC "a" INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder; POP L2 LINENUMBER 24 L2 ALOAD 1 LDC "b" INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder; POP ``` and in case we have ``` sb.append("a").append("b"); ``` the amount of byte code is reduced: ``` L1 LINENUMBER 23 L1 ALOAD 1 LDC "a" INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder; LDC "b" INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder; POP ``` Also I'd change if (addLen == 0) { compactElts(); return size == 0 ? "" : elts[0]; } to if (size == 0) { if (addLen == 0) { return ""; } return prefix + suffix; } The second variant is more understandable to me and likely to be slightly faster. And finally, should I close this PR and you'll create a new one from your branch, or I need to copy your changes here? ------------- PR: https://git.openjdk.java.net/jdk/pull/2627