Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-05-06 Thread Claes Redestad
Looks ok to me. /Claes On 2019-05-06 21:42, Ivan Gerasimov wrote: Hello everyone! I'm seeking a (capital R) Reviewer to review/approve the fix, so I could go ahead and push it. The latest webrev contains an additional testcase for OOM suggested by Tagir. BUGURL: https://bugs.openjdk.java

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-05-06 Thread Ivan Gerasimov
Thank you Claes! On 5/6/19 12:53 PM, Claes Redestad wrote: Looks ok to me. /Claes On 2019-05-06 21:42, Ivan Gerasimov wrote: Hello everyone! I'm seeking a (capital R) Reviewer to review/approve the fix, so I could go ahead and push it. The latest webrev contains an additional testcase fo

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-05-06 Thread Ivan Gerasimov
Hello everyone! I'm seeking a (capital R) Reviewer to review/approve the fix, so I could go ahead and push it. The latest webrev contains an additional testcase for OOM suggested by Tagir. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8222955 WEBREV: http://cr.openjdk.java.net/~igerasim/

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-30 Thread Ivan Gerasimov
Hello Tagir! On 4/30/19 8:57 AM, Tagir Valeev wrote: Hello! New webrev looks good. Probably it worth adding a testcase for overflow (assert that OOME is thrown)? I guess, something like "1".repeat(65536).replace("1", "1".repeat(65536)) would produce OOME quite fast and without allocating much

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-30 Thread Tagir Valeev
Hello! New webrev looks good. Probably it worth adding a testcase for overflow (assert that OOME is thrown)? I guess, something like "1".repeat(65536).replace("1", "1".repeat(65536)) would produce OOME quite fast and without allocating much memory. With best regards, Tagir Valeev. On Tue, Apr 30

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-30 Thread Peter Levart
On 4/30/19 6:43 AM, Ivan Gerasimov wrote: I used StringConcatHelper.newArray() to avoid bringing Unsafe into StringLatin1. In the StringLatin1.replace(), the newly allocated array is guaranteed to be filled up, and the filling code should never throw, so I believe using uninitialized arrays

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-29 Thread Ivan Gerasimov
Hello everyone! Please help review the second version of the enhancement. A separate branch was added to handle UTF16 in the searched string and/or in the target and replacement. Switching to Math.xxxExact() suggested by Tagir gave ~4% of throughput on affected test cases. Also, allocating

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-29 Thread Ivan Gerasimov
Hi Claes! On 4/25/19 4:51 AM, Claes Redestad wrote: Hi Ivan, the changes and fast-path speed-ups look reasonable, but I'd like to see that the overhead for cases not helped by the fast paths isn't excessive, e.g., micro variants with UTF16 Strings and a mix of Latin1 and UTF16. Good point! A

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-29 Thread Ivan Gerasimov
Thank you Tagir for suggestion! On 4/26/19 1:24 AM, Tagir Valeev wrote: Hello! Great optimization! +// overflow-conscious code +int resultLen = valLen + (++p) * deltaLen; +if (Integer.MAX_VALUE / p < deltaLen || +Integer.MAX_VALUE - resultLen < 0) { +

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-26 Thread Tagir Valeev
Hello! Great optimization! +// overflow-conscious code +int resultLen = valLen + (++p) * deltaLen; +if (Integer.MAX_VALUE / p < deltaLen || +Integer.MAX_VALUE - resultLen < 0) { +throw new OutOfMemoryError(); +} Is it really necessary t

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-25 Thread Сергей Цыпанов
My assumption is based on conclusions drawn from this article: https://shipilev.net/blog/2016/arrays-wisdom-ancients/ I don't think the situation has changed much since then. However, I'll do separate benchmarking with the latest JDK just in case. 25.04.2019, 11:24, "Ivan Gerasimov" : > Hi Serg

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-25 Thread Claes Redestad
Hi Ivan, the changes and fast-path speed-ups look reasonable, but I'd like to see that the overhead for cases not helped by the fast paths isn't excessive, e.g., micro variants with UTF16 Strings and a mix of Latin1 and UTF16. /Claes On 2019-04-25 08:00, Ivan Gerasimov wrote: Hello! This enha

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-25 Thread Ivan Gerasimov
Hi Sergei! Thanks for sharing. However, it's not immediately obvious to me why the later is better than the former. Can you please elaborate on that? I think, it worth a separate discussion. With kind regards, Ivan On 4/25/19 12:40 AM, Сергей Цыпанов wrote: Hi Ivan, recently looking in

Re: RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-25 Thread Сергей Цыпанов
Hi Ivan, recently looking into java.lang.String I've found out that String::split still uses old collection-to-array approach: String[] result = new String[resultSize]; return list.subList(0, resultSize).toArray(result); which should be replaced with return list.subList(0, resultSize).toArray

RFR 8222955 : Optimize String.replace(CharSequence, CharSequence) for Latin1 encoded strings

2019-04-24 Thread Ivan Gerasimov
Hello! This enhancement was inspired by a recent discussion at compiler-...@openjdk.java.net. It seems to be a non-uncommon situation when String.replace(CS, CS) is called with this and both arguments being Latin1 strings, so it seems reasonable to optimize for such case. Here are the fres