Hello,

I tested whether JONI can be used as the engine for RegexStringComparator.
After switching the engine of RegexStringComparator to JONI, heap memory
usage spiked whenever a regex filter request was sent, and the RegionServer
stopped working because of GC.

Looking into the cause, I found a report that, with UTF8Encoding, an
infinite loop can occur when the input contains invalid UTF-8.[1]
Trino uses NonStrictUTF8Encoding instead of UTF8Encoding.
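
To illustrate what "invalid UTF-8" means for the bytes a filter may see
(HBase cell values are arbitrary byte arrays), here is a minimal sketch
that uses only the JDK. It does not use JONI or jcodings and does not
reproduce the loop itself; the class name Utf8Check and the method
isValidUtf8 are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8Check {
    // Returns true only if the whole array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valid = "row-1".getBytes(StandardCharsets.UTF_8);
        // 0xC3 opens a two-byte sequence, but 0x28 is not a
        // continuation byte, so this pair is malformed.
        byte[] invalid = {(byte) 0xC3, (byte) 0x28};
        System.out.println(isValidUtf8(valid));
        System.out.println(isValidUtf8(invalid));
    }
}
```

A check like this could help confirm whether the values that triggered
the problem actually contain malformed sequences.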

After changing the encoding of JoniRegexEngine to NonStrictUTF8Encoding in
RegexStringComparator, I confirmed that the heap memory usage spike was
gone.[2]

It seems that HBase, like Trino, needs to use NonStrictUTF8Encoding
instead of UTF8Encoding as the encoding for JoniRegexEngine.
What do you think about changing JoniRegexEngine's encoding to
NonStrictUTF8Encoding?

Best Regards,
Minwoo

On 2022/06/27 04:41:41 Minwoo Kang wrote:
> (The first time, I sent this mail with a Korean subject. I'm so sorry.)
>
> Hello,
>
> Recently, java.util.regex in the Regex filter (RegexStringComparator) ran
> forever.
> It is said that java.util.regex can run forever or overflow the stack in
> the worst case.
>
> Looking at RegexStringComparator, I saw that two regex implementations
> (java, joni) were provided.
> I was wondering if anyone has experience in changing the regex engine
> in RegexStringComparator to joni and operating it.
>
> Best Regards,
> Minwoo
>
> On 2022/06/27 04:37:11 Minwoo Kang wrote:
> > Hello,
> >
> > Recently, java.util.regex in the Regex filter (RegexStringComparator)
> > ran forever.
> > It is said that java.util.regex can run forever or overflow the stack
> > in the worst case.
> >
> > Looking at RegexStringComparator, I saw that two regex implementations
> > (java, joni) were provided.
> > I was wondering if anyone has experience in changing the regex engine
> > in RegexStringComparator to joni and operating it.
> >
> > Best Regards,
> > Minwoo
> >
>
