Please do file an issue on our issue tracker: https://issues.apache.org/jira (the project name is HBASE, of course).
I think we may have bigger issues here, because joni was recently flagged by the static analysis tools we use at my employer to determine compliance with various government requirements. I would assume a CVE has been filed regarding joni. I plan to dig in here soon. A required upgrade of joni could, by extension, provoke an upgrade of JRuby. Sean, I recall you recently landed some changes in that regard, but only back to branch-2. If so, this encoding issue would by comparison be a smaller detail to address concurrently. In any case, let’s track the problem.

> On Jul 16, 2022, at 10:43 AM, Sean Busbey <bus...@apache.org> wrote:
>
> That sounds reasonable. Could you file an issue in our issue tracker? Are
> you up for working on a PR?
>
>
>> On Wed, Jul 13, 2022 at 2:27 AM Minwoo Kang <its.minwoo.k...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I checked whether JONI can be used in RegexStringComparator.
>> After changing the engine of RegexStringComparator to JONI, when a regex
>> filter request was sent, the heap memory usage spiked and the RegionServer
>> stopped working due to GC.
>>
>> When I looked into the cause, it is reported that when using UTF8Encoding, an
>> infinite loop can occur if invalid UTF-8 is entered.[1]
>> Trino, for example, uses NonStrictUTF8Encoding instead of UTF8Encoding.
>>
>> After changing the encoding of JoniRegexEngine to NonStrictUTF8Encoding in
>> RegexStringComparator, I confirmed that the heap memory usage spike
>> was gone.[2]
>>
>> In HBase, as in Trino, it seems necessary to use NonStrictUTF8Encoding
>> instead of UTF8Encoding for JoniRegexEngine's encoding.
>> What do you think about changing JoniRegexEngine's encoding to
>> NonStrictUTF8Encoding?
>>
>> Best Regards,
>> Minwoo
>>
>>> On 2022/06/27 04:41:41 Minwoo Kang wrote:
>>> (I sent the first mail with its title in Korean. I'm so sorry.)
>>>
>>> Hello,
>>>
>>> Recently, java.util.regex in the Regex filter (RegexStringComparator) had
>>> been running forever.
>>> It is said that java.util.regex can run forever or stack overflow in the
>>> worst case.
>>>
>>> Looking at RegexStringComparator, I saw that two regex implementations
>>> (java, joni) were provided.
>>> I was wondering if anyone has experience in changing the regex engine
>>> in RegexStringComparator to joni and operating it.
>>>
>>> Best Regards,
>>> Minwoo
>>>
>>> On 2022/06/27 04:37:11 Minwoo Kang wrote:
>>>> Hello,
>>>>
>>>> Recently, java.util.regex in the Regex filter (RegexStringComparator) had
>>>> been running forever.
>>>> It is said that java.util.regex can run forever or stack overflow in the
>>>> worst case.
>>>>
>>>> Looking at RegexStringComparator, I saw that two regex implementations
>>>> (java, joni) were provided.
>>>> I was wondering if anyone has experience in changing the regex engine
>>>> in RegexStringComparator to joni and operating it.
>>>>
>>>> Best Regards,
>>>> Minwoo
>>>>
>>>
>>
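
For anyone trying to reproduce the problem described in the thread: the values a comparator sees in HBase are arbitrary byte arrays, so malformed UTF-8 is a realistic input. The sketch below is only an illustration of what "invalid UTF-8" means for such a value, using the JDK's strict decoder. It is not HBase or joni code, the names `Utf8Check` and `isValidUtf8` are hypothetical, and it does not by itself show the joni behaviour reported above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Hypothetical helper (not part of HBase or joni): returns true only if
    // bytes[offset, offset + length) is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes, int offset, int length) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes, offset, length));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ok = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // First two bytes of a three-byte sequence, cut off before the last byte.
        byte[] truncated = {(byte) 0xE2, (byte) 0x82};
        // 0xFF can never appear in well-formed UTF-8.
        byte[] stray = {'a', (byte) 0xFF, 'b'};

        System.out.println(isValidUtf8(ok, 0, ok.length));               // true
        System.out.println(isValidUtf8(truncated, 0, truncated.length)); // false
        System.out.println(isValidUtf8(stray, 0, stray.length));         // false
    }
}
```

A check like this could help build test values for the issue (a well-formed value next to a truncated or stray-byte one); whether the fix belongs in the encoding choice, as proposed in the thread, is a separate question.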