IBM420 charset detection's isLamAlef is allocation-happy
--------------------------------------------------------

                 Key: TIKA-529
                 URL: https://issues.apache.org/jira/browse/TIKA-529
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.8
            Reporter: Radek
            Priority: Minor


The two IBM420 charset detectors (rtl and ltr) run isLamAlef() for each byte 
of the detection buffer.

The code allocates and fills a byte array on every call, which makes it 
responsible for approximately 70% of all object allocations in my current 
test case (many text files).

Since the array is identical every time, and the same check can be done 
without any array at all, this is wasteful.
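
A rough sketch of the two obvious ways out, not a patch against the actual 
source. The class name LamAlefCheck and both method names are hypothetical, 
and the six byte values are assumptions for illustration; they should be 
checked against the table isLamAlef() really uses. The first variant keeps 
the array but allocates it once as a constant, the second drops the array 
entirely:

```java
// Hypothetical helper class; the real check lives inside the IBM420 recognizers.
final class LamAlefCheck {

    // Assumed shaped Lam-Alef byte values, for illustration only.
    // Allocated once at class load instead of on every call.
    private static final byte[] SHAPED_LAM_ALEF = {
        (byte) 0xb2, (byte) 0xb3, (byte) 0xb4,
        (byte) 0xb5, (byte) 0xb7, (byte) 0xb8
    };

    // Variant 1: same linear scan as before, but over a shared constant array.
    static boolean isLamAlefShared(byte b) {
        for (byte candidate : SHAPED_LAM_ALEF) {
            if (b == candidate) {
                return true;
            }
        }
        return false;
    }

    // Variant 2: no array at all. The byte is masked to 0..255 so the
    // case labels can be written as plain unsigned hex values.
    static boolean isLamAlefNoArray(byte b) {
        switch (b & 0xff) {
            case 0xb2:
            case 0xb3:
            case 0xb4:
            case 0xb5:
            case 0xb7:
            case 0xb8:
                return true;
            default:
                return false;
        }
    }
}
```

Either variant performs no allocation per call. The switch form is probably 
the simpler one to review, since it has no shared state at all.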

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.