Hi,

I am working on a project where Tika is used in a heavily
multi-threaded environment. Lately we have seen issues where
character set detection run in isolation gives plausible results,
while running it in parallel gives results that are way off.

The root cause has not been found yet, but within the team there has
been a fair amount of finger-pointing at Tika's thread-safety and a
lot of FUD, especially around org.apache.tika.parser.txt.CharsetDetector.

But it seems no one on our team has filed a bug report or asked on
the mailing list.

So just to get rid of the FUD: Is
org.apache.tika.parser.txt.CharsetDetector considered to be
thread-safe?
(Some bug reports suggest that Tika cares about thread-safety, but I
could not find anything about it in the Javadoc for CharsetDetector.)

Thanks and Best regards,
Christian


P.S.: We're building a fresh, new CharsetDetector for each byte array
whose character encoding we want to detect, and only the thread that
created a CharsetDetector ever uses it.
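
In case it helps to reproduce this, here is a minimal sketch of the
kind of check one could run: it computes a sequential baseline, reruns
the same inputs on a thread pool, and counts how many results differ.
This is not our actual code. The class name DetectorConsistencyCheck,
the method countMismatches, and the stand-in detect method are made up
for illustration and use only the JDK; the comment in detect shows
where a fresh CharsetDetector per call would go.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class DetectorConsistencyCheck {

    // Hypothetical stand-in so the sketch runs without Tika on the classpath.
    // In the real project this would be roughly:
    //   new CharsetDetector().setText(input).detect().getName()
    // with one fresh org.apache.tika.parser.txt.CharsetDetector per call.
    static String detect(byte[] input) {
        for (byte b : input) {
            if (b < 0) {
                return "non-ASCII";
            }
        }
        return "US-ASCII";
    }

    // Runs the detector sequentially to get a baseline, then in parallel,
    // and returns the number of inputs whose parallel result differs.
    static int countMismatches(List<byte[]> inputs,
                               Function<byte[], String> detector,
                               int threads) {
        List<String> expected = new ArrayList<>();
        for (byte[] in : inputs) {
            expected.add(detector.apply(in));
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (byte[] in : inputs) {
                Callable<String> task = () -> detector.apply(in);
                futures.add(pool.submit(task));
            }
            int mismatches = 0;
            for (int i = 0; i < inputs.size(); i++) {
                if (!expected.get(i).equals(futures.get(i).get())) {
                    mismatches++;
                }
            }
            return mismatches;
        } catch (Exception e) {
            throw new IllegalStateException("parallel detection failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> inputs = Arrays.asList(
                "plain ascii".getBytes(StandardCharsets.UTF_8),
                "caf\u00e9".getBytes(StandardCharsets.UTF_8),
                "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
        int mismatches = countMismatches(inputs, DetectorConsistencyCheck::detect, 4);
        System.out.println("mismatches: " + mismatches);
    }
}
```

With a detector that keeps no shared state, the mismatch count should
stay at zero no matter how many threads are used. A non-zero count
with the real detector plugged in would point at shared state
somewhere, though not necessarily inside CharsetDetector itself.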


P.P.S.: We're still using Tika 1.9.
