Hi,

On Tue, Jul 26, 2016 at 02:17:13AM +0000, Allison, Timothy B. wrote:
> Exactly what code are you using?  How are you doing detection?

I see that you already have something working on TIKA-2041.

But for completeness' sake: our code is a bit convoluted, but it
boils down to running the following method in multiple threads in
parallel:

    private String getCharset(final byte[] raw) {
        CharsetDetector detector = new CharsetDetector();
        detector.setText(raw);
        CharsetMatch match = detector.detect();
        if (match == null) {
            return null;
        }
        return match.getName();
    }

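Each thread has its own `raw`, so the byte array is never changed
underneath CharsetDetector's feet. A new CharsetDetector is also
created on every call, so no detector instance is shared between
threads either.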
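
In case it helps with reproducing, here is a rough sketch of the kind
of harness I mean. It is not our actual code: the class name
`ParallelDetection`, the method `detectInParallel` and the placeholder
detector lambda are made up for illustration, and the detector is
passed in as a plain `Function<byte[], String>` so the sketch compiles
without any detection library on the classpath. Each task works on its
own copy of the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical harness, for illustration only.
class ParallelDetection {

    // Runs `detector` once per task, in parallel. Every task gets its own
    // copy of `input`, so no byte array is shared between threads.
    static List<String> detectInParallel(byte[] input, int tasks,
            Function<byte[], String> detector) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final byte[] raw = input.clone(); // private copy for this task
                Callable<String> task = () -> detector.apply(raw);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // rethrows any failure from the task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "some sample text".getBytes(StandardCharsets.UTF_8);
        // Placeholder detector; in real use this would be the getCharset
        // method shown above.
        List<String> names = detectInParallel(input, 8, raw -> "UTF-8");
        System.out.println(names);
    }
}
```

To try it against the real detector, replace the placeholder lambda
with a call to getCharset and compare the returned names across tasks;
with identical input they should all agree.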


Best regards,
Christian
