Re: Problem with detection of .mbox file

2016-07-25 Thread Vjeran Marcinko
Thanks a bunch for the suggested workaround. Also, I have checked, and the bug exists in the latest 1.4 nightly build. -Vjeran On Tue, Jul 26, 2016 at 2:22 AM, Luís Filipe Nassif wrote: > Hi, > > Based on https://en.wikipedia.org/wiki/Mbox, you can add the following entry > in
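The workaround entry itself is cut off in this snippet. The Wikipedia article it cites describes mbox files as starting each message with a "From " separator line, so a content sniffer can look for those five bytes at offset 0. The class below is a hypothetical, JDK-only sketch of that check. It is not Tika's detector and not the entry Luís suggested. `MboxSniffer` and `looksLikeMbox` are illustrative names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: a crude mbox check based on the leading "From " separator line. */
public class MboxSniffer {
    // mbox messages begin with a line starting with "From " (note the trailing space)
    private static final byte[] MAGIC = "From ".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes of a file start with "From ". */
    public static boolean looksLikeMbox(byte[] head) {
        if (head == null || head.length < MAGIC.length) {
            return false; // too short to contain the separator
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "From alice@example.com Mon Jul 25 10:00:00 2016\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeMbox(sample)); // prints true
    }
}
```

A check this small will also match any plain-text file that happens to begin with "From ", which is one reason real detectors combine content clues with other hints.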

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
With 1.13 and this code, I'm not able to see any problems with our handful of test files in our unit tests. Exactly what code are you using? How are you doing detection? @Test public void testMultiThreadedEncodingDetection() throws Exception { Path testDocs =
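Tim's test is truncated here. A minimal sketch of the same idea, assuming only the JDK, is shown below: run one detector over the same inputs on many threads and count results that differ from a single-threaded baseline. `detectCharset` is a hypothetical BOM-only stand-in. To exercise Tika instead, you would presumably pass a function that creates a new `CharsetDetector` on every call, calls `setText(byte[])`, and returns the name from `detect()` (guarding against a null match). Creating one instance per call matters because a detector instance holds the text it was given.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ConcurrentDetectionCheck {

    /** Hypothetical stand-in detector: UTF-8 if a UTF-8 BOM is present, else ISO-8859-1. */
    public static String detectCharset(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        return "ISO-8859-1";
    }

    /**
     * Submits `rounds` tasks to a pool of `threads` threads. Each task runs the detector over
     * every input, and the method returns how many results differed from the baseline.
     */
    public static int countMismatches(List<byte[]> inputs, Function<byte[], String> detector,
                                      int threads, int rounds) {
        // Baseline computed once on the calling thread
        List<String> expected = new ArrayList<>();
        for (byte[] input : inputs) {
            expected.add(detector.apply(input));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int r = 0; r < rounds; r++) {
                futures.add(pool.submit(() -> {
                    int bad = 0;
                    for (int i = 0; i < inputs.size(); i++) {
                        // Objects.equals tolerates a detector that returns null
                        if (!Objects.equals(expected.get(i), detector.apply(inputs.get(i)))) {
                            bad++;
                        }
                    }
                    return bad;
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // also rethrows any exception a worker hit
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("detector threw on a worker thread", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> inputs = List.of(
                new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'},
                "plain ascii".getBytes(StandardCharsets.US_ASCII));
        int bad = countMismatches(inputs, ConcurrentDetectionCheck::detectCharset, 8, 100);
        System.out.println("mismatches: " + bad); // prints "mismatches: 0"
    }
}
```

A zero count does not prove thread safety, because races may simply not show up on a given run. A nonzero count, however, strongly suggests shared mutable state somewhere in the detection path.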

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
Charset detection _should_ be thread safe, so if there is a problem, we need to fix it. Could you help us track it down (with a unit test?)? Thank you for raising this. Best, Tim -----Original Message----- From: c.leitin...@lirum.at [mailto:c.leitin...@lirum.at] Sent: Monday, July 25, 2016 6:01 PM To:

Re: Problem with detection of .mbox file

2016-07-25 Thread Nick Burch
On Mon, 25 Jul 2016, Vjeran Marcinko wrote: I first noticed that my .mbox file doesn't get parsed by MBoxParser, and later, after debugging the Tika source code, I found what the problem is: the default detector doesn't even recognize it as the "application/mbox" MIME type, and although the file extension is
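Vjeran reports that the file was not recognized as "application/mbox" even though it has the matching extension. The hypothetical sketch below shows one simple policy for combining evidence: trust a "From " content prefix first, then fall back to the `.mbox` filename extension. Tika's real `MimeTypes` detection weighs magic and name hints against its type hierarchy and is more involved than this. `MboxTypeGuess` and its methods are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical detector: content check first, filename extension as a fallback. */
public class MboxTypeGuess {
    public static final String MBOX = "application/mbox";
    public static final String UNKNOWN = "application/octet-stream";

    /** True if the leading bytes are the mbox "From " separator. */
    public static boolean startsWithFromLine(byte[] head) {
        byte[] magic = "From ".getBytes(StandardCharsets.US_ASCII);
        if (head == null || head.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Guesses a MIME type from the leading bytes and an optional file name. */
    public static String guess(byte[] head, String fileName) {
        if (startsWithFromLine(head)) {
            return MBOX; // content evidence wins
        }
        if (fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".mbox")) {
            return MBOX; // name hint used only as a fallback
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] head = "Subject: no separator here\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guess(head, "archive.MBOX")); // prints application/mbox
        System.out.println(guess(head, "notes.txt"));    // prints application/octet-stream
    }
}
```

Putting content evidence ahead of the name keeps a mislabeled file from being trusted blindly. The fallback still lets a correctly named file through when its first bytes are inconclusive.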