Re: Problem with detection of .mbox file

2016-07-25 Thread Vjeran Marcinko
Thanks a bunch for the suggested workaround. Also, I have checked, and the bug exists in the latest 1.4 nightly build. -Vjeran On Tue, Jul 26, 2016 at 2:22 AM, Luís Filipe Nassif wrote: > Hi, > > Based on https://en.wikipedia.org/wiki/Mbox, you can add the following entry > in org/apache/tika/mime/custom-mi

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
Christian, If you could open an account on JIRA, it would be helpful for discussion on this issue. Thank you, again. Best, Tim -Original Message- From: c.leitin...@lirum.at [mailto:c.leitin...@lirum.at] Sent: Monday, July 25, 2016 6:01 PM To: user@tik

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
Y, we have a problem. Thank you for raising this. https://issues.apache.org/jira/browse/TIKA-2041

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
Still couldn't find any problems with actual multithreaded code. :( @Test public void testMultiThreadingEncodingDetection() throws Exception { Path testDocs = Paths.get(this.getClass().getResource("/test-documents").toURI()); List paths = new ArrayList<>(); Map e

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
With 1.13 and this code, I'm not able to see any problems with our handful of test files in our unit tests. Exactly what code are you using? How are you doing detection? @Test public void testMultiThreadedEncodingDetection() throws Exception { Path testDocs = Paths.get(this.

RE: Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread Allison, Timothy B.
Charset detection _should_ be thread safe. If you can help us track down the problem (unit test?), we need to fix this. Thank you for raising this. Best, Tim -Original Message- From: c.leitin...@lirum.at [mailto:c.leitin...@lirum.at] Sent: Monday, July 25, 2016 6:01 PM To: u

Re: Problem with detection of .mbox file

2016-07-25 Thread Luís Filipe Nassif
Hi, Based on https://en.wikipedia.org/wiki/Mbox, you can add the following entry in org/apache/tika/mime/custom-mimetypes.xml: The priority must be greater than message/rfc822. It sometimes returns false positives, but detects mbox files without exte
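
The XML entry from this message is not preserved in the snippet above, so it is not reproduced here. As a rough illustration of the idea behind content-based mbox detection, the following is a minimal sketch in plain Java, assuming only what the Wikipedia page describes: each message in an mbox file begins with a separator line starting with "From " (the word From followed by a space). The class name MboxSniffer and the method looksLikeMbox are hypothetical and are not part of Tika.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Tika class: checks whether the first bytes of a
// file match the "From " separator line that opens an mbox file.
class MboxSniffer {

    // The five ASCII bytes 'F', 'r', 'o', 'm', ' ' expected at offset 0.
    private static final byte[] FROM_LINE_PREFIX =
            "From ".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given leading bytes start with "From ".
    static boolean looksLikeMbox(byte[] head) {
        if (head == null || head.length < FROM_LINE_PREFIX.length) {
            return false;
        }
        for (int i = 0; i < FROM_LINE_PREFIX.length; i++) {
            if (head[i] != FROM_LINE_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A check this simple also matches any plain-text file that happens to begin with "From ", which is consistent with the false positives mentioned in the message.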

Is Tika (especially CharsetDetector) considered thread-safe?

2016-07-25 Thread c . leitinger
Hi, I am working on a project where Tika is used in a heavily multi-threaded environment. Lately, there have been some issues where character set detection in isolation gives plausible results, while running it in parallel gives results that are way off. The root cause has not yet been fo
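
One way to narrow down a symptom like this is to compute a single-threaded baseline and then compare it against results produced concurrently. The following is a minimal sketch of such a check in plain Java. It is not the poster's code and does not call Tika: the detector is passed in as a generic Function<byte[], String>, and the names DetectionConsistencyCheck and findMismatches, as well as the toy BOM-based detector in main, are hypothetical stand-ins for whatever detection call is under suspicion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical harness: compares sequential detection results with results
// obtained from a thread pool and reports which inputs disagree.
class DetectionConsistencyCheck {

    // Returns the indices of inputs whose parallel result differed from the
    // single-threaded baseline at least once.
    static List<Integer> findMismatches(List<byte[]> inputs,
                                        Function<byte[], String> detector,
                                        int threads, int rounds) {
        // Baseline: one pass on the calling thread.
        List<String> expected = new ArrayList<>();
        for (byte[] input : inputs) {
            expected.add(detector.apply(input));
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every input 'rounds' times so the calls overlap.
            List<Future<String>> futures = new ArrayList<>();
            for (int r = 0; r < rounds; r++) {
                for (byte[] input : inputs) {
                    Callable<String> task = () -> detector.apply(input);
                    futures.add(pool.submit(task));
                }
            }
            List<Integer> mismatches = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                int inputIndex = i % inputs.size();
                String actual = futures.get(i).get();
                if (!Objects.equals(expected.get(inputIndex), actual)
                        && !mismatches.contains(inputIndex)) {
                    mismatches.add(inputIndex);
                }
            }
            return mismatches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Detector threw an exception", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Toy stand-in detector (assumption): reports "UTF-8" only for a UTF-8 BOM.
        Function<byte[], String> detector = bytes ->
                bytes.length >= 3 && bytes[0] == (byte) 0xEF
                        && bytes[1] == (byte) 0xBB && bytes[2] == (byte) 0xBF
                        ? "UTF-8" : "unknown";
        List<byte[]> inputs = Arrays.asList(
                new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'x'},
                new byte[] {'a', 'b', 'c'});
        System.out.println("Mismatching inputs: "
                + findMismatches(inputs, detector, 8, 100));
    }
}
```

To use it against a real setup, the stand-in lambda would be replaced by the actual detection call and the inputs by the files that show the problem. A non-empty result points at the inputs worth turning into a unit test.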

RE: Problem with detection of .mbox file

2016-07-25 Thread Allison, Timothy B.
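<repository> <id>apache.snapshots</id> <name>Apache Development Snapshot Repository</name> <url>https://repository.apache.org/content/repositories/snapshots/</url> <releases> <enabled>false</enabled> </releases> <snapshots> <enabled>true</enabled> </snapshots> </repository> -Original Messag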

Re: Problem with detection of .mbox file

2016-07-25 Thread Vjeran Marcinko
Thanks, guys. I can do it in some clumsy way, but before I try it, is there a Maven repo for such nightly builds that I can include so I can specify these 1.4-SNAPSHOT deps? On Mon, Jul 25, 2016 at 9:14 PM, Allison, Timothy B. wrote: >> Can you try with a recent Tika nightly build? > e.g. > https:/

RE: Problem with detection of .mbox file

2016-07-25 Thread Allison, Timothy B.
> Can you try with a recent Tika nightly build? e.g. https://builds.apache.org/job/Tika-trunk/lastBuild/org.apache.tika$tika-app/ -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Monday, July 25, 2016 3:03 PM To: user@tika.apache.org Subject: Re: Problem with detec

Re: Problem with detection of .mbox file

2016-07-25 Thread Nick Burch
On Mon, 25 Jul 2016, Vjeran Marcinko wrote: I first noticed that my .mbox file doesn't get parsed by MBoxParser, and later, after debugging the Tika source code, I found what the problem is: the default detector doesn't even recognize it as the "application/mbox" MIME type, and although the file extension is .mb

Problem with detection of .mbox file

2016-07-25 Thread Vjeran Marcinko
Hello, I first noticed that my .mbox file doesn't get parsed by MBoxParser, and later, after debugging the Tika source code, I found what the problem is: the default detector doesn't even recognize it as the "application/mbox" MIME type, and although the file extension is .mbox, it ignores this hint because its "