Re: Detection problem with RFC822 file with HTML content

2015-11-14 Thread Vjeran Marcinko
OK, here it is: https://issues.apache.org/jira/browse/TIKA-1793
-Vjeran
On 13.11.2015 13:48, Nick Burch wrote:
> On Fri, 13 Nov 2015, Vjeran Marcinko wrote:
>> On 13.11.2015 11:51, Nick Burch wrote:
>>> On Fri, 13 Nov 2015, Vjeran Marcinko wrote:
>>>> I saved 2 .eml files from my Thunderbird; one o

Re: Detection problem with RFC822 file with HTML content

2015-11-13 Thread Nick Burch
On Fri, 13 Nov 2015, Vjeran Marcinko wrote:
> On 13.11.2015 11:51, Nick Burch wrote:
>> On Fri, 13 Nov 2015, Vjeran Marcinko wrote:
>>> I saved 2 .eml files from my Thunderbird; one of them contained plain text content, whereas the other one contained rich HTML content.
>> Did you try with the latest version o

Re: Detection problem with RFC822 file with HTML content

2015-11-13 Thread Vjeran Marcinko
Yep, I'm using v1.11.
On 13.11.2015 11:51, Nick Burch wrote:
> On Fri, 13 Nov 2015, Vjeran Marcinko wrote:
>> I saved 2 .eml files from my Thunderbird; one of them contained plain text content, whereas the other one contained rich HTML content.
> Did you try with the latest version of Apache Tika? IIRC we

Re: Detection problem with RFC822 file with HTML content

2015-11-13 Thread Nick Burch
On Fri, 13 Nov 2015, Vjeran Marcinko wrote:
> I saved 2 .eml files from my Thunderbird; one of them contained plain text content, whereas the other one contained rich HTML content.
Did you try with the latest version of Apache Tika? IIRC we did some fixes around this moderately recently.
Nick

Detection problem with RFC822 file with HTML content

2015-11-12 Thread Vjeran Marcinko
Hello, I saved 2 .eml files from my Thunderbird; one of them contained plain text content, whereas the other one contained rich HTML content. The plain text one was recognized by Tika as a "message/rfc822" file, but the other one was incorrectly detected as "text/html" (and textual content being incorrectly extr
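
To illustrate the report above, here is a minimal sketch in Python (standard library only) of why an .eml file with an HTML body is still an RFC822 message: the HTML sits after a block of mail headers. This does not use or reproduce Tika's detector. The sample addresses, the helper names `build_html_eml` and `looks_like_rfc822`, and the choice of headers to check are assumptions made for illustration only.

```python
from email import message_from_string
from email.message import EmailMessage


def build_html_eml() -> str:
    """Build a small single-part HTML message, roughly like a saved .eml file.

    The addresses and subject are made-up sample values.
    """
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "HTML sample"
    # subtype="html" makes the top-level Content-Type text/html
    msg.set_content("<html><body><p>Hello</p></body></html>", subtype="html")
    return msg.as_string()


def looks_like_rfc822(raw: str) -> bool:
    """Hypothetical heuristic: treat the text as a mail message if it parses
    with From, To and Subject headers. This is not how Tika decides."""
    parsed = message_from_string(raw)
    return all(parsed.get(name) is not None for name in ("From", "To", "Subject"))


raw = build_html_eml()
parsed = message_from_string(raw)

# The file starts with mail headers, not with HTML markup.
print(raw.splitlines()[0])
# The body's declared type is text/html, yet the container is a mail message.
print(parsed.get_content_type())
print(looks_like_rfc822(raw))
print(looks_like_rfc822("<html><body>hi</body></html>"))
```

A detector that looks mainly at the HTML markup in the content could plausibly label such a file "text/html", whereas one that considers the leading header block would see a mail message. Whether that is what happened in this case is not established by the thread.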