Mariusz Cieślukowski created TIKA-3100:
------------------------------------------

             Summary: RFC822Parser ignore charset when extractAllAlternatives set to true
                 Key: TIKA-3100
                 URL: https://issues.apache.org/jira/browse/TIKA-3100
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.24.1
         Environment: Windows 10 x64, OpenJDK 14
            Reporter: Mariusz Cieślukowski
         Attachments: testRFC822_quoted_charset_iso_8859_2


In default mode, RFC822Parser seems to ignore the charset defined in the headers when detecting content. When I set "extractAllAlternatives" to false, the content seems fine.

Test case:
{code:java}
@Test
public void testQuotedPrintableCharset() {
    Metadata metadata = new Metadata();
    InputStream stream = getStream("test-documents/testRFC822_quoted_charset_iso_8859_2");
    ContentHandler handler = new BodyContentHandler();
    ParseContext context = new ParseContext();

    try {
        RFC822Parser emailparser = new RFC822Parser();
        // The charset problem shows up with this flag set to true.
        emailparser.setExtractAllAlternatives(true);
        emailparser.parse(stream, handler, metadata, context);

        // The body is ISO-8859-2; \u0144 is the Polish letter "ń".
        String bodyText = handler.toString();
        assertTrue(bodyText.contains("Dzie\u0144 dobry."));
    } catch (Exception e) {
        fail("Exception thrown: " + e.getMessage());
    }
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
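
For readers unfamiliar with the symptom, here is a minimal sketch of why the declared charset matters for a quoted-printable body. It is not Tika code. The class name, the method name, and the sample body "Dzie=F1 dobry." are assumptions made for illustration (the attachment's actual bytes are not shown in this report), and the decoder handles only "=XX" escapes, not soft line breaks. It also assumes the JDK ships the ISO-8859-2 charset, which common builds do.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tika.
class QuotedPrintableCharsetDemo {

    // Turns "=XX" hex escapes into raw bytes; every other character is
    // assumed to be plain ASCII and is copied through unchanged.
    static byte[] decodeQuotedPrintable(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < qp.length()) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()
                    && Character.digit(qp.charAt(i + 1), 16) >= 0
                    && Character.digit(qp.charAt(i + 2), 16) >= 0) {
                int hi = Character.digit(qp.charAt(i + 1), 16);
                int lo = Character.digit(qp.charAt(i + 2), 16);
                out.write(hi * 16 + lo);
                i += 3;
            } else {
                out.write(c);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Assumed sample body: byte 0xF1 is "ń" in ISO-8859-2.
        byte[] raw = decodeQuotedPrintable("Dzie=F1 dobry.");

        // Honouring the declared charset gives the expected text.
        String honoured = new String(raw, Charset.forName("ISO-8859-2"));
        // Ignoring it and falling back to ISO-8859-1 turns 0xF1 into "ñ".
        String ignored = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(honoured.contains("Dzie\u0144 dobry."));
        System.out.println(ignored.contains("Dzie\u0144 dobry."));
    }
}
```

The same bytes yield different text depending on the charset used, which would explain why the assertion in the test case fails if the header charset is not applied to the part being extracted.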