[jira] [Commented] (TIKA-696) Extract watermarks from Word documents

2024-05-16 Thread Alexey Pismenskiy (Jira)
[ https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847018#comment-17847018 ] Alexey Pismenskiy commented on TIKA-696: Hey [~nick], we would be interested in th

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: Description: For legacy tika, we're inlining all content from embedded files including ocr content of e

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: Description: For legacy tika, we're inlining all content from embedded files including ocr content of e

[jira] [Created] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4256: Summary: Allow inlining of ocr'd text in container document Key: TIKA-4256 URL: https://issues.apache.org/jira/browse/TIKA-4256 Project: Tika Issue Type: Task

[PR] TIKA-4255: text-parser uses Metadata.CONTENT_ENCODING [tika]

2024-05-16 Thread via GitHub
axeld opened a new pull request, #1761: URL: https://github.com/apache/tika/pull/1761 If CSVParams.getCharset() is null, the passed-in encoding is used before trying to auto-detect it.

[jira] [Commented] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846908#comment-17846908 ] ASF GitHub Bot commented on TIKA-4255: axeld opened a new pull request, #1761: URL:

[jira] [Created] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-16 Thread Axel Dörfler (Jira)
Axel Dörfler created TIKA-4255: Summary: TextAndCSVParser ignores Metadata.CONTENT_ENCODING Key: TIKA-4255 URL: https://issues.apache.org/jira/browse/TIKA-4255 Project: Tika Issue Type: Bug