Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Tim Allison
I'd strongly recommend rolling your own ingest code. See Erick's superb post on indexing with SolrJ: https://lucidworks.com/post/indexing-with-solrj/ You can easily get attachments via the RecursiveParserWrapper, e.g. https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParse

Re: [ANNOUNCE] Apache Tika 1.22 released

2019-08-02 Ken Krugler
Hi Tim,
Thanks for pushing out yet another release!
-- Ken
> On Aug 2, 2019, at 4:28 AM, Tim Allison wrote:
>
> The Apache Tika project is pleased to announce the release of Apache Tika
> 1.22. The release contents have been pushed out to the main Apache
> release site and to the Maven Central

[CVE-2019-10094] StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper

2019-08-02 Tim Allison
Title: [CVE-2019-10094] StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Tika 1.7 to 1.21
Description: A carefully crafted package/compressed file that, when unzipped/un

[CVE-2019-10093] Denial of Service in Apache Tika's 2003ml and 2006ml Parsers

2019-08-02 Tim Allison
Title: [CVE-2019-10093] Denial of Service in Apache Tika's 2003ml and 2006ml Parsers
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Tika 1.19 to 1.21
Description: A carefully crafted 2003ml or 2006ml file could consume all available SAXParsers in the pool and

[CVE-2019-10088] OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper

2019-08-02 Tim Allison
Title: [CVE-2019-10088] OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Tika 1.7 to 1.21
Description: A carefully crafted or corrupt zip file can cause an OOM in Apache Tika's RecursiveParserW

[ANNOUNCE] Apache Tika 1.22 released

2019-08-02 Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika 1.22. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs. Apache Tika is a toolkit for detecting

Indexing information on number of attachments and their names in EML file

2019-08-02 Zheng Lin Edwin Yeo
Hi, I would like to check: is there any way to detect the number of attachments and their names when indexing EML files in Solr, and to index that information into Solr? Currently, Solr is able to use Tika and Tesseract OCR to extract the contents of the attachments. However, I could no
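Tim's suggestion relies on Tika's RecursiveParserWrapper, which returns one Metadata object for the container document plus one for each embedded document. Nested embeddings are included, so that count can exceed the number of attachments a mail client would show. If all you need is the attachment count and filenames, the sketch below is a minimal stdlib-only alternative in Python. Note that it uses Python's `email` package, not Tika, and does no content extraction or OCR. It assumes a MIME part counts as an attachment when its Content-Disposition is `attachment`, so inline images are skipped. The field names `attachment_count` and `attachment_names` and the `SAMPLE_EML` message are hypothetical.

```python
from email import policy
from email.parser import BytesParser


def attachment_fields(raw_eml: bytes) -> dict:
    """Count attachments in a raw EML message and collect their filenames.

    Assumption: a MIME part is an attachment when its Content-Disposition
    is "attachment". walk() also descends into attached messages, so
    attachments nested inside them are counted as well.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_eml)
    names = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            # get_filename() falls back to the Content-Type "name" parameter
            # and returns None when neither is present.
            names.append(part.get_filename() or "")
    # Hypothetical Solr field names; adapt them to your own schema.
    return {"attachment_count": len(names), "attachment_names": names}


# Hypothetical sample message: a text body plus two attachments.
SAMPLE_EML = b"""\
MIME-Version: 1.0
Subject: Quarterly numbers
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See the attached files.
--XYZ
Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0=
--XYZ
Content-Type: text/csv
Content-Disposition: attachment; filename="data.csv"

a,b
--XYZ--
"""

if __name__ == "__main__":
    print(attachment_fields(SAMPLE_EML))
    # prints {'attachment_count': 2, 'attachment_names': ['report.pdf', 'data.csv']}
```

Using `walk()` rather than `EmailMessage.iter_attachments()` is a deliberate choice. `iter_attachments()` only looks at a message's immediate sub-parts, while `walk()` recurses, which is closer to how RecursiveParserWrapper treats embedded documents. The resulting dict could then be added to the document you send to Solr from your own ingest code.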