I'd strongly recommend rolling your own ingest code. See Erick's
superb post: https://lucidworks.com/post/indexing-with-solrj/
You can easily get attachments via the RecursiveParserWrapper, e.g.
https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParse
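For the attachment count and names specifically, here is a minimal sketch of the "roll your own" idea. Note that it is not Tika or SolrJ: it uses Python's standard-library email package in place of the RecursiveParserWrapper, purely to illustrate walking the message parts yourself and building the document you would send to Solr. The function names and the field names attachment_count_i / attachment_names_ss are hypothetical (they assume matching dynamic fields in your schema), and only parts marked Content-Disposition: attachment are counted.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_attachments(eml_bytes):
    """Return attachment count and file names for one raw EML message."""
    msg = message_from_bytes(eml_bytes, policy=policy.default)
    names = []
    # walk() visits the message itself and every nested part.
    for part in msg.walk():
        # is_attachment() is True only for Content-Disposition: attachment.
        if part.is_attachment():
            names.append(part.get_filename() or "(unnamed)")
    # Hypothetical Solr field names; adjust to your own schema.
    return {
        "attachment_count_i": len(names),
        "attachment_names_ss": names,
    }


def build_sample_eml():
    """Build a small in-memory EML with two attachments, for demonstration."""
    m = EmailMessage()
    m["Subject"] = "sample"
    m.set_content("body text")
    m.add_attachment(b"abc", maintype="application",
                     subtype="octet-stream", filename="a.bin")
    m.add_attachment("hello", subtype="plain", filename="notes.txt")
    return m.as_bytes()


if __name__ == "__main__":
    print(summarize_attachments(build_sample_eml()))
```

The resulting dict could then be merged into the document you post to Solr with whatever client you use. Inline parts (e.g. embedded images) are skipped here; counting them would need an extra check such as part.get_content_disposition() == "inline".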
Hi Tim,
Thanks for pushing out yet another release!
— Ken
> On Aug 2, 2019, at 4:28 AM, Tim Allison wrote:
>
> The Apache Tika project is pleased to announce the release of Apache Tika
> 1.22. The release contents have been pushed out to the main Apache
> release site and to the Maven Central
Title: [CVE-2019-10094] StackOverflow from Crafted Package/Compressed
Files in Apache Tika's RecursiveParserWrapper
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Tika 1.7 to 1.21
Description:
A carefully crafted package/compressed file that, when
unzipped/un
Title: [CVE-2019-10093] Denial of Service in Apache Tika's 2003ml and
2006ml Parsers
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Tika 1.19 to 1.21
Description:
A carefully crafted 2003ml or 2006ml file could consume all available
SAXParsers in the pool and
Title: [CVE-2019-10088] OOM from a crafted Zip File in Apache Tika's
RecursiveParserWrapper
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Tika 1.7 to 1.21
Description:
A carefully crafted or corrupt zip file can cause an OOM in Apache
Tika's RecursiveParserW
The Apache Tika project is pleased to announce the release of Apache Tika
1.22. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.
Apache Tika is a toolkit for detecting
Hi,
I would like to check: is there any way to detect the number of
attachments and their names when indexing EML files in Solr, and to index
that information into Solr?
Currently, Solr is able to use Tika and Tesseract OCR to extract the
contents of the attachments. However, I could no