Re: simple enrich uploaded binary documents with sha256 hashes

2018-05-26 Thread Tim Allison
+1 as always to Erick’s advice. DIH is only a PoC. We do have a DigestingParser in Tika, and when you combine that with the RecursiveParserWrapper, you can get digests not only of the main file but also of all embedded files/attachments, which can be pretty neat for some use cases. Operators are

Re: simple enrich uploaded binary documents with sha256 hashes

2018-05-25 Thread Erick Erickson
I'd consider using a separate Java program that uses Tika directly, or one of various services. Then you can assemble whatever you please before sending the doc to Solr. There are multiple reasons to recommend this, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/ There are other
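As a rough illustration of the "separate Java program" approach, the following minimal sketch computes a SHA-256 hex digest of a file using only the JDK, so the value can be added to a document before it is sent to Solr. The class name Sha256Enricher and the field name sha256_s are hypothetical, and the Tika extraction and SolrJ indexing steps are deliberately left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; not part of Solr or Tika.
public class Sha256Enricher {

    // Streams the input through a SHA-256 digest and returns lowercase hex.
    public static String sha256Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Convenience overload that opens and closes the file itself.
    public static String sha256Hex(Path file)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return sha256Hex(in);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: Sha256Enricher <file>");
            return;
        }
        String digest = sha256Hex(Paths.get(args[0]));
        // Assumption: "sha256_s" is an illustrative field name. A real
        // indexer would add this value to the document it builds from the
        // extracted text before sending that document to Solr.
        System.out.println("sha256_s=" + digest);
    }
}
```

The digest is computed over the raw bytes of the uploaded file, independently of text extraction, so the same value can be used later for de-duplication or integrity checks.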

simple enrich uploaded binary documents with sha256 hashes

2018-05-24 Thread Thomas Lustig
Dear community, I would like to automatically add a SHA-256 file hash to a document field after a binary file is posted to an ExtractingRequestHandler. At first I thought that the ExtractingRequestHandler had such a feature, but so far I have not found a configuration for it. It was mentioned that I should