The Tika integration with the DataImportHandler allows you to control
many aspects of what goes into the index, including solving this
problem:
http://wiki.apache.org/solr/TikaEntityProcessor
(Tika is the extraction library, and ExtractingRequestHandler and the
TikaEntityProcessor both use it.)
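If you would rather stay on the client side, one possible building block is to read the file system timestamp yourself and format it the way Solr date fields expect (ISO 8601, UTC). The following is only a minimal sketch using plain JDK classes: the class and method names are made up for illustration, and it assumes the value would then be sent along with the extract request as a `literal.last_modified` parameter. It does not address how to apply the value only when the document carries no last_modified of its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class, not part of Solr or Tika.
public class FileTimestampFallback {

    // Returns the file's last-modified time as an ISO 8601 UTC string,
    // truncated to whole seconds (e.g. 2011-03-04T12:30:44Z).
    static String solrDateOf(Path file) throws IOException {
        FileTime modified = Files.getLastModifiedTime(file);
        return modified.toInstant().truncatedTo(ChronoUnit.SECONDS).toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file with a known timestamp.
        Path file = Files.createTempFile("doc", ".pdf");
        Files.setLastModifiedTime(file,
                FileTime.from(Instant.parse("2011-03-04T12:30:44Z")));

        // Assumption: this value is passed as the literal.last_modified
        // request parameter when posting the file for extraction.
        System.out.println("literal.last_modified=" + solrDateOf(file));

        Files.delete(file);
    }
}
```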
Hi list,
I'm using the ExtractingRequestHandler to extract content from
documents. It extracts the "last_modified" field fine, but of
course only for documents where this field is set. If this field is not
set, I want to pass the file system timestamp of the file instead.
I'm doing:
final Conte