Add a new DataImportHandler EntityProcessor to handle non-XML files
-------------------------------------------------------------------
Key: SOLR-987
URL: https://issues.apache.org/jira/browse/SOLR-987
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Reporter: Nathan Adams
We need a way to use the DataImportHandler to index non-XML files (in
particular, plain text files) fetched either over HTTP or from the file system.
This would make it possible to put the entire contents of a text file into a
single field of a document whose other fields are pulled from another
DataSource. An EntityProcessor looks like the right place for this, since it
would let us add more attributes later if needed. We could also consider
supporting other file formats (PDF, Office documents, etc.), which may overlap
with some of the Extraction/Tika work.
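A minimal sketch of the proposed behaviour, written without depending on the
actual DataImportHandler API: the class PlainTextRowSketch, its nextRow()
method and the column name "plainText" are all hypothetical. It reads
everything from a java.io.Reader and returns it as one row with a single
field, then null to signal there are no more rows. The main method merges that
row into a document whose other (made-up) fields stand in for values pulled
from another DataSource.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the DataImportHandler API: turns the whole contents
// of a text source into a single row with one field.
public class PlainTextRowSketch {
    // Hypothetical name of the column that receives the file contents.
    static final String TEXT_FIELD = "plainText";

    private final Reader source;
    private boolean consumed = false;

    PlainTextRowSketch(Reader source) {
        this.source = source;
    }

    // Returns one row holding the entire text on the first call, then null
    // to signal that this entity has no more rows.
    Map<String, Object> nextRow() {
        if (consumed) {
            return null;
        }
        consumed = true;
        StringBuilder text = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader r = source) { // closes the source once it has been read
            int n;
            while ((n = r.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> row = new HashMap<>();
        row.put(TEXT_FIELD, text.toString());
        return row;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example", ".txt");
        Files.write(file, "hello\nworld\n".getBytes(StandardCharsets.UTF_8));

        // Made-up fields standing in for values pulled from another DataSource.
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("title", "Example document");

        PlainTextRowSketch entity =
            new PlainTextRowSketch(Files.newBufferedReader(file, StandardCharsets.UTF_8));
        for (Map<String, Object> row = entity.nextRow(); row != null; row = entity.nextRow()) {
            doc.putAll(row); // the file's text lands in one field of the document
        }
        Files.delete(file);

        // prints "42: 12 chars"
        System.out.println(doc.get("id") + ": " + ((String) doc.get(TEXT_FIELD)).length() + " chars");
    }
}
```

Taking a Reader rather than a path keeps the sketch agnostic about where the
text comes from: a file opened with Files.newBufferedReader and an HTTP
response body wrapped in an InputStreamReader look the same to it.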