bq: How do I get a list of all valid field names based on the file type
bq: You don't. At least I've never found any. Plus various document formats
will allow custom meta-data fields so there's no definitive list.
It would be trivial to add field counts per mime to tika-eval. If you're
interes
Phillip - You may want to start with the example/files configuration that
ships with Solr. It is specifically designed as a configuration (and UI!) for
indexing rich files, and it does a bit more than the other examples - it pulls out
acronyms, e-mail addresses, and URLs from text, as well as what
bq: How do I get a list of all valid field names based on the file type
You don't. At least I've never found any. Plus various document
formats will allow custom meta-data fields so there's no definitive
list.
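
Since there is no definitive list, one practical workaround is to tally which field names Tika actually emits for your own documents, grouped by MIME type. The following is only a rough sketch, not anything tika-eval provides: it assumes you already have one metadata dict per document (for example, parsed from Tika's JSON output) and that each dict carries a "Content-Type" key. The function name field_counts_by_mime and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict


def field_counts_by_mime(records):
    """Tally metadata field names per MIME type.

    `records` is a list of dicts, one per parsed document. Assumption: each
    dict has a "Content-Type" key; documents without one are grouped under
    "unknown".
    """
    counts = defaultdict(Counter)
    for record in records:
        mime = record.get("Content-Type", "unknown")
        # Some JSON dumps store multi-valued fields as lists; take the first.
        if isinstance(mime, list):
            mime = mime[0] if mime else "unknown"
        # Drop parameters such as "; charset=UTF-8" so variants group together.
        mime = mime.split(";")[0].strip()
        for field in record:
            counts[mime][field] += 1
    return counts


if __name__ == "__main__":
    # Hand-written sample records, not real Tika output.
    sample = [
        {"Content-Type": "application/pdf", "dc:title": "A", "pdf:PDFVersion": "1.4"},
        {"Content-Type": "application/pdf", "dc:title": "B"},
        {"Content-Type": "text/plain; charset=UTF-8", "Content-Encoding": "UTF-8"},
    ]
    for mime, fields in sorted(field_counts_by_mime(sample).items()):
        print(mime, dict(fields))
```

Run over a representative sample of your files, this gives an empirical field list per type, which is about the best you can do given custom metadata fields.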
bq: Also how do I search the "free form" text for a word/pattern in
the Solr search too
Hi,
I'm starting out with Solr on a Windows box.
I want to index the following documents:
doc;docx
xls;xlsx
ppt
vsd
pdf
txt
gif;jpeg;tiff
I understand that Solr uses Apache Tika to read these file types and return an
XML stream back to Solr.
For Tika image processing, I've loaded Tesseract.
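
For anyone following along, here is a small sketch of how a request to Solr's extracting request handler might be put together. It assumes the core has that handler configured at /update/extract and that Solr runs at the default http://localhost:8983/solr; the core name "files" and the helper name extract_url are just placeholders. The literal.id, commit and extractOnly parameters are the handler's own; with extractOnly=true Solr returns what Tika extracted (text and metadata) without indexing it, which is a handy way to see the field names a given file produces.

```python
from urllib.parse import urlencode


def extract_url(base_url, core, doc_id=None, extract_only=False):
    """Build a URL for Solr's extracting request handler.

    Assumptions: `base_url` looks like "http://localhost:8983/solr" and the
    core exposes the handler at /update/extract.
    """
    if extract_only:
        # Return Tika's output to the caller instead of indexing the file.
        params = {"extractOnly": "true", "wt": "json"}
    else:
        # Index the file, using doc_id as the document's unique key.
        params = {"literal.id": doc_id, "commit": "true"}
    return f"{base_url.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    print(extract_url(base, "files", doc_id="report.pdf"))
    print(extract_url(base, "files", extract_only=True))
```

The file itself would then be sent as the body of a POST to that URL, for example with curl or any HTTP client.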