bq: How do I get a list of all valid field names based on the file type?

You don't. At least I've never found one. Plus, various document formats allow custom meta-data fields, so there's no definitive list.
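You can see which metadata fields Tika reports for one specific file, without indexing it, by sending that file to the extracting handler with extractOnly=true. Below is a minimal sketch that uses only the JDK 11+ HTTP client. It assumes a core named "docs" at http://localhost:8983/solr with /update/extract configured. The core name, URL and class name are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ExtractOnlyProbe {

    // Builds an /update/extract URL with extractOnly=true, so Solr returns the
    // Tika output (text plus metadata) instead of indexing the document.
    // resource.name gives Tika the file name as a hint for type detection.
    static String extractOnlyUrl(String baseUrl, String core, String fileName) {
        String name = URLEncoder.encode(fileName, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/update/extract"
                + "?extractOnly=true&extractFormat=text&wt=json"
                + "&resource.name=" + name;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: ExtractOnlyProbe <file>");
            return;
        }
        Path file = Path.of(args[0]);
        // Hypothetical Solr location and core name; adjust to your install.
        String url = extractOnlyUrl("http://localhost:8983/solr", "docs",
                file.getFileName().toString());
        // POST the raw file bytes as the request body (a content stream).
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON response, which should include the extracted text
        // and the metadata name/value pairs Tika found in this particular file.
        System.out.println(resp.body());
    }
}
```

Run it against a .doc, a .pdf and a .jpeg to compare the results. The metadata names you get back depend on the file, not just on its type. That is why there is no fixed list.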
bq: Also how do I search the "free form" text for a word/pattern in the Solr search tool?

You put the extracted text (as opposed to the meta-data) into an analyzed field and search that.

NOTE: Solr is a search engine. The closest thing to an OOB (out-of-the-box) "Solr Search Tool" is the admin UI, which isn't intended to be an end-user-facing app.

Here's some SolrJ code that'll let you explore the meta-data fields in various document types:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
You can pull out the RDBMS bits pretty easily.

Best,
Erick

On Sun, Sep 24, 2017 at 7:55 PM, Phillip Wu <phillip...@unsw.edu.au> wrote:
>
> Hi,
> I'm starting out with Solr on a Windows box.
>
> I want to index the following documents:
> doc;docx
> xls;xlsx
> ppt
> vsd
>
> pdf
> txt
>
> gif;jpeg;tiff
>
> I understand that Solr uses Apache Tika to read these file types and return an
> XML stream back to Solr.
> For Tika image processing, I've loaded Tesseract.
>
> To be able to search the documents, I need to define "fields" in a file
> called meta-schema.
>
> How do I get a list of all valid field names based on the file type? For
> example, for *.doc, what "fields" exist so I can choose what to store?
>
> I'm assuming that, for example, *.doc files have metadata put into the
> file by Microsoft Word (e.g. author, date) plus "free form" text.
>
> So where is the list of valid fields per file type?
>
> Also how do I search the "free form" text for a word/pattern in the Solr
> search tool?
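To make the answer about searching the extracted text concrete, here is a sketch of a query against the analyzed field that holds it, again using the JDK 11+ HTTP client. It assumes the extracted body was mapped (for example via fmap.content) into a text field named "content" and that documents have an "id" field. The core name "docs", the field names and the class name are all hypothetical, so check your own schema.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class FreeTextSearch {

    // Builds a /select URL that searches a single analyzed text field.
    // The query string field:(terms) is URL-encoded as the q parameter.
    static String selectUrl(String baseUrl, String core, String field, String terms) {
        String q = field + ":(" + terms + ")";
        return baseUrl + "/" + core + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=id&wt=json";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String terms = args.length > 0 ? args[0] : "invoice";
        // Hypothetical Solr location, core and field name.
        String url = selectUrl("http://localhost:8983/solr", "docs", "content", terms);
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON search response (ids of matching documents).
        System.out.println(resp.body());
    }
}
```

To match a pattern rather than a single word, the standard query parser also accepts wildcard terms such as invo* and regular expressions wrapped in slashes such as /inv.ice/. Both match individual indexed terms after analysis, not the raw text.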