bq: How do I get a list of all valid field names based on the file type?

You don't. At least I've never found one. Plus, various document
formats allow custom meta-data fields, so there's no definitive
list.

bq: Also how do I search the "free form" text for a word/pattern in
the Solr search tool?

You put the extracted text (as opposed to meta-data) into an analyzed
field and search that.
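
A rough sketch of what that looks like over Solr's HTTP API, in Python
rather than SolrJ. The core name "mycore" and the field name "content"
are made up for illustration; "text_general" is the analyzed text type
that ships in the default configsets, so check that your schema has it.
Nothing here talks to Solr, it only builds the request pieces.

```python
import json
from urllib.parse import urlencode


def add_text_field_command(field_name, field_type="text_general"):
    """Build a Schema API 'add-field' payload for an analyzed text field.

    The field is indexed (searchable) but not stored; both choices are
    assumptions for this sketch, adjust them to your needs.
    """
    command = {
        "add-field": {
            "name": field_name,
            "type": field_type,
            "indexed": True,
            "stored": False,
        }
    }
    return json.dumps(command)


def build_select_url(base_url, core, field, term, rows=10):
    """Build a /select URL that searches one field for one term."""
    params = {"q": f"{field}:{term}", "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # "mycore" and "content" are hypothetical names.
    print(add_text_field_command("content"))
    print(build_select_url("http://localhost:8983/solr", "mycore",
                           "content", "invoice"))
```

The payload would be POSTed to the core's /schema endpoint, and the
extracted body text has to be mapped into that field at index time.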



NOTE: Solr is a search engine. The closest thing to an out-of-the-box
"Solr Search Tool" is the admin UI, which isn't intended to be an
end-user-facing app.

Here's some SolrJ code that'll let you explore the meta-data fields in
various document types:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

You can pull out the RDBMS bits pretty easily.
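
If you'd rather not write SolrJ just to look around, a different route
(not what the linked code does) is to send sample files to the extract
handler with extractOnly=true, which returns what Tika extracted without
indexing anything. The sketch below assumes the handler is registered at
/update/extract and that you have already parsed each response into a
plain dict of metadata; the exact response layout depends on your wt and
json.nl settings, so that parsing step is left out. "mycore" is again a
made-up core name.

```python
from collections import defaultdict
from pathlib import PurePath
from urllib.parse import urlencode


def build_extract_only_url(base_url, core):
    """URL for the extract handler in extract-only mode (no indexing)."""
    params = {"extractOnly": "true", "wt": "json"}
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


def fields_by_extension(samples):
    """Tally which metadata field names showed up per file extension.

    samples: iterable of (filename, metadata_dict) pairs, where
    metadata_dict is whatever you parsed out of the extract response.
    Returns {extension: sorted list of field names seen}.
    """
    seen = defaultdict(set)
    for filename, metadata in samples:
        ext = PurePath(filename).suffix.lower()
        seen[ext].update(metadata.keys())
    return {ext: sorted(names) for ext, names in seen.items()}


if __name__ == "__main__":
    print(build_extract_only_url("http://localhost:8983/solr", "mycore"))
    # Hand-written sample metadata, not real Tika output.
    demo = [
        ("report.DOC", {"Author": "a", "date": "d"}),
        ("notes.doc", {"title": "t"}),
    ]
    print(fields_by_extension(demo))
```

Run it over a handful of real files per type and you get an empirical
list of fields, which is about as close to a "valid field list" as you
can get.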

Best,
Erick

On Sun, Sep 24, 2017 at 7:55 PM, Phillip Wu <phillip...@unsw.edu.au> wrote:
>
>  Hi,
> I'm starting out with Solr on a Windows box.
>
> I want to index the following documents:
> doc;docx
> xls;xlsx
> ppt
> vsd
>
> pdf
> txt
>
> gif;jpeg;tiff
>
> I understand that Solr uses Apache Tika to read these file types and return an
> XML stream back to Solr.
> For Tika image processing, I've loaded Tesseract.
>
> To be able to search the documents, I need to define "fields" in a file 
> called meta-schema.
>
> How do I get a list of all valid field names based on the file type? For 
> example *.doc, what "fields" exist so I choose what to store?
>
> I'm assuming that, for example, *.doc files contain metadata put into the
> file by Microsoft Word (e.g. author, date) as well as "free form" text.
>
> So where is the list of valid fields per file type?
>
> Also how do I search the "free form" text for a word/pattern in the Solr 
> search tool?