Hello
I want to search articles via Solr, so I need to find only specific file
types, such as doc, docx, and pdf.
I don't need any HTML pages, so the search results should consist only
of doc, docx, and pdf files.

I'm using Nutch to crawl web pages and sending Nutch's data to Solr for
indexing. One approach to searching on specific file types is to put the
file extension into the index. However, I have no idea what schema Nutch
uses when indexing into Solr, whether it creates a specific field for the
file extension, or how we can modify the Nutch indexer to create such a
field ourselves.
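
To illustrate the kind of search I have in mind, here is a minimal sketch
of a Solr request that restricts results with a filter query (fq). The
field name "ext", the base URL, and the helper build_solr_query are
hypothetical; I don't know whether Nutch's schema has such a field, which
is exactly what I'm asking.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, extensions, field="ext"):
    """Build a Solr select URL limited to the given file extensions.

    `field` is a hypothetical name for an indexed file-extension field;
    it is an assumption, not something taken from Nutch's schema.
    """
    # Filter query of the form: ext:(doc OR docx OR pdf)
    fq = "{}:({})".format(field, " OR ".join(extensions))
    params = urlencode({"q": text, "fq": fq, "wt": "json"})
    return "{}/select?{}".format(base_url, params)


# Assumed local Solr address, for illustration only.
url = build_solr_query("http://localhost:8983/solr", "articles",
                       ["doc", "docx", "pdf"])
print(url)
```

With a field like that in the index, HTML pages would simply never match
the filter.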
