As I just wrote on the Solr list: use the index-more plugin, or copyField the
url to an extension field on which you can use a pattern replace char filter
to strip everything up to the last dot.
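
To check what such a filter would leave in the extension field before touching
the schema, the pattern can be tried out on a few sample URLs. The following is
only a rough sketch: it assumes the extension is whatever follows the last dot
in the URL, the regex is one possible pattern rather than a tested Solr
configuration, and the function names and sample URLs are made up for
illustration.

```python
import re

# Hypothetical pattern for the char filter: greedily drop everything
# up to and including the last dot in the URL.
STRIP_TO_LAST_DOT = re.compile(r"^.*\.")

# File types the original poster wants to keep.
WANTED = {"doc", "docx", "pdf"}


def extension_of(url: str) -> str:
    """Return what would remain in the extension field for this URL."""
    return STRIP_TO_LAST_DOT.sub("", url).lower()


def is_wanted(url: str) -> bool:
    """True if the URL's extension is one of the wanted file types."""
    return extension_of(url) in WANTED


if __name__ == "__main__":
    samples = [
        "http://www.example.com/papers/report.PDF",
        "http://www.example.com/index.html",
        "http://www.example.com/about",
    ]
    for url in samples:
        print(url, "->", repr(extension_of(url)), is_wanted(url))
```

Note that a URL without a file extension, or one with a query string after the
file name, leaves something other than a clean extension in the field, so such
documents simply won't match. With the field populated, a filter query along
the lines of extension:(doc OR docx OR pdf) should restrict results to those
types, assuming the field is named extension.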

> Hello,
> I want to search articles via Solr, so I need to find only specific file
> types such as doc, docx, and pdf.
> I don't need any HTML pages, so the search results should consist only
> of doc, docx, and pdf files.
> 
> I'm using Nutch to crawl web pages and send Nutch's data to Solr for
> indexing. One approach to searching on specific file types is to put the
> file extension into my index, but I have no idea what schema Nutch uses
> when indexing into Solr, whether it creates a specific field for the
> file extension, or how we can modify the Nutch indexer to create such a
> field ourselves.
