As I just wrote on the Solr list: use the index-more plugin, or copyField the url into an extension field whose analyzer uses a pattern replace char filter to skip everything up to the first dot.
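To make the second option concrete, here is a rough Python sketch of what such a pattern replacement would do to a url, plus the kind of filter query you could then send to Solr. The regex, the function names, and the field name `extension` are my own assumptions for illustration, not anything Nutch or Solr ships. The sketch also keys on the last dot of the path rather than the first dot, since the first dot in a full url usually sits in the host name.

```python
import re

# Hypothetical pattern: greedily consume everything up to the last dot
# and capture what follows it, provided no further "." or "/" appears.
EXTENSION_PATTERN = re.compile(r"^.*\.([^./]+)$")


def extension_of(url: str) -> str:
    """Return the lowercased file extension of a url, or "" if none is found."""
    # Drop any query string or fragment before looking for the extension.
    path = re.split(r"[?#]", url, maxsplit=1)[0]
    match = EXTENSION_PATTERN.match(path)
    return match.group(1).lower() if match else ""


def extension_filter(extensions, field="extension"):
    """Build a Solr filter query string; `field` is an assumed field name."""
    return f"{field}:(" + " OR ".join(extensions) + ")"


if __name__ == "__main__":
    print(extension_of("http://example.com/papers/report.PDF"))
    print(extension_filter(["doc", "docx", "pdf"]))
```

The same regex idea would go into the char filter of the extension field's analyzer, and the string from `extension_filter` would be passed as the `fq` parameter. Note that a bare host such as `http://example.com` would yield `com` with this pattern, so a real setup may need a stricter expression.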
> Hello
> I want to search on articles via Solr. So need to find only specific files
> like doc, docx, and pdf.
> I don't need any html pages. Thus the result of our search should only
> consist of doc, docx, and pdf files.
>
> I'm using Nutch to crawl web pages and sending Nutch's data to Solr for
> indexing. There is an approach to search on specific file types: Put the
> file extension into my index and I have no idea about the type of schema
> nutch uses when indexing into Solr, whether it creates a specific field for
> file extension, and/or how we can modify the nutch indexer to create a
> field like that for ourselves.

