In addition to Markus' comments, I would suggest that instead of getting your
users to search across specific fields, you simply filter out html documents
if you do not wish to store ANY of them. This makes searching simpler for the
users of your system.
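
As a rough illustration of the filtering idea only: the sketch below keeps
just the doc, docx, and pdf documents from a list of crawled records before
anything is sent on for indexing. It is plain Python, not Nutch code (in
Nutch this kind of logic would live in its own Java-based filters), and the
record layout, the `url` key, and the function names are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: these are the only file types we want to keep.
WANTED_EXTENSIONS = {"doc", "docx", "pdf"}


def url_extension(url):
    """Return the lower-cased file extension of the URL path, or ''."""
    path = urlparse(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lstrip(".").lower()


def keep_wanted(docs):
    """Keep only records whose URL ends in one of the wanted extensions."""
    return [d for d in docs if url_extension(d["url"]) in WANTED_EXTENSIONS]


# Hypothetical crawled records; only the 'url' key matters here.
crawled = [
    {"url": "http://www.example.com/index.html"},
    {"url": "http://www.example.com/papers/report.PDF?download=1"},
    {"url": "http://www.example.com/notes/minutes.docx"},
    {"url": "http://www.example.com/docs/"},
]
kept = keep_wanted(crawled)
```

With the html pages dropped at this stage, users can run ordinary queries
without having to name any field.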

On Wed, Sep 14, 2011 at 10:24 AM, Markus Jelsma
<[email protected]> wrote:

> Just as I wrote on the Solr list: use the index-more plugin, or copyField
> the url to an extension field in which you can use a char pattern replace
> filter to skip everything up to the first dot.
>
> > Hello
> > I want to search articles via Solr, so I need to find only specific
> > files like doc, docx, and pdf.
> > I don't need any html pages. Thus the results of our search should only
> > consist of doc, docx, and pdf files.
> >
> > I'm using Nutch to crawl web pages and send Nutch's data to Solr for
> > indexing. One approach to searching on specific file types is to put the
> > file extension into my index, but I have no idea about the type of schema
> > Nutch uses when indexing into Solr, whether it creates a specific field
> > for the file extension, and/or how we can modify the Nutch indexer to
> > create a field like that ourselves.
>
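
If you do go the extension-field route Markus describes above, the query
side could look roughly like the sketch below, which restricts results with
a Solr filter query (`fq`). This assumes a field literally named `extension`
exists in your schema and that Solr answers at the base URL shown; both are
placeholders for illustration, not something Nutch sets up for you.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, extensions):
    """Build a Solr select URL limited to the given file extensions."""
    # fq narrows the result set without changing how q is scored.
    # 'extension' is an assumed field name, not a Nutch default.
    fq = "extension:(" + " OR ".join(extensions) + ")"
    params = {"q": user_query, "fq": fq, "wt": "json"}
    return base_url + "/select?" + urlencode(params)


# Placeholder base URL and query text.
url = build_select_url(
    "http://localhost:8983/solr", "climate report", ["doc", "docx", "pdf"]
)
```

Your application would add the `fq` part itself, so users only ever type
the query text.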



-- 
*Lewis*
