Re: How to search on specific file types?

Markus Jelsma Tue, 04 Oct 2011 09:57:05 -0700

> I indexed my data using the index-more plugin and added the field I need
> (such as content_type) to schema.xml.
> Now how can I search only PDF files (one kind of content type) using this new
> index? What query should I enter to search on PDF files?


This is a Solr-specific question, and the answer depends on the fieldType you
defined in Solr for the content type field. Refer to the Solr manual or the
Solr mailing list.
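
As a rough illustration of why the fieldType matters, here is a minimal Python sketch that builds a Solr select URL with a filter query on the content type. It assumes the field is named content_type (as in the question), that it is an untokenized string field holding the full MIME type (application/pdf), and that Solr is reachable at the placeholder base URL shown; the helper name build_filetype_query is hypothetical. If the field is tokenized or holds something other than the MIME type, the filter value would have to change.

```python
from urllib.parse import urlencode


def build_filetype_query(base_url, user_query,
                         field="content_type", value="application/pdf"):
    """Build a Solr select URL restricted to one content type.

    Assumption: `field` is an untokenized string field whose value is the
    full MIME type, so an exact quoted match works as a filter query.
    """
    params = {
        "q": user_query,                # the normal search terms
        "fq": f'{field}:"{value}"',     # filter query: only this content type
        "wt": "json",                   # ask Solr for a JSON response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Placeholder host and path; adjust to your own Solr instance.
url = build_filetype_query("http://localhost:8983/solr", "nutch")
print(url)
```

The filter query (fq) only narrows the result set to documents whose content type matches, so the main query (q) stays free for the actual search terms.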

> 
> On Thu, Sep 29, 2011 at 9:33 AM, ahmad ajiloo <[email protected]> wrote:
> > How can I use the index-more plugin? I'm new to Nutch and need detailed
> > help!
> > Thanks
> > 
> > 
> > On Wed, Sep 14, 2011 at 12:54 PM, Markus Jelsma <[email protected]> wrote:
> >> As I just wrote on the Solr list: use the index-more plugin, or copyField
> >> the url to an extension field on which you can use a char pattern replace
> >> filter to skip everything up to the first dot.
> >> 
> >> > Hello
> >> > I want to search on articles via Solr, so I need to find only specific
> >> > files like doc, docx, and pdf.
> >> > I don't need any html pages. Thus the result of our search should only
> >> > consist of doc, docx, and pdf files.
> >> > 
> >> > I'm using Nutch to crawl web pages and send Nutch's data to Solr for
> >> > indexing. One approach to searching on specific file types is to put
> >> > the file extension into my index, but I have no idea what type of
> >> > schema Nutch uses when indexing into Solr, whether it creates a
> >> > specific field for the file extension, or how we can modify the Nutch
> >> > indexer to create a field like that ourselves.
