I indexed my data using the index-more plugin and added the field I need (content_type) to schema.xml. How can I now search only PDF files (one of the content types) using this new index? What query should I enter to restrict the search to PDF files?

On Thu, Sep 29, 2011 at 9:33 AM, ahmad ajiloo <[email protected]> wrote:

> How can I use the index-more plugin? I'm new to Nutch and need your help in
> detail!
> Thanks
>
> On Wed, Sep 14, 2011 at 12:54 PM, Markus Jelsma <[email protected]> wrote:
>
>> As I just wrote on the Solr list: use the index-more plugin, or copyField
>> the url to an extension field in which you can use a char pattern replace
>> filter to skip everything up to the first dot.
>>
>> > Hello
>> > I want to search on articles via Solr, so I need to find only specific
>> > files like doc, docx, and pdf.
>> > I don't need any html pages. Thus the result of our search should only
>> > consist of doc, docx, and pdf files.
>> >
>> > I'm using Nutch to crawl web pages and send Nutch's data to Solr for
>> > indexing. There is an approach to search on specific file types: put the
>> > file extension into my index. But I have no idea about the type of schema
>> > Nutch uses when indexing into Solr, whether it creates a specific field
>> > for file extension, and/or how we can modify the Nutch indexer to create
>> > a field like that ourselves.
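
To make the question concrete, here is a minimal sketch of the kind of query I have in mind: a Solr filter query (fq) on the new field. It rests on several assumptions: the Solr URL is hypothetical, the field is named content_type as in my schema.xml, and the field is assumed to store MIME types such as application/pdf (the values actually stored depend on how the field is populated). The helper name build_filetype_query is made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the actual host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_filetype_query(terms, mime_types, field="content_type"):
    """Build a Solr select URL restricted to the given MIME types.

    `field` is assumed to be the field added to schema.xml.
    """
    # Quote each value so characters such as '/' are matched literally.
    clauses = " OR ".join('"%s"' % m for m in mime_types)
    params = {
        "q": terms,                          # the user's search terms
        "fq": "%s:(%s)" % (field, clauses),  # filter on the content type field
        "wt": "json",                        # ask for a JSON response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Search for "nutch" in PDF documents only.
    print(build_filetype_query("nutch", ["application/pdf"]))
```

Passing several MIME types in the list would cover doc and docx as well, provided the values match what is actually stored in the field.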

