Hi,

IndexingFilters and ParseFilters work on individual documents and cannot
return a collection, although supporting that may be a good idea.

Assuming you have some sentence extractor at hand, you could hack
SolrWriter.java so that it writes one Solr document per sentence.
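
If you don't have an extractor yet, here is a rough sketch of the splitting
step using the JDK's java.text.BreakIterator. It is not Nutch or Solr code:
the SentenceSplitter class, its method names and the "url#index" id scheme
are all made up for illustration, and you would still have to wire
something like it into SolrWriter.java yourself.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Solr.
final class SentenceSplitter {
    private SentenceSplitter() {}

    /** Splits text into trimmed, non-empty sentences using BreakIterator. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    /**
     * Maps an assumed unique id ("url#index") to each sentence of a page,
     * so every sentence can be indexed as its own document.
     */
    static Map<String, String> toSentenceDocs(String url, String text) {
        Map<String, String> docs = new LinkedHashMap<>();
        List<String> sentences = split(text);
        for (int i = 0; i < sentences.size(); i++) {
            docs.put(url + "#" + i, sentences.get(i));
        }
        return docs;
    }
}
```

Calling SentenceSplitter.toSentenceDocs(url, parsedText) for each page gives
you id/sentence pairs, and each pair could then be sent to Solr as a separate
document. BreakIterator is rule-based, so expect mistakes around
abbreviations; a proper NLP sentence detector will do better.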

Cheers,


On Monday 31 October 2011 17:22:16 Michael Camilleri wrote:
> Hi all,
> 
> Is it possible to get Nutch to split the crawl results into sentences
> so that each document contains only one sentence rather than a web
> page? I need this so that when I use Solr to index the crawl db it
> takes in a sentence at a time - the final result I want is to get a
> list of sentences that match a query instead of a list of web pages
> when doing a search.
> 
> Thanks,
> Michael

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
