Hi,

IndexingFilters and ParseFilters work on individual documents and cannot return a collection of documents, though that might be a good feature to have.
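
For the sentence splitting itself, the JDK's java.text.BreakIterator may be enough. The following is only a minimal sketch: the class name SentenceSplitter and its split method are hypothetical and not part of Nutch or Solr, and it does not show how the resulting sentences would be turned into separate Solr documents.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: splits plain text into sentences
// using the JDK's locale-aware sentence BreakIterator.
public class SentenceSplitter {

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);

        // Walk over consecutive sentence boundaries [start, end).
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Each sentence is printed on its own line.
        for (String s : split("Nutch crawls pages. Solr indexes them! Does this split?")) {
            System.out.println(s);
        }
    }
}
```

Each returned string would then need its own unique id before being sent to Solr, otherwise sentences from the same page would overwrite each other.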
Assuming you have some sentence extractor at hand, you could hack SolrWriter.java.

Cheers,

On Monday 31 October 2011 17:22:16 Michael Camilleri wrote:
> Hi all,
>
> Is it possible to get Nutch to split the crawl results into sentences
> so that each document contains only one sentence rather than a web
> page? I need this so that when I use Solr to index the crawl db it
> takes in a sentence at a time - the final result I want is to get a
> list of sentences that match a query instead of a list of web pages
> when doing a search.
>
> Thanks,
> Michael

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

