You probably need to make two changes:

1. As Lewis suggested, change the content field in solr/conf/schema.xml as below so that the page text is stored (Solr can only highlight stored fields):

<field name="content" type="text" stored="true" indexed="true"/>

After changing the schema, restart Solr and re-run bin/nutch solrindex so the content is indexed again with the new setting.

2. Append the following to the search URL:

&hl=on&hl.fl=content site url title

OR, if you are using solr/browse, add the following to the defaults of the /browse request handler in solrconfig.xml:

<str name="hl">on</str>
<str name="hl.fl">url site title content</str>
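For reference, here is a minimal Python sketch of the URL approach in step 2. It assumes a default single-core Solr at http://127.0.0.1:8983/solr/ (the address used in the solrindex command from the original question) and the standard /select handler returning XML. The function names are hypothetical. The sketch only builds the query URL and reads the highlighting section of a response; it does not perform the HTTP request itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: Solr's /select handler at the address used in the
# solrindex command; adjust host, port or core for your setup.
SOLR_SELECT = "http://127.0.0.1:8983/solr/select"


def build_highlight_url(query, fields=("content", "site", "url", "title")):
    """Return a /select URL with highlighting turned on for `fields`.

    hl=on enables highlighting; hl.fl lists the fields to highlight
    (Solr accepts a space- or comma-separated list).
    """
    params = {
        "q": query,
        "hl": "on",
        "hl.fl": " ".join(fields),
    }
    # urlencode escapes spaces as '+', which Solr decodes back to spaces
    return SOLR_SELECT + "?" + urlencode(params)


def extract_snippets(xml_text):
    """Map each document key (the Nutch URL / Solr id) to
    {field_name: [snippet, ...]} from <lst name="highlighting">."""
    root = ET.fromstring(xml_text)
    result = {}
    highlighting = root.find("lst[@name='highlighting']")
    if highlighting is None:
        # highlighting was not requested, or the response has no hits
        return result
    for doc in highlighting.findall("lst"):
        fields = {}
        for arr in doc.findall("arr"):
            # itertext() joins all text inside each <str>, including any
            # text nested in child elements
            fields[arr.get("name")] = [
                "".join(s.itertext()) for s in arr.findall("str")
            ]
        result[doc.get("name")] = fields
    return result
```

For example, build_highlight_url("photography") returns a URL ending in q=photography&hl=on&hl.fl=content+site+url+title. In Solr's raw XML the <em> markers inside each snippet normally arrive escaped, so extract_snippets returns them as literal text in the snippet strings.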
You should be able to see something like this when you search in Solr:

<lst name="highlighting">
  <lst name="http://thetechietutorials.blogspot.com/">
    <arr name="content">
      <str>, June 15, 2011 A Custom <em>Solr</em> Search Component example - RedirectSearchComponent Currently Apache <em>Solr</em></str>
    </arr>
  </lst>
  <lst name="http://thetechietutorials.blogspot.com/2011/06/working-example-of-java-annotations.html">
    <arr name="content">
      <str>) ▼ June (5) A working example of Java Annotations A Custom <em>Solr</em> Search Component example - Redirect</str>
    </arr>
  </lst>
  ...
</lst>

If you are interested, you can also look at my blog post about a customized Solr browser interface for Nutch data. Here is the URL:
http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html

Thanks.

On Wed, Aug 3, 2011 at 12:31 AM, Kiks <[email protected]> wrote:
> This question was posted on solr list and not answered because nutch
> related...
>
> The indexed contents of 100 sites were imported to solr from nutch using:
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
>
> now, a solr admin search for 'photography' includes these results:
>
> <doc>
>   <float name="score">0.12570743</float>
>   <float name="boost">1.0440307</float>
>   <str name="digest">94d97f2806240d18d67cafe9c34f94e1</str>
>   <str name="id">http://www.galleryhopper.org/</str>
>   <str name="segment">...</str>
>   <str name="title">Gallery Hopper: Todd Walker's photography ephemera. Read, enjoy, share, discard.</str>
>   <date name="tstamp">...</date>
>   <str name="url">http://www.galleryhopper.org/</str>
> </doc>
>
> but highlighting options are on the title field not page text.
>
> My question: Where is the stored parsetext content of the pages? What is
> the solr command to send it from nutch with url/id key? The information is
> contained in the crawl segments with solr id field matching nutch url.
>
> Thanks.
>

