You probably need to make two changes:
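1. As Lewis suggested, make sure the content field in
solr/conf/schema.xml is stored, as below:
<field name="content" type="text" stored="true" indexed="true"/>
Documents indexed before this change have no stored content, so restart
Solr and re-run bin/nutch solrindex afterwards.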
2. Append the following to the search URL (encode the spaces in hl.fl as +
or %20, or separate the field names with commas):
&hl=on&hl.fl=content site url title
OR
Add the following to the defaults of the /browse request handler in
solrconfig.xml if you are using solr/browse:
 <str name="hl">on</str>
 <str name="hl.fl">url site title content</str>
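
If you build the search URL from a script, the highlighting parameters from
option 2 can be encoded like this. This is only a rough sketch: the function
name build_highlight_url is made up for illustration, the base URL is the one
from your solrindex command, and I am assuming the standard /select handler
is enabled on your Solr instance.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query,
                        fields=("content", "site", "url", "title")):
    """Return a Solr search URL with highlighting switched on.

    build_highlight_url is a hypothetical helper, not part of Solr or Nutch.
    """
    params = {
        "q": query,
        "hl": "on",                  # turn highlighting on
        "hl.fl": " ".join(fields),   # fields to generate snippets for
    }
    # urlencode turns the spaces in hl.fl into '+', which Solr reads as spaces.
    # The /select path is an assumption about the handler configuration.
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_highlight_url("http://127.0.0.1:8983/solr/", "photography")
print(url)
```

Opening the printed URL in a browser should return the usual result list
followed by a highlighting section, provided the content field is stored.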

You should be able to see something like this when you search in Solr:
<lst name="highlighting">
<lst name="http://thetechietutorials.blogspot.com/"><arr
name="content"><str>, June 15, 2011 A Custom <em>Solr</em> Search Component
example - RedirectSearchComponent Currently Apache
<em>Solr</em></str></arr></lst>
<lst name="http://thetechietutorials.blogspot.com/2011/06/working-example-of-java-annotations.html"><arr
name="content"><str>) ▼  June (5) A working example of Java Annotations A
Custom <em>Solr</em> Search Component example - Redirect</str></arr></lst>
...
</lst>
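
If you want to use the snippets outside the Solr admin page, a response like
the one above can be read with a few lines of Python. Again just a sketch
under assumptions: extract_snippets is a made-up name, the SAMPLE string is a
shortened copy of the output above wrapped in a response element, and I am
assuming the raw XML escapes the em tags as &lt;em&gt; (the browser shows them
unescaped).

```python
import xml.etree.ElementTree as ET

# Shortened, hand-made sample modelled on the highlighting output above.
SAMPLE = """<response>
<lst name="highlighting">
  <lst name="http://thetechietutorials.blogspot.com/">
    <arr name="content">
      <str>A Custom &lt;em&gt;Solr&lt;/em&gt; Search Component</str>
    </arr>
  </lst>
</lst>
</response>"""


def extract_snippets(xml_text):
    """Map each document id to {field name: [snippet, ...]}.

    extract_snippets is a hypothetical helper, not a Solr or Nutch API.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for section in root.iter("lst"):
        if section.get("name") != "highlighting":
            continue
        # Each child <lst> is one document, keyed by its id (the Nutch url).
        for doc in section.findall("lst"):
            fields = {}
            for arr in doc.findall("arr"):
                fields[arr.get("name")] = [
                    (s.text or "").strip() for s in arr.findall("str")
                ]
            result[doc.get("name")] = fields
    return result


snippets = extract_snippets(SAMPLE)
for doc_id, fields in snippets.items():
    print(doc_id, fields.get("content"))
```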

If you are interested, you can also look at my blog post about a customized
Solr browse interface for Nutch data. Here is the URL:
http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html

Thanks.

On Wed, Aug 3, 2011 at 12:31 AM, Kiks <[email protected]> wrote:

> This question was posted on solr list and not answered because nutch
> related...
>
>
> The indexed contents of 100 sites were imported to solr from nutch using:
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb
> crawl/segments/*
>
> now, a solr admin search for 'photography' includes these results:
>
>  <doc>
>    <float name="score">0.12570743</float>
>    <float name="boost">1.0440307</float>
>    <str name="digest">94d97f2806240d18d67cafe9c34f94e1</str>
>    <str name="id">http://www.galleryhopper.org/</str>
>    <str name="segment">...</str>
>    <str name="title">Gallery Hopper: Todd Walker's photography ephemera.
> Read, enjoy, share, discard.</str>
>    <date name="tstamp">...</date>
>    <str name="url">http://www.galleryhopper.org/</str>
>  </doc>
>
> but highlighting options are on the title field not page text.
>
> My question: Where is the stored parsetext content of the pages? What is
> the solr command to send it from nutch with url/id key? The information is
> contained in the crawl segments with solr id field matching nutch url.
>
> Thanks.
>
