I am using Solr 1.4.0 as my index, Nutch 1.2 as my crawler, and Drupal 6.x as
my interface. My objective is to lengthen the teaser/description shown in my
search results.

My obstacles are:

1.)    Does Nutch fetch and store the entire page when it crawls? (If it
does, then I can re-index the crawled documents and get more of the description
into my search results. That would be easy!)

2.)    Does Nutch truncate the page? If so, I can't find where that happens,
so I can't modify it to get the character length I need.
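
For reference, Nutch 1.x's conf/nutch-default.xml defines an http.content.limit
property (default 65536 bytes): content downloaded over HTTP that is longer than
this is truncated, and a negative value disables truncation. A minimal sketch of
an override in conf/nutch-site.xml, assuming this limit is what is cutting the
pages short in this crawl:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default from nutch-default.xml (65536 bytes). -->
  <!-- Assumption: this fetch-time limit is the truncation in question. -->
  <!-- Any negative value means fetched content is not truncated. -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```

Because this limit applies at fetch time, pages already fetched under the old
limit would presumably need to be re-fetched, not just re-indexed.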

I guess my biggest question is: does Nutch fetch and keep the entire crawled
page? If so, I know to look at the Solr configuration to get my desired search
results.

Thanks

Eric
