On Nov 10, 2010, at 12:19 PM, Eric Martin wrote:

> I am using Solr 1.4.0 as my index, Nutch 1.2 as my crawler and Drupal 6.x as
> my interface. My objective is to increase my teaser/description in my search
> results.
>
> My obstacles are:
>
> 1.) Does nutch pull the entire page when it crawls and store it? (If it
> does, then I can re-index crawled documents and get more description into my
> search results. That would be easy!)
>
> 2.) Does nutch truncate the page? If so, I can't find out where so I can
> modify it to get the character length I need.

You should look at http.content.limit. If a document is longer than the value specified with that option, Nutch truncates the page. Also, make sure you store "content" if you want to access it later.

> I guess my biggest question is, does nutch pull and keep the entire crawled
> page? If so, I know to look to Solr configuration to get my desired search
> results.
>
> Thanks
>
> Eric
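
For reference, the truncation setting from the reply is named http.content.limit in Nutch's conf/nutch-default.xml. It is a byte limit that defaults to 65536, and a negative value turns truncation off. A minimal sketch of an override in conf/nutch-site.xml follows. The value -1 is only an illustration, not a recommendation for every crawl:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Illustrative value (an assumption): -1 disables truncation.
         A positive number is the maximum number of bytes kept per page. -->
    <value>-1</value>
    <description>Length limit for downloaded content, in bytes.</description>
  </property>
</configuration>
```

If "store content" refers to the Solr side, that would presumably mean marking the content field as stored="true" in Solr's schema.xml so the text can be returned in results. That reading is an assumption, and pages crawled before the change would need to be re-fetched and re-indexed to pick up the longer content.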

