ext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
From: Israel Ekpo
To: solr-user@lucene.apache.org; u...@nutch.apache.org
Sent: Mon, October 18, 2010 9:01:50 PM
Subject: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

Hi All,
I am indexing a web application with approximately 9,500 distinct URLs and
their contents using Nutch and Solr.
I use Nutch to fetch the URLs and links and to crawl the entire web
application, extracting the content of every page.
Then I run the solrindex command to send the content to Solr.
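
For readers following along, the crawl-then-index workflow described above could be sketched roughly as below. This is only an illustration: the seed directory, crawl directory, Solr URL and depth are hypothetical placeholders, and the argument order assumes a Nutch 1.x style command line, which may differ between versions. The helper only builds and prints the two command lines; it does not run Nutch.

```python
def nutch_commands(seed_dir, crawl_dir, solr_url, depth=3):
    """Build the two command lines for a crawl-then-index run.

    Assumes a Nutch 1.x layout (crawldb, linkdb and segments under
    crawl_dir); all paths and the URL are illustrative placeholders.
    """
    # Step 1: crawl the application starting from the seed URL list.
    crawl = ["bin/nutch", "crawl", seed_dir,
             "-dir", crawl_dir, "-depth", str(depth)]
    # Step 2: send the fetched content to Solr with solrindex.
    # The trailing glob is meant to be expanded by the shell.
    index = ["bin/nutch", "solrindex", solr_url,
             f"{crawl_dir}/crawldb",
             f"{crawl_dir}/linkdb",
             f"{crawl_dir}/segments/*"]
    return crawl, index


if __name__ == "__main__":
    for cmd in nutch_commands("urls", "crawl", "http://localhost:8983/solr"):
        print(" ".join(cmd))
```

The printed lines can be pasted into a shell from the Nutch install directory, after checking them against the usage message of the installed version.
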
T