the page
> if page becomes
> unavailable for some number of fetch attempts.
> Regards
> Piotr
>
> On 8/10/05, Raymond Creel <[EMAIL PROTECTED]> wrote:
> > I have a question about the webdb and fetching. When
> > a page that used to have incoming links i
I have a question about the webdb and fetching. When
a page that used to have incoming links is found to be
"orphaned" (i.e. there are no longer any pages that
have links to it), is it deleted from the webdb? Or
is it left in the webdb but set not to be refetched?
Or will it continue to be refet
lp
raymond
--- [EMAIL PROTECTED] wrote:
> I think Nutch is behaving correctly.
> Maybe that page has a BASE URL (view source, look at
> the HEAD elements)
> that throws off one or the other.
>
> Otis
>
>
> --- Raymond Creel <[EMAIL PROTECTED]> wrote:
>
> >
> What website are you working on?
Many different ones, each with their own nutch
configurations, which is why I'm trying to figure out
how to tweak the fetcher so it maximizes speed while
minimizing errors and webmaster annoyance. :)
Currently it seems to be working pretty well with just
using
Has anyone experienced a problem with the way the
standard HTML parser plugin handles relative URLs?
There is a site where the home page is something like
http://www.x.com/x.cgi
and when browsing a link with its href set to
'?paramname=paramvalue'
a browser will naturally take you to
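
For illustration, here is a minimal Java sketch of how a query-only href such as '?paramname=paramvalue' resolves against its base URL under RFC 3986, section 5.2.2: the base path is kept and only the query is replaced. The class RelativeLinkResolver and its resolve method are hypothetical names made up for this example. They are not part of Nutch or its HTML parser plugin, and this is not a claim about how Nutch resolves links.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not Nutch code.
final class RelativeLinkResolver {

    // Resolves href against base. Query-only references are handled by hand
    // so the base path is kept, as RFC 3986 section 5.2.2 specifies.
    static String resolve(String base, String href) {
        if (href.startsWith("?")) {
            // Cut the base at its fragment or query, whichever comes first.
            int end = base.length();
            int hash = base.indexOf('#');
            if (hash >= 0) {
                end = hash;
            }
            int query = base.indexOf('?');
            if (query >= 0 && query < end) {
                end = query;
            }
            return base.substring(0, end) + href;
        }
        // Ordinary relative paths are delegated to java.net.URI.
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Prints http://www.x.com/x.cgi?paramname=paramvalue
        System.out.println(resolve("http://www.x.com/x.cgi", "?paramname=paramvalue"));
    }
}
```

A resolver that instead drops the last path segment would produce http://www.x.com/?paramname=paramvalue, which points at a different resource.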
d the
> target server will save
> on bandwidth in fact ;)
>
>
> -----Original Message-----
> From: Raymond Creel [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 25, 2005 4:00 PM
> To: nutch-user@lucene.apache.org
> Subject: fetch bandwidth settings
>
> I have read th
I have read that you don't want to make more than 1 or
2 requests per second to the same host, or else you
will start adversely affecting their bandwidth. Is
this a good rule of thumb?
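
As a rough sketch of the idea behind that rule of thumb, the Java class below enforces a minimum delay between requests to the same host while leaving other hosts unaffected. The class name HostPoliteness, the reserve method, and the 1000 ms delay are assumptions chosen for this example. It does not show how the Nutch fetcher is implemented or configured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host politeness tracker; illustration only, not Nutch code.
final class HostPoliteness {
    private final long minDelayMillis;
    // Earliest time (in ms) at which each host may be contacted again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostPoliteness(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Reserves the next fetch slot for host and returns how many
    // milliseconds the caller should wait before using it.
    synchronized long reserve(String host, long nowMillis) {
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        long start = Math.max(earliest, nowMillis);
        nextAllowed.put(host, start + minDelayMillis);
        return start - nowMillis;
    }

    public static void main(String[] args) {
        // Assumed delay: at most one request per second per host.
        HostPoliteness politeness = new HostPoliteness(1000);
        System.out.println(politeness.reserve("www.x.com", 0)); // first request: no wait
        System.out.println(politeness.reserve("www.x.com", 0)); // same host: wait 1000 ms
        System.out.println(politeness.reserve("other.org", 0)); // different host: no wait
    }
}
```

With many hosts in the fetch list, overall throughput can stay high because only requests to the same host are spaced out.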
Along those lines, what would be the best values to
put in the nutch config file to maximize speed of
fetching
Ah yes, thank you - this will work nicely!
--- Howie Wang <[EMAIL PROTECTED]> wrote:
> >What I really would like is a way to pass in the
> >location of the config files (e.g. nutch-default.xml,
> >regex-urlfilter.txt, etc.) as an argument to the nutch
> >script, so that I can have multiple co
used a
mailing list in a while.
--- Juho Mäkinen <[EMAIL PROTECTED]> wrote:
> Take a look into Nutch Wiki FAQ here:
> http://wiki.apache.org/nutch/FAQ
> And find the Q/A for "How can I force fetcher to use
> custom nutch-config?"
>
> - Juho Mäkinen, http://w
tions.
Thanks,
Raymond Creel