If in my regex-urlfilter:
>> # skip URLs containing certain characters as probable queries, etc.
>> [EMAIL PROTECTED]
I stop skipping '?' and '=', I will have more pages in my database.
Is there any strong reason why this was disabled in the release version?
(My segments have about 100 thousand pages.)
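The filter line above was redacted by the archive; in a stock Nutch regex-urlfilter.txt, the rule under that comment is usually something like `-[?*!@=]`. As a rough sketch of how such prefix rules behave (this is an approximation of the filter semantics, not the actual Nutch plugin code):

```python
import re

# Approximation of Nutch-style regex-urlfilter semantics: a '-' rule rejects
# a matching URL, a '+' rule accepts it, and the first matching rule wins.
RULES = [
    ("-", re.compile(r"[?*!@=]")),  # skip URLs with probable-query characters
    ("+", re.compile(r".")),        # accept everything else
]

def url_passes(url: str) -> bool:
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: reject

print(url_passes("http://example.com/page.html"))       # True: accepted
print(url_passes("http://example.com/search?q=nutch"))  # False: rejected
```

Commenting out or loosening the '-' rule is what lets query-style URLs (with '?' and '=') into the database.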
Problem solved by an appropriate regex filter. The cause of the problem was
a strange interaction between the Java code and the URLs.
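The poster doesn't say which regex was used, but the default Nutch regex-urlfilter.txt ships a deny rule for exactly this kind of repeating-path loop; it typically looks something like this (the exact wording in your config version may differ):

```
# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
```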
-Original Message-
From: Emilijan Mirceski [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 30, 2005 3:40 PM
To: nutch-user@lucene.apache.org
Subject: recursion: see r
Hi,
Just wanted to say that there has been no new build for some
days; it would help if I could get the latest build.
Thanks,
Kashif
Lately, I've been receiving thousands of variations of the following:
050630 153456 fetching
http://www.idividi.com.mk/vesti/makedonija/Politika/315216/mt.net.mk/mt.net..k/mt.net.mk/mt.net.mk/mt.net.mk/mt.net.mk/mt.net.mk/mt.net.mk/mt.net.mk
050630 153457 Response content length is not known
050630 153458
I found the property fetcher.server.maxurls, but how and when does it work?