You can do this with the HttpClient patch by setting up the client to force
authentication every time (for a certain realm and domain).
You will have to alter the code slightly. I'm not on the right computer
at the moment, but I'll send you an example a bit later today.
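In the meantime, a minimal sketch along those lines (using the Commons
HttpClient 3.x-style preemptive-authentication calls; the host, port, realm,
and credentials below are just placeholders, and the actual patch may wire
this up differently) would look something like this:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;

public class PreemptiveAuthSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // Send the Authorization header with every request instead of
        // waiting for a 401 challenge from the server.
        client.getParams().setAuthenticationPreemptive(true);
        // Scope the credentials to one host, port, and realm
        // (all placeholder values here).
        client.getState().setCredentials(
            new AuthScope("www.example.com", 80, "SomeRealm"),
            new UsernamePasswordCredentials("user", "secret"));
        GetMethod get = new GetMethod("http://www.example.com/protected/index.html");
        try {
            int status = client.executeMethod(get);
            System.out.println("Status: " + status);
        } finally {
            get.releaseConnection();
        }
    }
}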
http://sourceforge.net/tracke
Does this mean anything to anyone?
Do I reboot and try to update the database with this segment again? Will it
screw up the database?
Thanks,
Jason
p://www.angelfire.com/ga/GeneS/index.html
http://www.angelfire.com/ga/Georgian/gallery.html
http://www.angelfire.com/ga/Geor
Hello,
I merged the code provided in bug #990560 to get HTTP authentication
(thanks for the code, Matt).
What I want to do is force the crawler to authenticate and then crawl
a certain page. (If I don't authenticate then it still crawls, but
just isn't able to find the links that appear when the
John,
I have rewritten the parser using a different library with no
dependencies on X11 or anything else, for that matter. Hope it's acceptable.
https://sourceforge.net/tracker/index.php?func=detail&aid=1020724&group_id=59548&atid=491356
Cheers,
Andy
[EMAIL PROTECTED] wrote:
Uh, the whol
Bugs item #1020724, was opened at 2004-09-01 21:00
Message generated for change (Comment added) made by andyhedges
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1020724&group_id=59548
Category: plugin: other
Group: None
Status: Open
Resolution: None
Pri
Sounds like a great plan! Subversion works using WebDAV over port 80, as
I understand it. No more using nightlies.
Andy
Doug Cutting wrote:
Stefan Groschupf wrote:
Normal SourceForge problem. Happens too often.
Just drink a coffee and try it again in an hour or so. ;-/
I'm planning to move Nutch's code from CVS hosted at SourceForge to
Subversion hosted at http://osuosl.org/. This should happen sometime in
the next few months.
Doug
>
> I know a lot of people have seen this problem, but I have not
> run into it. I ran a crawl of about 100m pages back in August
> with good luck.
>
> On a two-Xeon box with ~2 gigs of RAM, I would run a fetcher with
> 200 threads. As Doug says, it took a little while to get up to
> speed.
Hi Jason,
Does it happen reliably at the URL that you list, or
is it intermittent? I have not seen this before.
--Mike
On Sat, 2004-09-18 at 08:48, Jason Boss wrote:
> Hey guys,
>
> I'm using the 8/27/04 version of Nutch and am getting this strange error while
> trying to fetch.
>
> Thank
Hi Jagdeep,
On Wed, 2004-09-15 at 21:39, Sandhu, Jagdeep wrote:
> Greetings,
>
> Another issue that I see with the WebDB is the fact that Pages and Links are
> maintained by URLs and MD5 hashes. In my crawl of 64 million Travel-related pages, I
> have not seen a single example of page duplic