Re: [Nutch-dev] How to force http authentication

2004-09-20 Thread Andy Hedges
You can do this with the HttpClient patch and set up the client to force authentication every time (for a certain realm and domain). You will have to alter the code slightly. I'm not on the right computer at the moment but I'll send you an example a bit later today. http://sourceforge.net/tracke
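
To illustrate what "forcing" authentication means here: the client sends credentials with the first request instead of waiting for a 401 challenge. The following is only a minimal sketch of that idea using Python's standard library, not the HttpClient patch or the example promised above. It assumes HTTP Basic authentication; the helper names, URL, and credentials are all hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_preemptive_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request that carries credentials up front (hypothetical helper).

    The Authorization header is attached before any request is sent, so the
    server never has to answer with a 401 challenge first.
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Placeholder URL and credentials; nothing is sent over the network here.
    req = build_preemptive_request("http://example.com/private/", "crawler", "secret")
    print(req.get_header("Authorization"))
```

Because the header is unconditional, it should only be attached for the host and realm that actually need it, which is presumably why the reply above limits it to "a certain realm and domain".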

[Nutch-dev] Updatedb - blow up

2004-09-20 Thread Jason Boss
Does this mean anything to anyone? Do I reboot and try to update the database with this segment again? Will it screw up the database? Thanks, Jason p://www.angelfire.com/ga/GeneS/index.htmlT.?}>î ǰ êî¥KÝ>Ù>ÙÍ31http://www.angelfire.com/ga/Georgian/gallery.html1http://www.ang elfire.com/ga/Geor

[Nutch-dev] How to force http authentication

2004-09-20 Thread m h
Hello, I merged the code provided in bug #990560 to get http authentication (thanks for the code Matt). What I want to do is force the crawler to authenticate and then crawl a certain page. (If I don't authenticate then it still crawls, but just isn't able to find the links that appear when the

Re: [Nutch-dev] [ nutch-Bugs-1020724 ] parser for RTF files

2004-09-20 Thread Andy Hedges
John, I have rewritten the parser using a different library with no dependencies on X11 or anything else for that matter. Hope it's acceptable. https://sourceforge.net/tracker/index.php?func=detail&aid=1020724&group_id=59548&atid=491356 Cheers, Andy [EMAIL PROTECTED] wrote: Uh, the whol

[Nutch-dev] [ nutch-Bugs-1020724 ] parser for RTF files

2004-09-20 Thread SourceForge.net
Bugs item #1020724, was opened at 2004-09-01 21:00 Message generated for change (Comment added) made by andyhedges You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1020724&group_id=59548 Category: plugin: other Group: None Status: Open Resolution: None Pri

Re: [Nutch-dev] CVS

2004-09-20 Thread Andy Hedges
Sounds like a great plan! Subversion works using WebDAV over port 80 as I understand it. No more using nightlies. Andy Doug Cutting wrote: Stefan Groschupf wrote: Normal source-forge problem. Happens too often. Just drink a coffee and try it again in an hour or so. ;-/ I'm planning to move Nutch'

Re: [Nutch-dev] CVS

2004-09-20 Thread Doug Cutting
Stefan Groschupf wrote: Normal source-forge problem. Happens too often. Just drink a coffee and try it again in an hour or so. ;-/ I'm planning to move Nutch's code from CVS hosted at SourceForge to Subversion hosted at http://osuosl.org/. This should happen sometime in the next few months. Doug

[Nutch-dev] [Fwd: Mail delivery failed: returning message to sender]

2004-09-20 Thread Michael Cafarella
> > I know a lot of people have seen this problem, but I have not > run into it. I ran a crawl of about 100m pages back in August > with good luck. > > On a two-Xeon box with ~2 gigs of RAM, I would run a fetcher of > 200 threads. As Doug says, it took a little while to get up to > speed.

Re: [Nutch-dev] 08/27/04 - Nutch Fetching Strangeness

2004-09-20 Thread Michael Cafarella
Hi Jason, Does it happen reliably at the URL that you list, or is it intermittent? I have not seen this before. --Mike On Sat, 2004-09-18 at 08:48, Jason Boss wrote: > Hey guys, > > Using the 8/27/04 version of Nutch and am getting this strange error while > trying to fetch. > > Thank

Re: [Nutch-dev] WebDB Architecture Question -- follow up to Page Scores discussion

2004-09-20 Thread Michael Cafarella
Hi Jagdeep, On Wed, 2004-09-15 at 21:39, Sandhu, Jagdeep wrote: > Greetings, > > Another issue that I see with the WebDB is the fact that Pages and Links are > maintained by URLs and MD5 hashes. In my crawl of 64 million Travel related pages, I > have not seen a single example of page duplic