nutch config files

2005-07-07 Thread Raymond Creel
I'm just getting started with Nutch. Does someone know how I may be able to get the nutch command-line script to load different nutch-default.xml/nutch-site.xml files than what is in the nutch/conf directory? I want to be able to run nutch at different sites with different startup configurations.

Re: nutch config files

2005-07-07 Thread Juho Mäkinen
Take a look into Nutch Wiki FAQ here: http://wiki.apache.org/nutch/FAQ And find the Q/A for "How can I force fetcher to use custom nutch-config?" - Juho Mäkinen, http://www.juhonkoti.net On 7/8/05, Raymond Creel <[EMAIL PROTECTED]> wrote: > I'm just getting started with Nutch. Does someone > kn
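The FAQ entry itself is not quoted in this thread, so the following is only a rough sketch of one way to run Nutch with per-site configuration directories. It assumes the `bin/nutch` script reads an environment variable named `NUTCH_CONF_DIR` to locate `nutch-default.xml` and `nutch-site.xml`; that variable name is an assumption here, as are the helper names `nutch_env` and `run_nutch`.

```python
import os
import subprocess


def nutch_env(conf_dir, base_env=None):
    """Return a copy of the environment that points at an alternate conf dir.

    ASSUMPTION: bin/nutch honors NUTCH_CONF_DIR; check the script or the
    FAQ for your Nutch version before relying on this.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NUTCH_CONF_DIR"] = os.path.abspath(conf_dir)
    return env


def run_nutch(nutch_home, conf_dir, args):
    """Run the nutch command-line script with the given conf dir and arguments."""
    script = os.path.join(nutch_home, "bin", "nutch")
    return subprocess.run([script, *args], env=nutch_env(conf_dir), check=True)
```

Each site would then keep its own directory holding copies of the two XML files, and the same Nutch installation could be started against whichever directory applies.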

RE: Impressive performance

2005-07-07 Thread Emilijan Mirceski
From my experience, when indexing, disk speed is the limiting factor once your computer has several GHz to work with. -Original Message- From: Vacuum Joe [mailto:[EMAIL PROTECTED] Sent: Thursday, July 07, 2005 11:23 PM To: nutch-user@lucene.apache.org Subject: Impressive performance I

Impressive performance

2005-07-07 Thread Vacuum Joe
I have Nutch running on an old clunker. It's a Pentium II with 512 MB. I had it crawl 90,000 pages and index them, and Nutch can do searches on those 90,000 pages in about two or three seconds. All on one tired old PII. Very impressive. The part that's slow is indexing and crawling. Obviously

Simple question about the merge tool

2005-07-07 Thread Vacuum Joe
I have a simple question about how to use the merge tool. I've done three small crawls resulting in three small segment directories. How can I merge these into one directory with one index? I notice the merge command options: Usage: IndexMerger (-local | -ndfs ) [-workingdir ] outputIndex segme

Another question about Page Ranking

2005-07-07 Thread shkim
Hello Nutch users, I see someone has mentioned page rank, and there is one thing I am curious about as well. Is the page rank assigned only after fetching? For instance, in the tutorial the sequence of actions is to fetch an initial set, update the db with that set, then fetch the top-most 1

Page Ranking

2005-07-07 Thread Zaheed Haque
Hello, I have just installed Nutch, run a test crawl, and tried out search. The search results page offers an option called "explain" (Score Explanation), and I would like to know a bit more about the ranking system's inner workings. I would be very glad if someone could point me to some documentation. -- Best Rega

Re: NDFS troubles

2005-07-07 Thread Jay Pound
How do I run a test against the mapred branch? -J - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: Sent: Thursday, July 07, 2005 3:28 PM Subject: Re: NDFS troubles > Trunk or mapred branch? If not mapred branch, please reproduce this in > the mapred branch, since that's where

Re: NDFS troubles

2005-07-07 Thread Doug Cutting
Trunk or mapred branch? If not mapred branch, please reproduce this in the mapred branch, since that's where development of NDFS happens now. Doug Jay Pound wrote: the version from 7/6/05 - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: Sent: Thursday, July 07, 200

Re: NDFS troubles

2005-07-07 Thread Jay Pound
the version from 7/6/05 - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: Sent: Thursday, July 07, 2005 2:31 PM Subject: Re: NDFS troubles > Jay Pound wrote: > > 64bit:/nutch # cd nutch-nightly/ > > What version of Nutch are you using? > > Please test NDFS against the

Re: New build ?

2005-07-07 Thread Doug Cutting
Kashif Khadim wrote: Just want to say that there has been no new build for some days; it would help if I could get the latest build. Nightly builds were down for a few days as I moved house. (They run on my workstation.) They're now back up (and I'm now starting to try to catch up on email). But it s

Re: NDFS troubles

2005-07-07 Thread Doug Cutting
Jay Pound wrote: 64bit:/nutch # cd nutch-nightly/ What version of Nutch are you using? Please test NDFS against the mapred branch: https://svn.apache.org/repos/asf/lucene/nutch/branches/mapred Doug

Re: ndfs stuff

2005-07-07 Thread Piotr Kosiorowski
Hello Andrzej, The attachment for NUTCH-46 contains the latest and still correct version of the patch. It is working fine for me, but it requires cygwin to work. I am helping Jay Pound to apply this patch right now, so I hope we will get some comments from him. We can also ask Zhang Jin who originally had

Re: ndfs stuff

2005-07-07 Thread Piotr Kosiorowski
Hello Ferenc, Some documentation on running NDFS can be found on the wiki: http://wiki.apache.org/nutch/NutchDistributedFileSystem Regards, Piotr [EMAIL PROTECTED] wrote: Is the NDFS usage documentation available anywhere? Regards, Ferenc

Re: [nutch 0.5] frames

2005-07-07 Thread Andrzej Bialecki
Philipp Suter wrote: Does anybody know how to crawl frames? Or how to extend Nutch to be able to crawl frames? We are using the API. The development version (available from SVN) should handle frames just fine, i.e. it should follow the src=... attributes in frames in order to retrieve the fra

NDFS why

2005-07-07 Thread webmaster
OK, I'm running SUSE 9.3 on 3 computers: an AMD Athlon 64 3500+ (the x86_64 edition, of course), a Pentium 4 3.0 GHz (SUSE x86), and an Athlon 1900+ (x86 version). I am trying to set up NDFS across all the nodes; the Athlon 1900+ is the namenode and a datanode, the Athlon 64 and Pentium 4 are da

[nutch 0.5] frames

2005-07-07 Thread Philipp Suter
Does anybody know how to crawl frames? Or how to extend Nutch to be able to crawl frames? We are using the API. cheers ph

NDFS troubles

2005-07-07 Thread Jay Pound
OK, I'm running SUSE 9.3 on 3 computers: an AMD Athlon 64 3500+ (the x86_64 edition, of course), a Pentium 4 3.0 GHz (SUSE x86), and an Athlon 1900+ (x86 version). I am trying to set up NDFS across all the nodes; the Athlon 1900+ is the namenode and a datanode, the Athlon 64 and Pentium 4 are datanodes,

Re: ndfs stuff

2005-07-07 Thread [EMAIL PROTECTED]
Is the NDFS usage documentation available anywhere? Regards, Ferenc

Re: Nutch Fatal Exception when trying to search

2005-07-07 Thread quovadis
Hi Sebastien, I thought I'd give you some feedback on this issue. As you are aware (or not), the problem was being experienced with file descriptors (file handles in Windows), and as I was running Win2k Adv Server I couldn't find any reference on how to up this value and using ulimit in cygwin still

Re: Page meta-data is not stored in segments?

2005-07-07 Thread Jérôme Charron
> > I have a good idea of how to handle that situation. > > If there are multiple and conflicting values for > > important meta-data such as the content-type, the page > > is horribly broken, and Nutch shouldn't waste effort > > trying to figure out what's going on. For example, if > [..] > > I un

Re: Page meta-data is not stored in segments?

2005-07-07 Thread Andrzej Bialecki
Vacuum Joe wrote: That's correct. Technically speaking, this is possible to do (ParseData.getMetadata()), we just didn't decide yet how to treat multiple values under the same key. I have a good idea of how to handle that situation. If there are multiple and conflicting values for importan
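The policy suggested in this exchange (treat a page as broken when an important key such as the content-type carries conflicting values) can be illustrated with a small, self-contained sketch. This is not Nutch code and does not use `ParseData.getMetadata()`; the function names and the multi-valued dictionary layout are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def collect_metadata(pairs):
    """Group (key, value) pairs into a multi-valued map with lower-cased keys."""
    meta = defaultdict(list)
    for key, value in pairs:
        meta[key.strip().lower()].append(value.strip())
    return dict(meta)


def media_type(value):
    """Return the bare media type of a Content-Type value, e.g. 'text/html'."""
    return value.split(";", 1)[0].strip().lower()


def resolve_content_type(meta):
    """Return the single agreed media type, or None if absent or conflicting."""
    types = {media_type(v) for v in meta.get("content-type", [])}
    return types.pop() if len(types) == 1 else None
```

Under this sketch, repeated values that agree once parameters and case are ignored resolve to a single type, while a page declaring both `text/html` and `application/pdf` resolves to `None`, which a caller could treat as a signal to skip the page.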