I'm just getting started with Nutch. Does someone
know how I may be able to get the nutch command-line
script to load different
nutch-default.xml/nutch-site.xml files than what is in
the nutch/conf directory? I want to be able to run
nutch at different sites with different startup
configurations.
Take a look at the Nutch Wiki FAQ here: http://wiki.apache.org/nutch/FAQ
And find the Q/A for "How can I force fetcher to use custom nutch-config?"
- Juho Mäkinen, http://www.juhonkoti.net
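For readers who cannot reach the FAQ: the approach described there, as far as I recall, is to copy the conf files into a separate directory and point an environment variable at it before running bin/nutch. The variable name NUTCH_CONF_DIR used below is an assumption and should be checked against your own copy of the bin/nutch script. A minimal Python sketch of a per-site launcher, where the helper names and all paths are hypothetical:

```python
import os
import subprocess


def nutch_env(conf_dir, base_env=None):
    """Return a copy of the environment with NUTCH_CONF_DIR set.

    NUTCH_CONF_DIR is assumed to be the variable that bin/nutch reads;
    verify this in your version of the script.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NUTCH_CONF_DIR"] = os.path.abspath(conf_dir)
    return env


def run_nutch(nutch_home, conf_dir, args):
    """Run bin/nutch with a site-specific conf directory.

    nutch_home and conf_dir are hypothetical paths chosen by the caller.
    Returns the exit status of the script.
    """
    script = os.path.join(nutch_home, "bin", "nutch")
    return subprocess.call([script] + list(args), env=nutch_env(conf_dir))
```

Each site would then get its own directory holding edited copies of nutch-default.xml and nutch-site.xml, and a call such as run_nutch(nutch_home, site_conf_dir, args) would pick the one to use for that run.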
On 7/8/05, Raymond Creel <[EMAIL PROTECTED]> wrote:
> I'm just getting started with Nutch. Does someone
> kn
From my experience, when indexing, disk speed is the limiting factor once
your computer has several GHz to work with.
-Original Message-
From: Vacuum Joe [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 07, 2005 11:23 PM
To: nutch-user@lucene.apache.org
Subject: Impressive performance
I
I have Nutch running on an old clunker. It's a
Pentium II with 512 MB. I had it crawl 90,000 pages
and index them, and Nutch can do searches on those
90,000 pages in about two or three seconds. All on
one tired old PII. Very impressive.
The part that's slow is indexing and crawling.
Obviously
I'm just getting started with Nutch. Does someone
know how I may be able to get the nutch command-line
script to load different
nutch-default.xml/nutch-site.xml files than what is in
the nutch/conf directory? I want to be able to run
nutch at different sites with different startup
configurations.
I have a simple question about how to use the merge
tool. I've done three small crawls resulting in three
small segment directories. How can I merge these into
one directory with one index? I notice the merge
command options:
Usage: IndexMerger (-local | -ndfs )
[-workingdir ] outputIndex segme
Hello nutch users,
I see someone has mentioned page rank, and there was one thing I was
curious about it as well. Is the page rank assigned only after fetching?
For instance, in the tutorial the sequence of actions is to fetch an
initial set, update the db with that set, then fetch the top-most 1
Hello,
I have just installed Nutch, run a test crawl, and then tried out search. The search
result page gives an option called "explain" (Score Explanation). I would
like to know a bit more about the ranking system's inner workings. I
would be very glad if someone could point me to some documentation.
--
Best Rega
How do I run a test against the mapred branch?
-J
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To:
Sent: Thursday, July 07, 2005 3:28 PM
Subject: Re: NDFS troubles
> Trunk or mapred branch? If not mapred branch, please reproduce this in
> the mapred branch, since that's where
Trunk or mapred branch? If not mapred branch, please reproduce this in
the mapred branch, since that's where development of NDFS happens now.
Doug
Jay Pound wrote:
the version from 7/6/05
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To:
Sent: Thursday, July 07, 200
the version from 7/6/05
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To:
Sent: Thursday, July 07, 2005 2:31 PM
Subject: Re: NDFS troubles
> Jay Pound wrote:
> > 64bit:/nutch # cd nutch-nightly/
>
> What version of Nutch are you using?
>
> Please test NDFS against the
Kashif Khadim wrote:
Just want to say that there has been no new build for some
days. It would help if I could get the latest build.
Nightly builds were down for a few days as I moved house. (They run on
my workstation.) They're now back up (and I'm now starting to try to
catch up on email). But it s
Jay Pound wrote:
64bit:/nutch # cd nutch-nightly/
What version of Nutch are you using?
Please test NDFS against the mapred branch:
https://svn.apache.org/repos/asf/lucene/nutch/branches/mapred
Doug
Hello Andrzej,
Attachment for NUTCH-46 contains latest and still correct version of the
patch. It is working fine for me but it requires cygwin to work.
I am helping Jay Pound to apply this patch right now so we would get
some comments from him I hope.
We can also ask Zhang Jin who originally had
Hello Ferenc,
Some documentation on running NDFS can be found on the wiki:
http://wiki.apache.org/nutch/NutchDistributedFileSystem
Regards,
Piotr
[EMAIL PROTECTED] wrote:
Is there any NDFS usage documentation available anywhere?
Regards,
Ferenc
Philipp Suter wrote:
does anybody know how to crawl frames? Or how to extend nutch to be able
to crawl frames? We are using the api.
The development version (available from SVN) should handle frames just
fine, i.e. it should follow the src=... attribute in frames in order to
retrieve the fra
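To illustrate what following the src attribute of frames involves, here is a minimal sketch in Python. It is not Nutch's own parser (Nutch is written in Java), and the class and function names are hypothetical. It also picks up iframe elements, which is an addition of this sketch and not something stated in the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkParser(HTMLParser):
    """Collect absolute URLs from the src attribute of frame elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the page URL.
                self.frame_urls.append(urljoin(self.base_url, src))


def extract_frame_urls(html, base_url):
    """Return the frame URLs found in html, in document order."""
    parser = FrameLinkParser(base_url)
    parser.feed(html)
    return parser.frame_urls
```

A crawler would add the returned URLs to its fetch list in the same way as ordinary outlinks, so that the content shown inside each frame gets fetched and indexed.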
OK, I'm running SUSE 9.3 on 3 computers: an AMD Athlon 64 3500+ (the x86_64
edition of course), a Pentium 4 3.0 GHz (SUSE x86), and an Athlon 1900+
(x86 version). I am trying to set up NDFS across all the nodes; the Athlon
1900+ is the namenode and a datanode, the Athlon 64 and Pentium 4 are
da
does anybody know how to crawl frames? Or how to extend nutch to be able
to crawl frames? We are using the api.
cheers
ph
OK, I'm running SUSE 9.3 on 3 computers: an AMD Athlon 64 3500+ (the x86_64
edition of course), a Pentium 4 3.0 GHz (SUSE x86), and an Athlon 1900+
(x86 version). I am trying to set up NDFS across all the nodes; the Athlon
1900+ is the namenode and a datanode, the Athlon 64 and Pentium 4 are
datanodes,
Is there any NDFS usage documentation available anywhere?
Regards,
Ferenc
Hi Sebastien
I thought I'd give you some feedback on this issue. As you
may be aware, the problem was with
file descriptors (file handles in Windows), and as I was
running Win2k Advanced Server I couldn't find any reference on
how to raise this value, and using ulimit in Cygwin still
> > I have a good idea of how to handle that situation.
> > If there are multiple and conflicting values for
> > important meta-data such as the content-type, the page
> > is horribly broken, and Nutch shouldn't waste effort
> > trying to figure out what's going on. For example, if
> [..]
>
> I un
Vacuum Joe wrote:
That's correct. Technically speaking, this is possible to do
(ParseData.getMetadata()), we just haven't decided yet how to treat
multiple values under the same key.
I have a good idea of how to handle that situation.
If there are multiple and conflicting values for
importan
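The question of multiple values under the same key can be made concrete with a small sketch. This is not Nutch's ParseData API. The class below is hypothetical and only shows one way to keep every value for a key and detect when they disagree, which is the case the suggestion above treats as a broken page.

```python
from collections import defaultdict


class MultiValueMetadata:
    """Metadata store that keeps every value seen for a key."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, key, value):
        # Keys are compared case-insensitively, as HTTP header names are.
        self._values[key.lower()].append(value.strip())

    def get_all(self, key):
        """Return all values recorded for key, in insertion order."""
        return list(self._values.get(key.lower(), []))

    def has_conflict(self, key):
        """True if key has two or more values that differ (ignoring case)."""
        distinct = {v.lower() for v in self.get_all(key)}
        return len(distinct) > 1
```

With a structure like this, a parser could check has_conflict("Content-Type") after collecting the HTTP headers and meta tags, and then apply whatever policy is chosen, for example skipping the page.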