I've been reading the Nutch code recently. Below is some of my
understanding. Others should correct me if I'm wrong.
Regards,
- Feng Zhou
On Mon, 7 Mar 2005 21:15:38 -0500, Daniel Drazner <[EMAIL PROTECTED]> wrote:
Hi,
Thanks for your email. I have come across this article before. Unfortunately
it doesn't reveal all secrets.
Thanks,
Daniel
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Feng Zhou
Sent: Monday, March 07, 2005 8:48 PM
To: [EMAIL PROTECTED]
Cc: nutch-dev
Hi Daniel,
I'm no expert on Nutch but I've found the Nutch tech report is a good
place to start. It has detailed discussion on how various components
work.
http://labs.commerce.net/wiki/images/0/06/CN-TR-04-04.pdf
- Feng Zhou
On Mon, 7 Mar 2005 19:45:30 -0500, Daniel Drazner <[EMAIL PROTECTED]
Hello,
First of all I would like to thank you for your wonderful work!!
I'm interested in learning in detail how the Nutch crawler is designed and how it performs
Web crawling. I tried to find technical documentation that describes
the fetching process, but I found only one article
(http://nutch.sourceforge
Hit limiter off-by-one bug
--
Key: NUTCH-5
URL: http://issues.apache.org/jira/browse/NUTCH-5
Project: Nutch
Type: Bug
Components: searcher
Reporter: Andy Liu
Priority: Minor
When re-searching for more raw hits, the first result o
Hi,
This is very interesting, thanks, Angel.
Doug's right about the datanode startup and replication problem. I believe
there's a simple fix for the problem you describe when starting up all the
datanodes.
He's also probably right about the namenode startup. A Namenode logs all its
Thanks for the report!
400,000 is a larger number of files than I have yet tested NDFS with,
and it looks like there are some issues caused by this. Mike has built
the largest NDFS systems that I know of (several terabytes spread over
around 20 machines) but these probably had less than a thous
Hi,
I have been doing some tests to find out if NDFS can be used at our
company to reliably store many files (both small and big) across a
cluster of cheap servers.
The short summary is that right now NDFS doesn't look viable for our needs.
I am sending the results of the test to the list, in c
I would like to join the developer mailing list for Nutch.
rohit upadhyay
FYI -- I have the carrot plugin working on www.localsearch.hk
- Original Message -
From: "Dawid Weiss" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 03, 2005 11:41 PM
Subject: Re: [Nutch-dev] Plugins - sum up
Clustering Carrot2:
It is a search results clustering plugi
--- Begin Message ---
Hello,
Here are some easy but interesting questions. Thanks for your help.
I crawled some pages with bin/nutch crawl. Some pages were not fetched
(because of timeouts from a really slow web server).
To get the unfetched (but existing) pages, I ran "bin/nutch generate db
segme