Andrzej,
I am trying to restore a human-oriented web-site tree using anchor text! As an
example, a page with the anchor text "Motherboards" links to many pages about
specific motherboards, etc.; in many cases we can group information this way.
Anchor text is the true subject of the page, but only within the same domain. BTW
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Here you go. local filesystem and a single job tracker on another
> > machine. When the tasktracker and jobtracker are on the same box there
> > isn't a problem. When they are on different machines it runs into
> > issue
Rod Taylor wrote:
Here you go. local filesystem and a single job tracker on another
machine. When the tasktracker and jobtracker are on the same box there
isn't a problem. When they are on different machines it runs into
issues.
This is using mapred.local.dir on the local machine (not shared be
On Fri, 2005-11-04 at 22:57 -0500, Rod Taylor wrote:
> On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote:
> > Rod Taylor wrote:
> > > I tried running one datanode per machine connecting back to the same SAN
> > > but it seemed pretty clunky. A crash of any datanode would take down
> > > the en
On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > I tried running one datanode per machine connecting back to the same SAN
> > but it seemed pretty clunky. A crash of any datanode would take down
> > the entire system (no data replication since it's a common data-store
Rod Taylor wrote:
I tried running one datanode per machine connecting back to the same SAN
but it seemed pretty clunky. A crash of any datanode would take down
the entire system (no data replication since it's a common data-store in
the end). Reducing it to a single datanode did not have this im
protocol-httpclient does not follow redirects when fetching robots.txt
--
Key: NUTCH-124
URL: http://issues.apache.org/jira/browse/NUTCH-124
Project: Nutch
Type: Bug
Components: fetcher
Versi
On Fri, 2005-11-04 at 19:15 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > There is only a single datanode and there are 20 hosts.
>
> That's a lot of load on one datanode. I typically run a datanode on
> every host, accessing the local drives on that host.
I tried running one datanode per
Rod Taylor wrote:
There is only a single datanode and there are 20 hosts.
That's a lot of load on one datanode. I typically run a datanode on
every host, accessing the local drives on that host.
Doug
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ]
Paul Baclace updated NUTCH-116:
---
Attachment: comments_msgs_and_local_renames_during_TestNDFS.patch
> TestNDFS a JUnit test specifically for NDFS
> ---
>
>
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Every segment that I fetch seems to be missing a part when stored on the
> > filesystem. The stranger thing is it is always the same part (very
> > reproducible).
>
> This sounds strange. Are the datanode errors always
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ]
Paul Baclace updated NUTCH-116:
---
Attachment: required_by_TestNDFS_v3.patch
I found and fixed a problem with a standalone DataNode process exiting too
early (this was not detected by the current u
Ken van Mulder wrote:
First is that the fetcher slows down over time and continues to use more
and more memory as it goes (which I think is eventually hanging the
process).
What parser plugins do you have enabled? These are usually the culprit.
Try using 'kill -QUIT' to see what various thr
Rod Taylor wrote:
Every segment that I fetch seems to be missing a part when stored on the
filesystem. The stranger thing is it is always the same part (very
reproducible).
This sounds strange. Are the datanode errors always on the same host?
How many hosts are you running this on?
Doug
Cache.jsp some times generate NullPointerException
--
Key: NUTCH-123
URL: http://issues.apache.org/jira/browse/NUTCH-123
Project: Nutch
Type: Bug
Components: web gui
Environment: All systems
Reporter: Lutischán
Please do not cross post questions!
Check out the map reduce branch in svn. Map reduce will do
everything you are looking for, and it works well for me.
Stefan
On 04.11.2005 at 14:32, Arsen Popovyan wrote:
At the moment we are using nutch-nightly (nutch-2005-07-20). We are
not plea
At the moment we are using nutch-nightly (nutch-2005-07-20). We are not pleased
with the performance of fetching, parsing, indexing, analyzing and scoring...
information. Our spider currently retrieves approx. 25,000 new results per day. All
processes now run on one computer (machine) and we are using
I actually have several projects, but let's start with the first. We
need to create a search engine that crawls about 20 adoption-related
sites that we are affiliated with, such as:
adoption.com
fosterparenting.com
crisispregnancy.com
adoption.org
adopting.org
123adoption.com (which includes a
Hi Nathan,
Please send me more details
Nathan Gwilliam <[EMAIL PROTECTED]> wrote:
We're looking for a Nutch developer we can hire to build a nutch search
engine for our sites. Are any of you doing side projects?
Nathan Gwilliam
Adoption.com & Families.com
[EMAIL PROTECTED]
>>
>>
WITH WA
We're looking for a Nutch developer we can hire to build a nutch search
engine for our sites. Are any of you doing side projects?
Nathan Gwilliam
Adoption.com & Families.com
[EMAIL PROTECTED]