Hi,
I am running MapReduce on three machines:
one namenode and two datanodes.
I am using the latest revision of Nutch 0.8 (revision 368582) with
Java version jdk1.5.0_06.
I tried a very simple thing on all three machines:
putting a file from the local filesystem into NDFS:
bin/nutch ndfs -put tmp /user/rafi
Hi,
I am running a few fetch cycles on Nutch 0.8, and I notice that the data
size is much smaller than what I got with version 0.7 (running the
same cycles at about the same time from different machines): about 5 GB
after the third cycle, starting with about 72,000 URLs.
All the processes e
Check the following command:
FetchListTool (-local | -ndfs )
[-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers]
[-adddays numDays]
This command calls a function named emitMultipleLists, which emits
several fetchlists so that you can fetch across several machines.
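To illustrate the idea of splitting one fetchlist into several, here is a minimal Python sketch. It is not Nutch's emitMultipleLists code; the function name split_fetchlist and the choice to group URLs by a hash of the host name are assumptions made only for this example. The num_fetchers argument plays the role of the -numFetchers option.

```python
from urllib.parse import urlparse
import zlib


def split_fetchlist(urls, num_fetchers):
    """Partition urls into num_fetchers lists (hypothetical helper, not Nutch code).

    Assumption: all URLs of one host go to the same list, chosen by a
    stable CRC32 hash of the host name.
    """
    if num_fetchers < 1:
        raise ValueError("num_fetchers must be at least 1")
    fetchlists = [[] for _ in range(num_fetchers)]
    for url in urls:
        host = urlparse(url).netloc.lower()
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        index = zlib.crc32(host.encode("utf-8")) % num_fetchers
        fetchlists[index].append(url)
    return fetchlists
```

Each resulting list could then be handed to a different machine. Grouping by host is just one possible policy; a plain round-robin split would also produce several lists.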
red is now trunk...
On 19.12.2005 at 18:46, Rafi Iz wrote:
Hi all,
I am currently working with Nutch 0.7.1.
I want to start using mapred. Any ideas where I can find the latest
version?
By the way, I looked at the path
http://svn.apache.org/repos/asf/lucene/nutch/branches/
but the only dir
Hi all,
I am currently working with Nutch 0.7.1.
I want to start using mapred. Any ideas where I can find the latest
version?
By the way, I looked at the path
http://svn.apache.org/repos/asf/lucene/nutch/branches/
but the only directory that exists there is branch-0.7/
Thanks,
Raffi