problems with start-all command

2006-08-10 Thread kawther khazri
Hello, we are trying to install Nutch on a single machine using this guide: http://wiki.apache.org/nutch/NutchHadoopTutorial?highlight=%28nutch%29. We are blocked at this step: *first we execute this command

Crawling flash

2006-08-10 Thread Iain
I want to include embedded flash in my crawls. Despite (apparently successfully) including the parse-swf plugin, embedded flash does not seem to be retrieved. I’m assuming that the object tags are not being parsed to find the .swf files. Can anyone comment? Thanks Iain
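For reference, parse-swf is enabled through the plugin.includes property. A sketch of a nutch-site.xml fragment follows; the plugin list here is illustrative and the defaults vary by Nutch version, so treat everything except the added parse-swf entry as a placeholder:

```xml
<!-- Hypothetical nutch-site.xml fragment: adds swf to the parse plugin list.
     The surrounding plugin names are a sketch, not the exact defaults. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|swf)|index-basic|query-(basic|site|url)</value>
</property>
```

Even with parse-swf enabled, the .swf file is only fetched if the HTML parser first extracts its URL as an outlink from the object/embed tags, which is consistent with Iain's suspicion.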

problem with the DFS command

2006-08-10 Thread kawther khazri
Hello, when I execute the DFS command, I get this:

[EMAIL PROTECTED] search]$ bin/start-all.sh
starting namenode, logging to /nutch/search/logs/hadoop-nutch-namenode-localhost.out
The authenticity of host 'localhost (127.0.0.1)' can't be established. RSA key fingerprint is
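The "authenticity of host" prompt means ssh to localhost is not yet set up for non-interactive login, which start-all.sh requires. A minimal sketch of passwordless-key setup, assuming OpenSSH defaults and an empty passphrase:

```shell
# Generate an RSA key with an empty passphrase so scripted logins don't prompt.
# SSH_DIR defaults to ~/.ssh; override it to experiment without touching your keys.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
# Skip generation if a key already exists (avoids the overwrite prompt).
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
# Authorize the key for logins back into this same account.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
# Afterwards, run "ssh localhost" once and answer "yes" to store the host key.
```

After this, bin/start-all.sh should be able to launch the DataNode and TaskTracker over ssh without prompting.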

Extended crawling configuration with mapred.input.value.class?

2006-08-10 Thread Timo Scheuer
Hi, I am interested in a more comprehensive configuration of the crawl targets. The current version only supports lists (files) containing URLs. One thing that would be desirable is the injection of URLs with metadata attached. This metadata (inserted into the CrawlData object) could be read by

number of mapper

2006-08-10 Thread Murat Ali Bayir
Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives a different number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do for setting the number of

Index with synonyms

2006-08-10 Thread Keyserzero
Hey list, I would like to ask whether it is possible to start a search query with a single word (e.g. Home). Nutch would then look up the word “Home” in a list of synonyms and recognize that “House” is a synonym for “Home”. Now Nutch can start a search query with “House” and

Re: number of mapper

2006-08-10 Thread Dennis Kubes
There is also a mapred.tasktracker.tasks.maximum variable which may be causing the task number to be different. Dennis. Murat Ali Bayir wrote: Hi everybody, Although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method the system gives another number as a number

Re: problems with start-all command

2006-08-10 Thread Dennis Kubes
The name node is still running. Run the bin/stop-all.sh script first, then do a ps -ef | grep NameNode to see whether the process is still alive. If it is, it may need to be killed by hand with kill -9 processid. The second problem is the setup of ssh keys, as described in a previous email. Also I would
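Dennis's stop-check-kill sequence can be sketched as a small helper. kill_stale is a hypothetical name; the ps | grep | awk pipeline mirrors the commands from his reply:

```shell
#!/bin/sh
# Kill any leftover process whose ps listing matches a pattern (e.g. NameNode).
# Sketch only: on a real cluster, run bin/stop-all.sh first and use kill -9
# only as a last resort, as Dennis suggests.
kill_stale() {
  pattern="$1"
  # grep -v grep drops the grep command itself from the listing;
  # awk prints the PID column of ps -ef output.
  pids=$(ps -ef | grep "$pattern" | grep -v grep | awk '{print $2}')
  for pid in $pids; do
    kill -9 "$pid" 2>/dev/null
  done
}
# Usage after bin/stop-all.sh:
#   kill_stale NameNode
```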

Re: number of mapper

2006-08-10 Thread Murat Ali Bayir
That cannot be the problem; it only restricts the number of tasks running simultaneously, and there can also be pending tasks. I checked that this is not the problem. I am not sure, but I have noticed that the number of mapper tasks is equal to k * the number of different parts in the input path. To illustrate, I have 15 parts

Re: number of mapper

2006-08-10 Thread Murat Ali Bayir
My configs are given below: in hadoop-site the number of mappers = 130; in my code I use job.setNumMapTasks(130); in hadoop-default the number of mappers = 2. With this configuration I got 135 mappers in my job. However, there is no problem with the number of reducers. Andrzej Bialecki wrote: Murat Ali Bayir
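The observation above matches how Hadoop of that era treated the setting: the map count is only a hint, and the actual number of map tasks comes from the input splits generated per input file, which is why it scales with the number of parts in the input path. A hadoop-site.xml sketch (values illustrative, property names as in Hadoop 0.x):

```xml
<!-- A hint only: the JobTracker may create more map tasks than this if the
     input has more splits (roughly one per part/block of each input file). -->
<property>
  <name>mapred.map.tasks</name>
  <value>130</value>
</property>
<!-- The reduce count, by contrast, is honored exactly, which is why the
     number of reducers behaves as expected in this thread. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>130</value>
</property>
```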

More Fetcher NullPointerException

2006-08-10 Thread Sellek, Greg
I am experiencing the same issue as a similar post from 8/6. Whenever I try to fetch pages, I see a lot of "fetch of xxx failed with: java.lang.NullPointerException". I have put the appropriate agent info in both the nutch-default and nutch-site config files. I tried using DEBUG logging to get
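For reference, the agent properties usually look like the following in nutch-site.xml; the values here are placeholders, and a missing or empty http.agent.name is one known cause of failed fetches in 0.8-era Nutch:

```xml
<!-- Placeholder values: replace with your crawler's real identity. -->
<property>
  <name>http.agent.name</name>
  <value>MyCrawler</value>
</property>
<property>
  <name>http.agent.description</name>
  <value>Test crawler</value>
</property>
<property>
  <name>http.agent.url</name>
  <value>http://example.com/bot.html</value>
</property>
<property>
  <name>http.agent.email</name>
  <value>admin@example.com</value>
</property>
```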

file access rights/permissions considerations - the least painful way

2006-08-10 Thread Tomi NA
I'm interested in crawling multiple shared folders (among other things) on a corporate LAN. It is a LAN of MS clients with Active Directory-managed accounts. The users routinely access the files based on NTFS-level (and sharing?) permissions. Ideally, I'd like to set up a central server

common-terms.utf8

2006-08-10 Thread Lourival Júnior
Hi, could anyone explain to me what exactly the common-terms.utf8 file does? I don't understand the real functionality of this file... Regards, -- Lourival Junior Universidade Federal do Pará Curso de Bacharelado em Sistemas de Informação http://www.ufpa.br/cbsi Msn: [EMAIL PROTECTED]

crawl-urlfilter subpages of domains

2006-08-10 Thread Jens Martin Schubert
Hello, is it possible to crawl e.g. http://www.domain.com but skip crawling all URLs matching http://www.domain.com/subpage/? I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt, but it doesn't work: -ftp.tu-clausthal.de
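A crawl-urlfilter.txt sketch for that goal, assuming the standard first-match-wins rule order (the exclusion must come before the broader include; the domain is the placeholder from the post):

```
# Skip everything under /subpage/ first; order matters, the first match wins.
-^http://www\.domain\.com/subpage/
# Then accept the rest of the domain.
+^http://www\.domain\.com/
# Reject everything else.
-.
```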

Re: Stalling during fetch (0.7)

2006-08-10 Thread Benjamin Higgins
Further details: if I run strace on the process, it looks like this, over and over and over:

gettimeofday({1155249187, 52}, NULL) = 0
gettimeofday({1155249188, 389}, NULL) = 0
gettimeofday({1155249188, 679}, NULL) = 0
gettimeofday({1155249188, 955}, NULL) = 0