problems with start-all command

2006-08-10 Thread kawther khazri
hello, we are trying to install Nutch on a single machine using this guide: http://wiki.apache.org/nutch/NutchHadoopTutorial?highlight=%28nutch%29 . We are blocked at this step: *first we execute this command

Crawling flash

2006-08-10 Thread Iain
I want to include embedded flash in my crawls. Despite (apparently successfully) including the parse-swf plugin, embedded flash does not seem to be retrieved. I’m assuming that the object tags are not being parsed to find the .swf files. Can anyone comment? Thanks Iain
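For reference, a parse plugin is only active if it is listed in plugin.includes; a hypothetical nutch-site.xml fragment (the exact value regex is an assumption and must match whichever other plugins you need):

```xml
<!-- nutch-site.xml: add parse-swf to the plugin include list.
     The surrounding plugin names are illustrative, not a complete set. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|swf)|index-basic|query-(basic|site|url)</value>
</property>
```

Note that even with parse-swf active, .swf URLs still have to be discovered first; if the HTML parser does not extract them from object/embed tags, they are never fetched at all.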

problem with the DFS command

2006-08-10 Thread kawther khazri
hello, when I execute the DFS command, I get this: [EMAIL PROTECTED] search]$ bin/start-all.sh starting namenode, logging to /nutch/search/logs/hadoop-nutch-namenode-localhost.out The authenticity of host 'localhost (127.0.0.1)' can't be established. RSA key fingerprint is 81:0e:49:ce:61:8c:7b:09
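The prompt itself is just the first-connection host-key check; the tutorial additionally needs passwordless ssh to localhost for start-all.sh. A sketch of both steps, assuming a standard OpenSSH layout (paths and options here are plain OpenSSH defaults, nothing Nutch-specific):

```shell
# Create a passphrase-less key pair if one does not exist yet.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
  ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
fi
# Authorize the key for login to this same machine.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# Pre-accept localhost's host key so the "authenticity" prompt goes
# away; this is harmlessly empty if no sshd is listening yet.
ssh-keyscan localhost >> "$HOME/.ssh/known_hosts" 2>/dev/null || true
echo "ssh setup done"
```

After this, `ssh localhost` should log in without a password or prompt, which is what start-all.sh expects.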

Extended crawling configuration with "mapred.input.value.class"?

2006-08-10 Thread Timo Scheuer
Hi, I am interested in a more comprehensive configuration of the crawl targets. The current version only supports lists (files) containing URLs. One thing that would be desirable is the injection of URLs with metadata attached. This metadata (inserted into the CrawlData object) could be read by pl

number of mapper

2006-08-10 Thread Murat Ali Bayir
Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system reports a different number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers

Index with synonyms

2006-08-10 Thread Keyserzero
Hey list, I would like to ask if it is possible to start a search query with a simple word (e.g. "Home"). Nutch would then look up the word "Home" in a list of synonyms and recognize that "House" is a synonym for "Home". Now Nutch can start a search query with "House" and "Ho

Re: number of mapper

2006-08-10 Thread Dennis Kubes
There is also a mapred.tasktracker.tasks.maximum variable which may be causing the task number to be different. Dennis Murat Ali Bayir wrote: Hi everybody, Although I change the number of mappers in hadoop-site.xml and use job.setNumMapTasks method the system gives another number as a number o

Re: problems with start-all command

2006-08-10 Thread Dennis Kubes
The name node is already running. Run the bin/stop-all.sh script first and then do a ps -ef | grep NameNode to see if the process is still running. If it is, it may need to be killed by hand: kill -9 <processid>. The second problem is the setup of ssh keys as described in the previous email. Also I would re
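The sequence Dennis describes can be sketched as a small shell script; bin/stop-all.sh is the 0.8-era script path, and the grep pattern is an assumption about how the namenode JVM shows up in the process listing:

```shell
# Stop the daemons cleanly first, if the script is present here.
[ -x bin/stop-all.sh ] && bin/stop-all.sh
# Look for a leftover namenode JVM; the [N] bracket trick keeps grep
# from matching its own command line.
pid=$(ps -ef | grep '[N]ameNode' | awk '{print $2}')
if [ -n "$pid" ]; then
  kill -9 $pid          # last resort: force-kill the stuck daemon
else
  echo "no NameNode process found"
fi
```

Once no stray NameNode process remains, bin/start-all.sh can be run again from a clean state.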

Re: number of mapper

2006-08-10 Thread Murat Ali Bayir
It cannot be the problem; it only restricts the number of tasks running simultaneously, and there can be pending tasks as well. I checked that this is not the problem. I am not sure, but I have noticed that the number of mapper tasks is equal to k * the number of different parts in the input path. To illustrate, I have 15 parts in

Re: number of mapper

2006-08-10 Thread Andrzej Bialecki
Murat Ali Bayir wrote: Hi everybody, Although I change the number of mappers in hadoop-site.xml and use job.setNumMapTasks method the system gives another number as a number of mapper, the problem only occurs for number of mapper, number of reducers works correctly. What I have to do for setti

Re: number of mapper

2006-08-10 Thread Murat Ali Bayir
My configs are given below: in hadoop-site, number of mappers = 130; in my code I use job.setNumMapTasks(130); in hadoop-default, number of mappers = 2. With this configuration I get 135 mappers in my job. However, there is no problem with the number of reducers. Andrzej Bialecki wrote: Murat Ali Bayir

Re: number of mapper

2006-08-10 Thread Dennis Kubes
Take a look at this: http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces It will answer why you have a few more map tasks than are set in the configuration. Dennis Murat Ali Bayir wrote: my configs are given below: in hadoop-site number of mapper = 130 in my code I use job.setNumMapT
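The counts reported earlier in this thread fit the rule of thumb on that wiki page: with 15 input parts, a hint of 130 map tasks rounds up to the next multiple of 15, since each part contributes at least one whole split. A sketch of that arithmetic (the rounding rule is an assumption based on the wiki page, not verified against the Hadoop source):

```shell
# Requested map tasks round up to a multiple of the number of input
# parts when splits cannot cross file boundaries (assumed behavior).
hint=130
parts=15
k=$(( (hint + parts - 1) / parts ))   # ceil(130/15) = 9 splits per part
maps=$(( k * parts ))                 # 9 * 15 = 135 actual map tasks
echo "$maps"
```

That 135 is exactly the number Murat reports, and it also matches his "k * number of different parts" observation with k = 9.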

More Fetcher NullPointerException

2006-08-10 Thread Sellek, Greg
I am experiencing the same issue as a similar post from 8/6. Whenever I try to fetch pages, I see a lot of "fetch of xxx failed with: java.lang.NullPointerException". I have put the appropriate agent info in both the nutch-default and nutch-site config files. I tried using DEBUG logging to get m
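For reference, the agent identification normally lives in nutch-site.xml; a hypothetical fragment (the property names follow the 0.8-era nutch-default.xml conventions, and the values are placeholders):

```xml
<!-- nutch-site.xml: minimal agent identification; the values below
     are placeholders for illustration only. -->
<property>
  <name>http.agent.name</name>
  <value>MyTestCrawler</value>
</property>
<property>
  <name>http.agent.description</name>
  <value>test crawler for an internal index</value>
</property>
```

Settings in nutch-site.xml override nutch-default.xml, so editing only nutch-site.xml is the usual practice.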

file access rights/permissions considerations - the least painful way

2006-08-10 Thread Tomi NA
I'm interested in crawling multiple shared folders (among other things) on a corporate LAN. It is a LAN of MS clients with Active Directory managed accounts. The users routinely access the files based on NTFS-level (and sharing?) permissions. Ideally, I'd like to set up a central server (probably

common-terms.utf8

2006-08-10 Thread Lourival Júnior
Hi, could anyone explain to me what exactly the common-terms.utf8 file does? I don't understand the real functionality of this file... Regards, -- Lourival Junior Universidade Federal do Pará Curso de Bacharelado em Sistemas de Informação http://www.ufpa.br/cbsi Msn: [EMAIL PROTECTED]

Nutch vs. Google Appliance

2006-08-10 Thread Stevenson, Kerry
Hello all - I have been taking a look at Nutch for purposes of indexing a large pile of internal LAN files at our company, and so far it looks quite impressive. I believe it could substitute for the Google Mini appliance. However, the bigger Google boxes add more features that I am not sure can be

crawl-urlfilter subpages of domains

2006-08-10 Thread Jens Martin Schubert
Hello, is it possible to crawl e.g. http://www.domain.com but skip crawling all URLs matching http://www.domain.com/subpage/ ? I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt, but it doesn't work: -ftp.tu-clausthal.de -^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/m
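For the domain.com example, rule ordering is usually what matters: the first pattern that matches a URL wins, so the exclusion has to precede the broader accept rule. A hypothetical crawl-urlfilter.txt fragment:

```
# Skip the subtree first - the first matching pattern wins.
-^http://www\.domain\.com/subpage/
# Then accept everything else on the host.
+^http://www\.domain\.com/
# Reject anything that matched neither rule above.
-.
```

With the accept rule first, the /subpage/ exclusion would never be reached, which is a common cause of "it doesn't work" with these filter files.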

Stalling during fetch (0.7)

2006-08-10 Thread Benjamin Higgins
Hello, Nutch is stalling in the fetch process. I've run it twice now, and it stops on the *same* URL both times. I don't understand what's going on! The last status report was: 060810 145315 status: segment 20060810142649, 7900 pages, 14 errors, 98421231 bytes, 1571224 ms 060810 145315 status: 5

Re: More Fetcher NullPointerException

2006-08-10 Thread Raphael Hoffmann
I had the same problem before. Just read http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg04303.html and make that tiny change on line 385 of HttpBase.java; it will work fine. Raphael Sellek, Greg wrote: I am experiencing the same issue as a similar post for 8/6. Whenever I try

Re: Stalling during fetch (0.7)

2006-08-10 Thread Benjamin Higgins
Further details: If I run strace on the process, it looks like this, over and over and over: gettimeofday({1155249187, 52}, NULL) = 0 gettimeofday({1155249188, 389}, NULL) = 0 gettimeofday({1155249188, 679}, NULL) = 0 gettimeofday({1155249188, 955}, NULL) = 0 clock_gettime(CLOCK_REALTI