Re: [Nutch-dev] Re: Clustering

2005-09-16 Thread 周利兵
Cluster crawling? I like it, but how to implement it? Thank you. 2005/9/17, Daniele Menozzi <[EMAIL PROTECTED]>: > > On 19:37:42 16/Sep , Dawid Weiss wrote: > > I also provided a sample implementation and it is a plugin available in > > Nutch) using Carrot2 clustering components -- > > > > htt

solaris containers

2005-09-16 Thread Earl Cahill
Just wondering if anyone has tried Solaris containers with Nutch. Seems like it would be nice to have a container or containers for each part of the process. Containers allow for CPU/memory/disk I/O/network I/O slicing (I am pretty sure on the last two). So it would be a way to limit different p

mapred patch for improved error message and some javadoc comments

2005-09-16 Thread Paul Baclace
Here is a patch for improving the error message that is displayed when an intranet crawl commandline has a file instead of a directory of files containing URLs. The old error msg: java.io.IOException: No input files in: [Ljava.io.File;@c24c0 Obviously, the default toString() says nothing. The
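
As an illustration of the problem described above (not the patch itself, which is not shown here): concatenating a File[] into a message uses the array's default toString(), which yields text like [Ljava.io.File;@c24c0. A minimal sketch of a more readable message, assuming a hypothetical helper named describe, could use java.util.Arrays.toString:

```java
import java.io.File;
import java.util.Arrays;

public class InputDirMessage {
    // Hypothetical helper (not from the patch): formats the array contents
    // instead of relying on the array's default toString().
    static String describe(File[] inputDirs) {
        return "No input files in: " + Arrays.toString(inputDirs);
    }

    public static void main(String[] args) {
        File[] dirs = { new File("urls.txt") };
        // Default array toString(): type tag plus hash code, no paths.
        System.out.println("No input files in: " + dirs);
        // Readable variant: lists each File's path.
        System.out.println(describe(dirs));
    }
}
```

Arrays.toString calls toString() on each element, and File.toString() returns the path, so the message names the offending input.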

HTTP 1.1

2005-09-16 Thread Earl Cahill
You may be way ahead of me here, but it just hit me that it would be pretty cool to group URLs to fetch by host and then perhaps use HTTP 1.1 to reuse the connection and save the initial handshaking overhead. Not a huge deal for a couple of hits, but I think it would make sense for large crawls.
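
The grouping step suggested above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (HostGrouping, groupByHost) are hypothetical and are not part of Nutch, and the sketch covers only the grouping, not the fetching. HTTP/1.1 connections are persistent by default, so a fetcher that works through one host's list at a time can reuse a single connection for that host.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class HostGrouping {
    // Hypothetical helper: buckets URLs by lower-cased host, preserving
    // first-seen host order and the original URL order within each host.
    static Map<String, List<String>> groupByHost(List<String> urls) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            // URI.create throws IllegalArgumentException on malformed input.
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // skip URLs without a host component
            }
            String key = host.toLowerCase(Locale.ROOT);
            byHost.computeIfAbsent(key, h -> new ArrayList<>()).add(url);
        }
        return byHost;
    }
}
```

Each map entry is then a batch that could be fetched over one persistent connection, subject to whatever politeness delay the crawler applies per host.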

Re: Nutch vulnerabilities

2005-09-16 Thread Paul Baclace
Michael Ji wrote: No particular vulnerability beyond that of running a web server, if I am not wrong; Tomcat is the same as a web server except that JSP is its core engine; I would suggest following any instructions that Tomcat has for locking it down. For instance, there is a conf setting (the

JUnit tests sensitive to local conf file changes, HowToContribute should have a note about this

2005-09-16 Thread Paul E. Baclace
The page: http://wiki.apache.org/nutch/HowToContribute should note under "Unit Tests" that some tests fail if the conf files are modified. Keep your conf files as *.x.mine or somesuch, and copy *.x.template to the *.x files before doing "ant test". Paul

Re: Nutch vulnerabilities

2005-09-16 Thread Michael Ji
No particular vulnerability beyond that of running a web server, if I am not wrong; Tomcat is the same as a web server except that JSP is its core engine; Michael Ji, --- lumavanossi <[EMAIL PROTECTED]> wrote: > Hi, > > Is there any vulnerability on the use of Nutch that > could let a server vul

Nutch vulnerabilities

2005-09-16 Thread lumavanossi
Hi, Is there any vulnerability in the use of Nutch that could leave a server vulnerable? Can the use of Tomcat, for example, on port 8080 leave the server vulnerable? Is there a way to make the server secure? Thanks, Marco

[jira] Created: (NUTCH-93) DF error on long filesystem name

2005-09-16 Thread Shuji Umino (JIRA)
DF error on long filesystem name Key: NUTCH-93 URL: http://issues.apache.org/jira/browse/NUTCH-93 Project: Nutch Type: Bug Versions: 0.7 Environment: CentOS4.1 (like RedhatEnterprise4) Reporter: Shuji Umino Priority:

Re: Clustering

2005-09-16 Thread Daniele Menozzi
On 19:37:42 16/Sep , Dawid Weiss wrote: > I also provided a sample implementation and it is a plugin available in > Nutch) using Carrot2 clustering components -- > > http://carrot2.sf.net, or the demo at http://carrot.cs.put.poznan.pl very interesting.. But, what are the main differences betwee

Re: Problems on Crawling

2005-09-16 Thread Daniele Menozzi
On 19:33:57 16/Sep , Piotr Kosiorowski wrote: > bin/nutch updatedb db $s1 > command updates WebDB with links you fetched in segment $s1. ok, so the depth value is only used to stop the crawling at a certain point, and proceed with the indexing, right? But, another thing: how can I refresh old p

Re: Clustering

2005-09-16 Thread Dawid Weiss
Hi Daniele. There is a clustering API for on-line clustering in Nutch, so you can start rolling out your ideas right away :) I also provided a sample implementation and it is a plugin available in Nutch) using Carrot2 clustering components -- http://carrot2.sf.net, or the demo at http://ca

Re: Problems on Crawling

2005-09-16 Thread Piotr Kosiorowski
bin/nutch updatedb db $s1 command updates WebDB with links you fetched in segment $s1. Regards Piotr Daniele Menozzi wrote: Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I have not really understood what the relationship is between depth, segments, and fetching. Take for ex

Re: Problems on Crawling

2005-09-16 Thread Michael Ji
Take a look at this good Nutch doc http://wiki.apache.org/nutch/DissectingTheNutchCrawler Michael Ji --- Daniele Menozzi <[EMAIL PROTECTED]> wrote: > Hi all, I have questions regarding > org.apache.nutch.tools.CrawlTool: I have > not really understood what the relationship is > between > depth,s

Clustering

2005-09-16 Thread Daniele Menozzi
Hi All, I'm interested in clustering (data clustering,more or less like vivisimo.com does), is there a plugin or an addon for it? I'm also interested in writing it, so, if someone has some advice, or some lines of code, it would be very helpful :) Thank you, Menoz --

Problems on Crawling

2005-09-16 Thread Daniele Menozzi
Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I have not really understood what the relationship is between depth, segments, and fetching. Take for example the tutorial, I understand these 2 steps: bin/nutch admin db -create bin/nutch inject db -dmozfile conte

Re: DistributedSearch$Client.updateSegments() blocking other threads

2005-09-16 Thread Piotr Kosiorowski
Hello Andrzej, You can also try http://issues.apache.org/jira/browse/NUTCH-79 - I think it should also help here - it is a bit complicated as it contains additional functionality, but if you have any problems I am willing to help. I am going to perform some tests of it again and maybe commit it in

Exception java.lang.ArrayIndexOutOfBoundsException

2005-09-16 Thread Albakour, M-Dyaa
I am using Nutch-0.7 to implement a web search engine. This search engine was working very well on Nutch-0.4. I've made a new crawl with Nutch-0.7, and everything seems to be going OK there. I've made all the changes to run the search engine with the new Nutch version, but I got an exception

Re: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-09-16 Thread Jérôme Charron
> > So ... feel free to provide such a plugin. > > If I remember well, Andrzej already has a piece of code to do that, no? > Yes, it comes from another package so I need to wrap it around in the > plugin interfaces, give me a day or two... Thanks Jérôme -- http://motrech.free.fr/ http://www.fru

Re: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-09-16 Thread Andrzej Bialecki
Jérôme Charron wrote: It should behave like the unix-command "strings". Does this make sense? Are you on it too? But we don't plan to develop it. Otherwise, I would offer my help. So ... feel free to provide such a plugin. If I remember well, Andrzej already has a piece of code to do that.

Re: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-09-16 Thread Jérôme Charron
> What about a "default-plugin" as Andrzej proposed. The default plugin mechanism is integrated in the parse-plugins descriptor using the "*" content-type > It should behave like > the unix-command "strings". Does this make sense? Are you on it too? But we don't plan to develop it Otherwise

Re: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-09-16 Thread Michael Nebel
+1 What about a "default-plugin" as Andrzej proposed. It should behave like the unix-command "strings". Does this make sense? Are you on it too? Otherwise, I would offer my help. Michael Jon Shoberg wrote: Jérôme Charron wrote: Hi, Chris, Sébastien and me have worked on a proposal for