cluster crawling?
I like it, but how do I implement it?
Thank you.
2005/9/17, Daniele Menozzi <[EMAIL PROTECTED]>:
>
> On 19:37:42 16/Sep , Dawid Weiss wrote:
> > I also provided a sample implementation (it is a plugin available in
> > Nutch) using Carrot2 clustering components --
> >
> > htt
Just wondering if anyone has tried Solaris containers
with Nutch. Seems like it would be nice to have a
container or containers for each part of the process.
Containers allow for CPU/memory/disk I/O/network I/O
slicing (I am pretty sure on the last two). So it
would be a way to limit different p
Here is a patch for improving the error message that is displayed
when an intranet crawl command line is given a file instead of a directory
of files containing URLs.
The old error msg:
java.io.IOException: No input files in: [Ljava.io.File;@c24c0
Obviously, the default toString() says nothing.
The
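To illustrate why the old message is unhelpful, here is a minimal sketch. It is not the patch itself: the class and method names are hypothetical, and it only assumes the message is built from a File[] value. A Java array's default toString() prints a type tag and an identity hash, while Arrays.toString() prints the elements.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper for illustration only; not the actual Nutch patch.
final class InputFilesMessage {

    /** Builds a message that lists each path instead of the array's default toString(). */
    static String describe(File[] inputs) {
        return "No input files in: " + Arrays.toString(inputs);
    }

    public static void main(String[] args) {
        File[] inputs = { new File("urls.txt") };
        // Default array toString(): type tag plus identity hash, like [Ljava.io.File;@c24c0
        System.out.println("No input files in: " + inputs);
        // Arrays.toString() calls toString() on each File, which yields its path.
        System.out.println(describe(inputs));
    }
}
```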
Maybe you are way ahead of me here, but it just hit me
that it would be pretty cool to group URLs to fetch by
host and then perhaps use HTTP 1.1 to reuse the
connection and save the initial handshaking overhead.
Not a huge deal for a couple of hits, but I think it
would make sense for large crawls.
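A rough sketch of the grouping step described above, assuming the fetch list is simply a list of URL strings. The class and method names are hypothetical and this is not how the Nutch fetcher is implemented. It only buckets URLs by host, so that each bucket could then be fetched over one persistent HTTP 1.1 connection.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not part of Nutch.
final class HostGrouping {

    /** Groups URLs by lower-cased host, keeping first-seen host order. */
    static Map<String, List<String>> groupByHost(List<String> urls) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            // URI.create throws IllegalArgumentException on malformed input.
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // no host component, nothing to group on
            }
            byHost.computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayList<>())
                  .add(url);
        }
        return byHost;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://a.example/1", "http://b.example/x", "http://a.example/2");
        // Each entry is one host plus the URLs that could share a connection.
        groupByHost(urls).forEach((host, group) -> System.out.println(host + " -> " + group));
    }
}
```

Whether the connection is actually reused depends on the HTTP client and on the server honouring keep-alive, so the saving would need to be measured on a real crawl.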
Michael Ji wrote:
No particular vulnerability beyond what you have when
running a web server, if I am not wrong;
Tomcat is the same as a web server except that JSP is its core
engine;
I would suggest following any instructions that Tomcat has
for locking it down. For instance, there is a conf setting
(the
The page:
http://wiki.apache.org/nutch/HowToContribute
should note under "Unit Tests" that some tests fail if the conf files are
modified.
Keep your conf files as *.x.mine or some such, and copy *.x.template to the
*.x files before doing "ant test".
Paul
No particular vulnerability beyond what you have when
running a web server, if I am not wrong;
Tomcat is the same as a web server except that JSP is its core
engine;
Michael Ji,
--- lumavanossi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Is there any vulnerability in the use of Nutch that
> could let a server vul
Hi,
Is there any vulnerability in the use of Nutch that could leave a server
vulnerable?
Can running Tomcat, for example on port 8080, leave the server vulnerable?
Is there a way to make the server secure?
Thanks,
Marco
DF error on long filesystem name
Key: NUTCH-93
URL: http://issues.apache.org/jira/browse/NUTCH-93
Project: Nutch
Type: Bug
Versions: 0.7
Environment: CentOS4.1 (like RedhatEnterprise4)
Reporter: Shuji Umino
Priority:
On 19:37:42 16/Sep , Dawid Weiss wrote:
> I also provided a sample implementation (it is a plugin available in
> Nutch) using Carrot2 clustering components --
>
> http://carrot2.sf.net, or the demo at http://carrot.cs.put.poznan.pl
Very interesting. But what are the main differences betwee
On 19:33:57 16/Sep , Piotr Kosiorowski wrote:
> bin/nutch updatedb db $s1
> command updates WebDB with links you fetched in segment $s1.
ok, so the depth value is only used to stop the crawling at a certain
point, and proceed with the indexing, right?
But, another thing: how can I refresh old p
Hi Daniele.
There is a clustering API for on-line clustering in Nutch, so you can
start rolling out your ideas right away :)
I also provided a sample implementation (it is a plugin available in
Nutch) using Carrot2 clustering components --
http://carrot2.sf.net, or the demo at http://ca
bin/nutch updatedb db $s1
command updates WebDB with links you fetched in segment $s1.
Regards
Piotr
Daniele Menozzi wrote:
Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I
have not really understood what the relationship is between
depth, segments, and fetching.
Take for ex
Take a look at this good Nutch doc:
http://wiki.apache.org/nutch/DissectingTheNutchCrawler
Michael Ji
--- Daniele Menozzi <[EMAIL PROTECTED]> wrote:
> Hi all, I have questions regarding
> org.apache.nutch.tools.CrawlTool: I have
> not really understood what the relationship is
> between
> depth,s
Hi all, I'm interested in clustering (data clustering, more or less like
vivisimo.com does). Is there a plugin or an add-on for it?
I'm also interested in writing it, so if someone has some advice, or some
lines of code, it would be very helpful :)
Thank you,
Menoz
--
Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I
have not really understood what the relationship is between
depth, segments, and fetching.
Take for example the tutorial. I understand these 2 steps:
bin/nutch admin db -create
bin/nutch inject db -dmozfile conte
Hello Andrzej,
You can also try http://issues.apache.org/jira/browse/NUTCH-79
- I think it should also help here - it is a bit complicated as it
contains additional functionality, but if you have any problems I am
willing to help. I am going to perform some tests of it again and maybe
commit it in
I am using Nutch-0.7 to implement a web search engine; this search
engine was working very well on Nutch-0.4.
I've made a new crawl with Nutch-0.7, and everything seems to be going OK
there.
I've made all the changes to run the search engine with the new Nutch
version.
But I got an exception
> > So ... feel free to provide such a plugin.
> > If I remember well, Andrzej already has a piece of code to do that, no?
> Yes, it comes from another package so I need to wrap it around in the
> plugin interfaces, give me a day or two...
Thanks
Jérôme
--
http://motrech.free.fr/
http://www.fru
Jérôme Charron wrote:
It should behave like
the Unix command "strings". Does this make sense? Are you on it too?
But we don't plan to develop it.
Otherwise, I would offer my help.
So ... feel free to provide such a plugin.
If I remember well, Andrzej already has a piece of code to do that.
> What about a "default-plugin" as Andrzej proposed?
The default plugin mechanism is integrated in the parse-plugins descriptor
using the "*" content-type.
> It should behave like
> the Unix command "strings". Does this make sense? Are you on it too?
But we don't plan to develop it.
Otherwise
+1
What about a "default-plugin" as Andrzej proposed? It should behave like
the Unix command "strings". Does this make sense? Are you on it too?
Otherwise, I would offer my help.
Michael
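For reference, a minimal sketch of what "strings"-like behaviour could look like, assuming the fallback simply keeps runs of printable ASCII from the raw bytes. The class name, method name and minLength parameter are hypothetical; this is not Andrzej's code and it is not wired into the Nutch plugin interfaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "strings"-like extraction; not a Nutch plugin.
final class PrintableStrings {

    /** Returns every run of printable ASCII bytes that is at least minLength long. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {
                run.append((char) b);      // printable ASCII extends the current run
            } else {
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0);          // any other byte ends the run
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString());     // flush a run that reaches the end of input
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] sample = "\0\1Nutch crawler\2ab\3index".getBytes(StandardCharsets.ISO_8859_1);
        // 4 is the default minimum length used by GNU strings.
        System.out.println(extract(sample, 4));
    }
}
```

A real fallback parser would also have to deal with non-ASCII encodings, which this sketch ignores.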
Jon Shoberg wrote:
Jérôme Charron wrote:
Hi,
Chris, Sébastien and I have worked on a proposal for