Dear List!
I have a problem with the search engine.
I have ~900 000 pages in my test db.
When I search for "notebook", the first 4 hits are these
pages:
http://www.notebook.hu/catalogue/notebook/
http://www.notebook.hu/nb_search/
http://hp-renew.outlet.laptop.notebook.hu/
http://gigalan.notebook.hu
> Please confirm this short summary and tutorial for plugins.
>
> (1)
> PARSING plugins: allow parsing of different kinds of MIME types -> html,
> text, pdf, msword, mp3, rtf
> ** parse-ext ** is a wrapper ... what can it do?
Here is a description:
http://nutch.neasys.com/patch/20040703/note.txt
Thanks a lot.
I also started to run Nutch in debug mode. It's an interesting experience, but
any technical documentation would definitely save me some time.
Will wait to see what others have to add here.
Thanks,
Daniel
Angel,
Much of what you're seeing is part of the replication problem.
1) The "Replicated " message is when a successful replication
happens. It's not surprising that you see a lot of them.
2) The "Block XX is valid, and cannot be written to" happens when one
node tries to replicate
Hello,
From a list of start URLs (each associated with a regular expression),
I'd like to get, for each start URL, all URLs that come from the same
domain and that match the expression. I don't want to analyse or index
the URLs, just write them to a flat file.
Example:
start URL : htt
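The filtering step described above could be sketched roughly as follows. This is a standalone Python illustration, not Nutch code: the function names `same_domain_matches` and `write_flat_file` are hypothetical, "same domain" is assumed to mean an identical host name, and the candidate URLs are assumed to have been collected already by whatever crawler is in use.

```python
import re
from urllib.parse import urlparse


def same_domain_matches(start_url, pattern, candidate_urls):
    """Return the candidates that share the start URL's host and match the regex.

    Assumption: "same domain" means the host names are identical.
    """
    start_host = urlparse(start_url).hostname
    regex = re.compile(pattern)
    return [
        url
        for url in candidate_urls
        if urlparse(url).hostname == start_host and regex.search(url)
    ]


def write_flat_file(path, urls):
    """Write one URL per line, with no analysis or indexing."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
```

Calling `same_domain_matches` once per start URL and passing the combined result to `write_flat_file` would give the flat file asked for; following links to discover the candidate URLs is left out of this sketch.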
Hi,
Great. Thanks for the tips.
I've tried the following startup sequences:
* Start NameNode. Wait until CPU goes to 0. Wait 2 extra minutes.
Start all DataNodes.
* Start NameNode. Wait until CPU goes to 0. Wait 2 extra minutes.
Start each DataNode with a 10-minute pause between them.
* Star