is it a bug within nutch 1.2 when searching the index?

2011-06-23 Thread leibnitz
I used this version to crawl, then deployed the war to Tomcat. When I search by site with site:mail.163.com, these results are listed: 163网易免费邮--中文邮箱第一品牌 [163 NetEase Free Mail -- the No. 1 Chinese email brand] 163网易免费邮--中文邮箱第一品牌 中文邮箱第 ... http://mail.163.com/ (cached) (explain) (anchors) (more from mail.163.com) 163网易免费邮--中文邮箱第一品牌 163网易免费邮--中文邮箱第一品牌 中文邮箱第 ...

Re: is it a bug within nutch 1.2 when searching the index?

2011-06-23 Thread leibnitz
Can anyone give me some tips, please? -- View this message in context: http://lucene.472066.n3.nabble.com/is-it-a-bug-within-nutch-1-2-when-searching-the-index-tp3098675p3099211.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Get frequency of word

2011-06-23 Thread caomanhdat
Thanks for your answer! So how can I get the frequency of a word across all documents indexed by Nutch? -- View this message in context: http://lucene.472066.n3.nabble.com/Get-frequency-of-word-tp3095236p3099835.html

Re: Get frequency of word

2011-06-23 Thread Gabriele Kahlout
So you want to do x/o using Solr? I can imagine a parser plugin (see wiki) to do that.

Re: nutch NoClassDefFound

2011-06-23 Thread abhayd
I decided to use Linux, and it works fine there. -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-NoClassDefFound-tp3036674p3100372.html

Re: Empty indexes folder after crawling!

2011-06-23 Thread lewis john mcgibbney
Have you set your crawl directory property value in nutch-site.xml when launching the war file on Tomcat? On Tue, Jun 21, 2011 at 4:01 AM, Mohammad Hassan Pandi pandi...@gmail.com wrote: following http://wiki.apache.org/nutch/NutchHadoopTutorial I crawled lucene.apache.org with command
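
For reference, in Nutch 1.x the search webapp reads the crawl location from the `searcher.dir` property. The fragment below is only a minimal sketch of such an override, placed in the deployed webapp's `WEB-INF/classes/nutch-site.xml`; the path `/path/to/crawl` is a hypothetical placeholder, not a value taken from this thread.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Hypothetical path: point this at the directory holding crawldb, indexes, segments -->
    <value>/path/to/crawl</value>
  </property>
</configuration>
```

Tomcat usually needs the webapp reloaded (or a restart) before a changed value is picked up.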

Re: how to classify the search results by an indexed field with lucene?

2011-06-23 Thread lewis john mcgibbney
The short answer to your question is that I don't know. Many of us are not using Lucene as the indexing mechanism. Since this is specifically a Lucene question, you would be better off asking there. Try the user list: http://lucene.apache.org/java/docs/mailinglists.html#Java User List

Re: Where Can I find Nutch war file??

2011-06-23 Thread lewis john mcgibbney
Hi, assuming that you are using 1.2, the war file should definitely be there. You will be able to get step-by-step directions for this in the tutorial on the Nutch site: http://wiki.apache.org/nutch/NutchTutorial Note that this will be updated soon to reflect changes incorporated into

Re: helpful books or tutorials on nutch

2011-06-23 Thread lewis john mcgibbney
As this is open source, I think the best way to answer your question is to get down and dirty with your own configuration. Many implementation scenarios are unique. To a new Nutch user this may provide no immediate help; however, it clearly displays the adaptability and

Re: helpful books or tutorials on nutch

2011-06-23 Thread Nutch User - 1
I did a few test runs with 1.3 and managed to browse the index with Luke. Instead of trying to open Nutch's crawldb folder with Luke, I opened Solr's data folder (or whatever it was called) and it worked. For some reason Luke couldn't open the crawldb folder made with 1.3. On the other hand it was able

Problem implementing my own HtmlParseFilter

2011-06-23 Thread Matthias Naber
Hey, I'm new to the Nutch project and just started to test some things. So I followed this example http://wiki.apache.org/nutch/WritingPluginExample and implemented my own HtmlParseFilter. My custom MyHtmlParseFilter works fine on most of the pages

Re: Depth-first crawling

2011-06-23 Thread Nutch User - 1
Big thanks for answering in the first place! (This mailing list seems to be too passive, in my opinion. Are there any other channels for Nutch-related conversation? I'm relatively new to Nutch and often need help with it.) But could you elaborate on your idea? Unfortunately I don't have at the moment

Re: Solrdedup NPE

2011-06-23 Thread lewis john mcgibbney
Hi Markus, can you list the steps you executed prior to the solrdedup, please? I think I encountered something similar a while back, and as my work was moving on I didn't get a chance to investigate it fully. On Tue, Jun 21, 2011 at 1:54 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi,

Re: Building Nutch 2.0 from the trunk

2011-06-23 Thread Markus Jelsma
You can safely build Nutch trunk with Gora 1089728. I can also build the current Nutch and Gora trunks. What error do you get? Hi, I think this is your second thread on this topic? I tried to get trunk to build but was unable as there are problems with Gora as Julien highlighted to me some

Re: Building Nutch 2.0 from the trunk

2011-06-23 Thread lewis john mcgibbney
I tried to build Nutch trunk in Eclipse about 2 months ago. Gora built fine, and from memory it was the Ivy configuration within Nutch which had to be altered. I'm positive the problems I was having have now been rectified, but I haven't tried since. That is why I am interested in why JUnit

Re: I need step-by-step tutorial to run Nutch 1.2 from source code

2011-06-23 Thread waycool
Here is a link on setting up Nutch from both source and binary distributions: http://thetechietutorials.blogspot.com/2011/06/setup-apache-nutch-13-to-crawl-web.html -- View this message in context:

Re: helpful books or tutorials on nutch

2011-06-23 Thread waycool
Another option is to use Solr for search instead of lukeall. Here is a link to the steps to set up Nutch and Solr together: http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html -- View this message in context:

Fwd: failure notice

2011-06-23 Thread Way Cool
Hi, can you add me to the Nutch user group? Thanks. -- Forwarded message -- From: mailer-dae...@apache.org Date: Thu, Jun 23, 2011 at 12:45 PM Subject: failure notice To: way1.wayc...@gmail.com Hi. This is the qmail-send program at apache.org. I'm afraid I wasn't able to deliver

Re: failure notice

2011-06-23 Thread waycool
Sure. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Fwd-failure-notice-tp3102253p3102875.html