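I used this version to crawl, then deployed the WAR to Tomcat. When I search
by site:
site:mail.163.com
the following results are listed:
163 NetEase Free Mail -- the No. 1 Chinese email brand
163 NetEase Free Mail -- the No. 1 Chinese email brand Chinese email No. ...
http://mail.163.com/ (cached) (explain) (anchors) (more from mail.163.com)
163 NetEase Free Mail -- the No. 1 Chinese email brand
163 NetEase Free Mail -- the No. 1 Chinese email brand Chinese email No. ...
Can anyone give me a tip, please?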
--
View this message in context:
http://lucene.472066.n3.nabble.com/is-it-a-bug-within-nutch-1-2-when-searching-the-index-tp3098675p3099211.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Thanks for your answer!
So how can I get the frequency of a word across all the documents indexed by
Nutch?
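As a rough illustration of what "frequency of a word in all documents" can mean, here is a minimal Python sketch that counts both the total occurrences of a word and the number of documents that contain it. It assumes the page text has already been exported from the index as plain strings. It does not call any Nutch or Lucene API, and the function names and sample texts are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(documents):
    """Return (total_counts, doc_counts) for a list of document strings.

    total_counts[w]: how many times w occurs across all documents.
    doc_counts[w]:   how many documents contain w at least once.
    """
    total_counts = Counter()
    doc_counts = Counter()
    for text in documents:
        tokens = tokenize(text)
        total_counts.update(tokens)      # every occurrence counts
        doc_counts.update(set(tokens))   # each document counts once per word
    return total_counts, doc_counts


# Hypothetical sample texts standing in for indexed page content.
docs = [
    "Nutch crawls the web and Nutch indexes pages",
    "Solr can search pages indexed by Nutch",
    "Luke opens a Lucene index",
]
total, per_doc = word_frequencies(docs)
print(total["nutch"], per_doc["nutch"])  # prints: 3 2
```

The two counters answer slightly different questions: the first is the overall term frequency, the second is the document frequency. Which one is wanted depends on what "frequency" is supposed to mean in the question above.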
--
View this message in context:
http://lucene.472066.n3.nabble.com/Get-frequency-of-word-tp3095236p3099835.html
So you want to do x/o using Solr? I can imagine a parser plugin (see wiki) to
do that.
I decided to use Linux, and it works fine on it.
--
View this message in context:
http://lucene.472066.n3.nabble.com/nutch-NoClassDefFound-tp3036674p3100372.html
Have you set the crawl directory property in nutch-site.xml when
launching the WAR file on Tomcat?
On Tue, Jun 21, 2011 at 4:01 AM, Mohammad Hassan Pandi
pandi...@gmail.com wrote:
Following http://wiki.apache.org/nutch/NutchHadoopTutorial I crawled
lucene.apache.org with command
To give a short answer to your question: I don't know. Many of
us are not using Lucene as the indexing mechanism. Since this is
specifically linked to Lucene, you would be better off asking there.
Try the user list:
http://lucene.apache.org/java/docs/mailinglists.html#Java User List
Hi,
Assuming that you are using 1.2, the WAR file should definitely be there. You
will be able to find step-by-step directions for this in the tutorial on the
Nutch site.
http://wiki.apache.org/nutch/NutchTutorial
Note that this will be getting updated soon to reflect changes incorporated
into
As this is open source, I think the best way to solve your question/request
is to get down and dirty with your own configuration. Many implementation
scenarios are unique. To a new Nutch user this may not be immediately
helpful; however, it clearly displays the adaptability and
I did a few test runs with 1.3 and managed to browse the index with Luke.
Instead of trying to open Nutch's crawldb folder with Luke, I opened Solr's
data folder (or whatever it was called) and it worked. For some reason Luke
couldn't open the crawldb folder made with 1.3. On the other hand it was able
Hey,
I'm new to the Nutch project and have just started to test some things. So
I followed this example
http://wiki.apache.org/nutch/WritingPluginExample and implemented my
own HtmlParseFilter.
My custom MyHtmlParseFilter works fine on most of the pages
Big thanks for answering in the first place! (This mailing list seems to be
too passive in my opinion. Are there any other channels for Nutch-related
conversation? I'm relatively new to Nutch and often need help with it.) But
could you elaborate on your idea? Unfortunately I don't have at the moment
Hi Markus,
Can you list the steps you executed prior to the solrdedup please?
I think I encountered something similar a while back and as my work was
moving on I didn't get a chance to investigate it fully.
On Tue, Jun 21, 2011 at 1:54 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
Hi,
You can safely build Nutch trunk with Gora 1089728. I can also build the
current Nutch and Gora trunks. What error do you get?
Hi,
I think this is your second thread on this topic? I tried to get trunk to
build but was unable as there are problems with Gora as Julien highlighted
to me some
I tried to build Nutch trunk in Eclipse about 2 months ago. Gora built
fine and, from memory, it was the Ivy configuration within Nutch which had to
be altered. I'm positive the problems I was having have now been
rectified but I haven't tried since. That is why I am interested in why
JUnit
Here is a link on how to set up Nutch from both source and binary distributions:
http://thetechietutorials.blogspot.com/2011/06/setup-apache-nutch-13-to-crawl-web.html
Another option is to use Solr for search instead of lukeall.
Here is a link to the steps to set up Nutch and Solr together:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html
Hi, can you add me to the Nutch user group? Thanks.
-- Forwarded message --
From: mailer-dae...@apache.org
Date: Thu, Jun 23, 2011 at 12:45 PM
Subject: failure notice
To: way1.wayc...@gmail.com
Hi. This is the qmail-send program at apache.org.
I'm afraid I wasn't able to deliver
Sure. Thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Fwd-failure-notice-tp3102253p3102875.html