Daniel Varela Santoalla dvarela at ecmwf.int writes:
Hello Daniel
As I see your problem: if you have set the environment variables properly, please check whether you
have set NUTCH_HOME correctly, and first read the Nutch shell script (bin/nutch). I
think that will solve your problem. If it is not working you can send me
hello
I read your exception. I think that when you copied the nutch-0.7.2.war file from
the nutch/build directory after running the ant command, you missed something, as is
clear in the exception. Copy the directory containing org.apache.nutch.searcher.NutchBean
from nutch../build. I think search will then work properly.
bye
Hi all. First off, I'm using Nutch 0.7.2.
I've been playing with nutch for a couple weeks now, and have some
questions relating to indexing blog sites.
Many blog platforms post a changes.xml file on some schedule (
blogger.com/changes10.xml is updated every 10 minutes) that lists the blogs
Hi,
Is there a way to make Nutch ignore common words while searching?
For example, when searching for "the boy and the girl" it would only
look for "boy girl".
Thanks,
Marco
Is there a way to make Nutch ignore common words while searching?
For example, when searching for "the boy and the girl" it
would only look for "boy girl".
Yes,
in the Nutch conf dir there is a file called common-terms.utf8.
Copy that file into your Java container as well.
Hope this helps
Bogdan
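To see what a common-terms list buys you, here is a rough standalone sketch (not Nutch's actual query code — the term list below is just a stand-in for the contents of conf/common-terms.utf8):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonTerms {
    // A tiny stand-in for the contents of conf/common-terms.utf8.
    static final Set<String> COMMON =
        new HashSet<>(Arrays.asList("the", "and", "a", "of"));

    // Drop common words from a query, keeping the rest in order.
    static List<String> strip(String query) {
        List<String> kept = new ArrayList<>();
        for (String term : query.toLowerCase().split("\\s+")) {
            if (!COMMON.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

With this, "the boy and the girl" reduces to just "boy girl", which is the behavior Marco asked about.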
Is there a way to make Nutch ignore common words while searching?
For example, when searching for "the boy and the girl" it
would only look for "boy girl".
Small addition from wiki:
http://wiki.apache.org/nutch/FAQ#head-12f4fd64f03fc3cd0a3063b9283ed829963ed488
You can tweak your
Hi,
What if you start indexing videos and audio files, and without knowing it you
index some mp3 or video that is illegal or protected by rights?
So, to index videos and audio files, should there be a human reviewing each
indexed video or audio file?
What do you think?
Marco
I successfully ran the intranet crawl and my nutch/crawl dir was
generated. I then deployed the war file and stopped/started tomcat from
within the crawl directory. However, when I attempt to actually run a
search, a page with the following error is returned. Any ideas?
Matt
*type*
On Thursday, 13 July 2006 at 18:47, Matthew Holt wrote:
I successfully ran the intranet crawl and my nutch/crawl dir was
generated. I then deployed the war file and stopped/started tomcat from
within the crawl directory. However, when I attempt to actually run a
search, a page with the
Timo Scheuer wrote:
On Thursday, 13 July 2006 at 18:47, Matthew Holt wrote:
I successfully ran the intranet crawl and my nutch/crawl dir was
generated. I then deployed the war file and stopped/started tomcat from
within the crawl directory. However, when I attempt to actually run a
search,
Just wondering what the general consensus is on using 0.8.0 in
production. Do you think it's stable enough to use? I would ideally
want to use 0.7.2, but it is missing the parse-oo plugin that 0.8.0 has.
I attempted to port the parse-oo plugin to 0.7.2, but ran into some
complications due to
I remember reading about this in one of the threads a few weeks back. Most people
agreed that 0.8-dev is stable enough for a release.
I don't know what happened after that. I expect something might be out
in a couple of weeks or by mid-August.
Cheers,
Jayant
On 7/13/06, Matthew Holt [EMAIL PROTECTED] wrote:
Just
Hi,
One thing that many of us have noticed is that it takes a very long time for
the reduce phase to go from 95% to 100% in many cases.
I am running a crawl with 250 URLs in the CrawlDB using 50 machines. UpdateDB
and ReadDB take a long time to go from 95% to 100%. Here is the major problem
that I
Jayant Kumar Gandhi wrote:
I remember reading about this in one of the threads a few weeks back. Most people
agreed that 0.8-dev is stable enough for a release.
I don't know what happened after that. I expect something might be out
in a couple of weeks or by mid-August.
...what happened is that most people went on
I'm having trouble figuring out why I keep getting "Added 0 pages" when
running the crawl with Nutch. I've searched the site and can't find an
answer as to what might be going wrong. I'm running this on Windows using
Eclipse because I may have to change the code slightly. I've already made a
few
How can I recrawl a specific web page? For example, I have an HTML page that
is constantly updated. Is there a command for that?
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
Hi Chris,
Hi all. First off, I'm using Nutch 0.7.2.
I've been playing with nutch for a couple weeks now, and have some
questions relating to indexing blog sites.
[snip]
Third... just in general... it seems I've had to fiddle with Nutch's config
enough to make this work in this way, that
I'm having trouble figuring out why I keep getting "Added 0 pages" when
running the crawl with Nutch. I've searched the site and can't find an
answer as to what might be going wrong. I'm running this on Windows using
Eclipse because I may have to change the code slightly. I've already made a
few
Hi,
in my opinion
Julius Schorzman wrote:
http://www.apache.com
is not matched by the regex
+^http://([a-z0-9]*\.)*apache.com/
as it does not end with a trailing slash.
Cheers
Karsten
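Karsten's point is easy to verify with java.util.regex. This is just a standalone sketch; in Nutch's crawl-urlfilter.txt the leading "+" is the accept marker for the filter, not part of the regex itself:

```java
import java.util.regex.Pattern;

public class UrlFilterDemo {
    // The accept rule from crawl-urlfilter.txt, without the leading '+'.
    static final Pattern RULE =
        Pattern.compile("^http://([a-z0-9]*\\.)*apache.com/");

    // Unanchored search, though this rule pins itself to the start with '^'.
    static boolean matches(String url) {
        return RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("http://www.apache.com"));  // false: no trailing slash
        System.out.println(matches("http://www.apache.com/")); // true
    }
}
```

Adding a rule without the trailing slash (or ending the pattern with `/?`) would accept both forms.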
I'm only a moderately experienced java programmer, so I was hoping I
could get a few pointers about where to begin on a particular problem.
I want to increase the score of a search result if the title contains
the search query and the site is from a particular site.
I thought that I could do
I'm only a moderately experienced java programmer, so I was hoping I
could get a few pointers about where to begin on a particular problem.
I want to increase the score of a search result if the title contains
the search query and the site is from a particular site.
Take a look at the
On 7/13/06, Stefan Groschupf [EMAIL PROTECTED] wrote:
I'm only a moderately experienced java programmer, so I was hoping I
could get a few pointers about where to begin on a particular problem.
I want to increase the score of a search result if the title contains
the search query and the
Jacob Brunson wrote:
Sorry, maybe I should have made myself a little more clear. I know I
can increase the boost generally on title matches, but what I want is
to further increase the boost on title matches ONLY IF the url is from
domain XYZ.com
Depending on whether you need this change to
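Whichever Nutch extension point ends up carrying it, the conditional boost Jacob describes is simple to state in plain Java. All names below are hypothetical illustration, not a Nutch API:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class TitleBoost {
    // Hypothetical post-search tweak: multiply a hit's score when its
    // title contains the query string AND the hit comes from one domain.
    static float adjust(float score, String title, String url,
                        String query, String domain) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return score;
            }
            host = host.toLowerCase();
            boolean titleHit =
                title.toLowerCase().contains(query.toLowerCase());
            boolean domainHit =
                host.equals(domain) || host.endsWith("." + domain);
            return (titleHit && domainHit) ? score * 2.0f : score;
        } catch (URISyntaxException e) {
            return score; // unparsable URL: leave the score alone
        }
    }
}
```

So a title match from XYZ.com gets doubled, while the same title match from any other host keeps its original score.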