Thanks, Håvard (and Doug, in the original email).
Those pointers, plus a few other tips from elsewhere, did the trick. I'm now
up and running with all CPUs.
One thing I found along the way was that if I did not set
mapred.child.heap.size, I would run out of heap space in initialization of
Hi guys,
I tried crawling my site, which runs on a Domino web server talking to
Tomcat, using the crawl command (with all the config for URLs, file types,
etc.), but the crawl log doesn't show any URLs being fetched.
Is there something different I need to do to run a crawl for a site
Are you crawling JSPs?
Put this in your regex-normalize.xml:
<regex>
  <pattern>(.*)(;jsessionid=[a-zA-Z0-9]{32})(.*)</pattern>
  <substitution>$1$3</substitution>
</regex>
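To see what the rule above does to a URL, here is a minimal sketch that applies the same pattern and substitution with sed instead of Nutch's normalizer. The function name strip_jsessionid and the example.com URL are made up for illustration, and sed's regex engine is only assumed to behave like Nutch's for this simple pattern.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nutch): removes a ";jsessionid=<32
# alphanumeric characters>" segment from a URL, keeping the text before
# and after it, like the $1$3 substitution in the rule above.
strip_jsessionid() {
  printf '%s\n' "$1" |
    sed -E 's/(.*)(;jsessionid=[a-zA-Z0-9]{32})(.*)/\1\3/'
}

# Example URL is an assumption, used only to illustrate the rewrite.
strip_jsessionid 'http://example.com/page.jsp;jsessionid=0123456789ABCDEF0123456789ABCDEF?id=7'
# prints http://example.com/page.jsp?id=7
```

URLs without a session ID pass through unchanged, so the function can be tried on a whole seed list before changing the Nutch configuration.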
***
And change this setting in your nutch-default.xml:
<property>
  <name>urlnormalizer.class</name>
Thanks for putting up with all the messages to the list... Here is the
recrawl script for 0.8.0 if anyone is interested.
Matt
---
#!/bin/bash
# Nutch recrawl script.
# Based on 0.7.2 script at
Hi Matt!
In the article found at
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html you
said the re-crawl script has a problem with updating the live search
index. In my tests with Nutch version 0.7.2, when I run the script the index
could not be updated because the Tomcat
Hi Matt and Lourival,
Matt, thank you for the recrawl script. Any plans to commit it to trunk?
Lourival, here is the part of the script that reloads Tomcat. It's not the
cleanest approach, but it should work:
# Tell Tomcat to reload index
touch $nutch_dir/WEB-INF/web.xml
HTH,
Renaud
Lourival Júnior wrote:
Hi
I'm running a few commands every week to keep my Nutch clean, but I'm not
sure whether I'm doing it right.
I merge the segments using the following command:
bin/nutch mergesegs -dir crawl/segments/ -i -ds
This should index the new segment and delete the old ones, which it does.
After this
Hi Renaud!
I'm a newbie with shell scripts, and I know that stopping the Tomcat service
is not the best way to do this. The problem is that when I run the re-crawl
script with Tomcat started, I get this error:
060721 132224 merging segment indexes to: crawl-legislacao2\index
Exception in thread "main"
Lourival Júnior wrote:
I think it won't work for me because I'm using Nutch version 0.7.2.
Actually, I use this script (some comments are in Portuguese):
#!/bin/bash
# A simple script to run a Nutch re-crawl
# Fonte do script:
Hello Nutchians
I am sure many of you have experienced the same problem I am facing right now.
I have a domain name http://www.myopensourcejobs.com
I have my app hosted on a server (virtual dedicated server) 68.x.x.x at
GoDaddy.
I want to configure and associate the IP address and domain
OK. However, a few minutes ago I ran the script exactly as you said and I still
get this error:
Exception in thread "main" java.io.IOException: Cannot delete _0.f0
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:195)
at
Does anyone have an idea why a record would be in the database but not show up
in the results?
I have 400+ pages from a certain domain in my database (checked using bin/nutch
admin ) yet when I search for the domain, titles to certain pages from the
domain, or unique URLs from the domain no
Hi Roberto,
Did you try http://wiki.apache.org/nutch/IntranetRecrawl (thanks to
Matthew Holt)?
HTH,
Renaud
Info wrote:
Hi List
I tried to use this script with Hadoop, but it doesn't work.
I tried replacing ls with bin/hadoop dfs -ls,
but the script still doesn't work because it uses ls -d and not just ls.
Hi,
I am new to Nutch and I got a null pointer exception when I try to submit a
search through the demo app.
Please see the error message below. I have modified the demo app to run in
its own webapp context rather than in the ROOT context.
The first page was shown, and I entered a keyword to search and got the