Hi,
I am new to Nutch and I got a null pointer exception when I try to submit the
search through the demo app.
Please see the error message below. I have modified the demo app to run in
its own webapp context rather than in the ROOT context.
The first page is shown, and I put in the keyword to search and got the e
Hi Roberto,
Did you try http://wiki.apache.org/nutch/IntranetRecrawl (thanks to
Matthew Holt)?
HTH,
Renaud
Info wrote:
Hi List
I tried to use this script with Hadoop, but it doesn't work.
I tried replacing ls with bin/hadoop dfs -ls,
but the script still doesn't work because it uses ls -d, not plain ls.
Someon
Hi List
I tried to use this script with Hadoop, but it doesn't work.
I tried replacing ls with bin/hadoop dfs -ls,
but the script still doesn't work because it uses ls -d, not plain ls.
Can someone help me?
Best Regards
Roberto Navoni
-Original Message-
From: Matthew Holt [mailto:[EMAIL PROTECTED]
Sent
Does anyone have an idea why a record would be in the database but not show up
in the results?
I have 400+ pages from a certain domain in my database (checked using bin/nutch
admin), yet when I search for the domain, titles of certain pages from the
domain, or unique URLs from the domain no res
Ok. However, a few minutes ago I ran the script exactly as you said and I still
get this error:
Exception in thread "main" java.io.IOException: Cannot delete _0.f0
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:195)
at org.apache.lucene.store.FSDirectory.init(FSDirectory
Hello Nutchians
I am sure many of you have experienced the same problem I am having right now.
I have a domain name http://www.myopensourcejobs.com
I have my app hosted on a server (virtual dedicated server) 68.x.x.x at Go
Daddy.
I want to configure and associate IP address and domain n
Lourival Júnior wrote:
I think it won't work for me because I'm using Nutch version 0.7.2.
Actually I use this script (some comments are in Portuguese):
#!/bin/bash
# A simple script to run a Nutch re-crawl
# Fonte do script:
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutc
I think it won't work for me because I'm using Nutch version 0.7.2.
Actually I use this script (some comments are in Portuguese):
#!/bin/bash
# A simple script to run a Nutch re-crawl
# Fonte do script:
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
#{
if [ -n "$
Lourival Júnior wrote:
Hi Renaud!
I'm a newbie with shell scripts, and I know stopping the Tomcat service is not
the best way to do this. The problem is that when I run the re-crawl script
with Tomcat started, I get this error:
060721 132224 merging segment indexes to: crawl-legislacao2\index
Exception in
Hi Renaud!
I'm a newbie with shell scripts, and I know stopping the Tomcat service is not
the best way to do this. The problem is that when I run the re-crawl script
with Tomcat started, I get this error:
060721 132224 merging segment indexes to: crawl-legislacao2\index
Exception in thread "main" java.io.IOE
I'm running a few commands every week to keep my nutch clean, but I'm a bit
confused if I'm doing it right.
I merge the segments using the following command:
bin/nutch mergesegs -dir crawl/segments/ -i -ds
This should index the new segment and delete the old ones, which it does.
After this wh
Renaud Richardet wrote:
Hi Matt and Lourival,
Matt, thank you for the recrawl script. Any plans to commit it to trunk?
Lourival, here's the part of the script that "reloads Tomcat". It's not the
cleanest approach, but it should work:
# Tell Tomcat to reload index
touch $nutch_dir/WEB-INF/web.xml
HTH,
Renaud
Louri
Hi Matt and Lourival,
Matt, thank you for the recrawl script. Any plans to commit it to trunk?
Lourival, here's the part of the script that "reloads Tomcat". It's not the
cleanest approach, but it should work:
# Tell Tomcat to reload index
touch $nutch_dir/WEB-INF/web.xml
HTH,
Renaud
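
The touch line above can be wrapped in a small guard so that a wrong path is reported instead of silently creating a stray file. This is only a sketch: the function name reload_webapp is made up for illustration, and nutch_dir is assumed to be the directory of the deployed Nutch webapp. Whether Tomcat actually reloads depends on its configuration watching WEB-INF/web.xml.

```shell
#!/bin/bash
# Hypothetical wrapper around the touch line from the recrawl script.
# $1 (nutch_dir) is assumed to be the deployed Nutch webapp directory.
reload_webapp() {
  local nutch_dir="$1"
  local web_xml="$nutch_dir/WEB-INF/web.xml"

  # Refuse to continue if the path is wrong; a bare touch would
  # otherwise create an empty web.xml in the wrong place.
  if [ ! -f "$web_xml" ]; then
    echo "web.xml not found at $web_xml" >&2
    return 1
  fi

  # Updating the modification time is what prompts Tomcat to reload
  # the webapp, provided it is configured to watch this file.
  touch "$web_xml"
}
```

It would be called at the end of the recrawl, for example as reload_webapp "$nutch_dir".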
Lourival Júnior wrote:
Hi Mat
Hi Matt!
In the article found at
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html you
said the re-crawl script has a problem with updating the live search
index. In my tests with Nutch version 0.7.2, when I run the script the index
could not be updated because the tomcat lo
Thanks for putting up with all the messages to the list... Here is the
recrawl script for 0.8.0 if anyone is interested.
Matt
---
#!/bin/bash
# Nutch recrawl script.
# Based on 0.7.2 script at
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch
Are you crawling jsp's?
Put this in your regex-normalize.xml
<regex>
  <pattern>(.*)(;jsessionid=[a-zA-Z0-9]{32})(.*)</pattern>
  <substitution>$1$3</substitution>
</regex>
***
And change this setting in your nutch-default.xml
<property>
  <name>urlnormalizer.class</name>
  <value>org.apache.nutch.net.RegexUrlNormalizer</value>
  <description>Name of the class used to normalize URLs.</description>
</property>
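
To see what the jsessionid pattern above does before putting it into regex-normalize.xml, the same expression can be tried with sed. This is only a sketch: the function name strip_jsessionid and the sample URL are made up for illustration, and sed -E stands in for the Java regex engine that Nutch uses, so edge cases may behave differently.

```shell
#!/bin/bash
# Hypothetical helper: applies the pattern from the message to one URL.
# \1 and \3 are sed's spelling of the $1 and $3 in the substitution.
strip_jsessionid() {
  printf '%s\n' "$1" | sed -E 's/(.*)(;jsessionid=[a-zA-Z0-9]{32})(.*)/\1\3/'
}

# Made-up URL carrying a 32-character session id.
strip_jsessionid 'http://example.com/page.jsp;jsessionid=0123456789ABCDEF0123456789ABCDEF?id=7'
# prints http://example.com/page.jsp?id=7
```

URLs without a jsessionid part do not match the pattern and pass through unchanged.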
-Original
hi guys,
I tried crawling my site which works with a Domino web server talking to a
Tomcat - using the crawl command (with all the config for URLs, file types,
etc.) - but the crawl log doesn't show any URLs being fetched.
Is there something different I need to do to run a crawl for a site run
Thanks, Håvard (and Doug, in the original email).
Those pointers, plus a few other tips from elsewhere, did the trick. I'm now
up and running with all CPUs.
One thing I found along the way was that if I did not set
mapred.child.heap.size, I would run out of heap space in initialization of
inject