It looks like you don't have specifics set within your build.xml. The error log
would also suggest this. Can you please post the lines causing the error?
-Original Message-
From: Cam Bazz
Sent: 14/07/2011, 6:19 PM
To: user@nutch.apache.org
Subject: problem compiling plugin
Hello,
I am
Hello Lewis,
I have solved this problem by putting the ivy.jar where the ant-related
jars are on my system: /usr/share/lib/ant on Ubuntu.
I think we might want to add this to the documentation for building plugins.
The current problem is that since Lucene is gone in 1.3, I need a new
Solr-based
Have seen the problem on a feed. Opened issue
https://issues.apache.org/jira/browse/NUTCH-1053
Thanks for reporting it
On 13 July 2011 17:15, Julien Nioche lists.digitalpeb...@gmail.com wrote:
I expected Tika (or whatever rss parser) to directly crawl links from the
original rss structure,
Hi Lewis,
Sorry for the delay in responding - that clears those questions up thanks.
For now we are working on a script to hopefully minimise the impact of the
writes to the Solr index. We are also baking in deletions through the use of
a Solr query and splitting separate domains out into their
Hi!
We are using Nutch to crawl a bunch of websites and index them to Solr. At
the moment we are in the process of upgrading from Nutch 1.1 to Nutch 1.3
and at the same time going from one server to two servers.
Unfortunately we are stuck with a problem which we haven't seen in the old
Hi,
With the Boilerpipe patch enabled I get an exception in DOMBuilder.endElement
when parsing certain pages. Looking at the pages at random, it seems the
problem is limited to sites with frames.
Commenting out the two lines of code in the method fixes the problem; it
looks like everything
Can you dump the tail of your hadoop.log? Nutch 1.2 and Solr 1.4.x should work
with each other.
On Friday 15 July 2011 09:46:52 Yusniel Hidalgo Delgado wrote:
Hello,
I'm trying to integrate Solr 1.4.0 and Nutch 1.2 following the
RunningNutchAndSolr wiki page. When I run the command line
Well, disabling the code isn't a good idea as everything gets messed up. I've
encapsulated the pop in another isEmpty check and it's fixed now. The remaining
question is why this only seems to happen when Boilerpipe parses pages with
frames?
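For readers finding this thread later, the fix described above can be sketched in a self-contained way. The class and method names below are stand-ins for illustration, not the actual DOMBuilder code: the idea is simply to guard the stack pop so an unbalanced close tag (as frameset pages can produce) cannot throw EmptyStackException.

```java
import java.util.Stack;

// Sketch of wrapping the pop in an isEmpty check, as described above.
// GuardedPop and its methods are hypothetical stand-ins, not Nutch code.
public class GuardedPop {
    private final Stack<String> openElements = new Stack<>();

    public void startElement(String name) {
        openElements.push(name);
    }

    public String endElement() {
        // The guard: only pop when there is something to pop.
        if (!openElements.isEmpty()) {
            return openElements.pop();
        }
        return null; // unbalanced close; ignore instead of crashing
    }

    public static void main(String[] args) {
        GuardedPop b = new GuardedPop();
        b.startElement("frameset");
        System.out.println(b.endElement()); // frameset
        System.out.println(b.endElement()); // null, no EmptyStackException
    }
}
```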
Thanks
On Friday 15 July 2011 15:23:18 Markus
Hello,
Finally I got a working build environment, and I am doing some
modifications and playing around.
I also got my first plugin to build, and I am almost done with my custom parser.
I have my custom plugin and the method
public ParseResult filter(Content content, ParseResult parseResult,
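The filter method quoted above follows Nutch's parse-filter chain pattern: each plugin receives the content plus the result accumulated so far and returns the (possibly modified) result. A self-contained sketch of that pattern, with stand-in types since the real Nutch interfaces are not reproduced here:

```java
// Stand-in types for illustration only; not the real Nutch
// Content/ParseResult/HtmlParseFilter interfaces.
class Content {
    private final String raw;
    Content(String raw) { this.raw = raw; }
    String getRaw() { return raw; }
}

class ParseResult {
    private final StringBuilder notes = new StringBuilder();
    void append(String note) { notes.append(note); }
    String getNotes() { return notes.toString(); }
}

interface ParseFilter {
    ParseResult filter(Content content, ParseResult parseResult);
}

// One link in the chain: inspect the content, enrich the result, pass it on.
public class FilterChainSketch implements ParseFilter {
    public ParseResult filter(Content content, ParseResult parseResult) {
        if (content.getRaw().contains("hello")) {
            parseResult.append("greeting");
        }
        return parseResult;
    }

    public static void main(String[] args) {
        ParseResult r = new FilterChainSketch()
            .filter(new Content("hello world"), new ParseResult());
        System.out.println(r.getNotes()); // greeting
    }
}
```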
You've gotten me very close to a breakthrough. I've started over, and I've
found that if I don't make any edits to nutch-site.xml, I get a working Nutch
web app; I have no index and all of my searches fail, but I have Nutch. When I
add my crawl location to nutch-site.xml and restart Tomcat,
Hello again, will it be possible to integrate Nutch 2.0 and Solr?
Best Regards.
Y.H
Yes. Do you intend to use trunk in production?
No, not yet in a production environment. I am thinking of starting a new
project based on Nutch 2.0 and Solr, and I will need this data. Thanks
for your quick reply.
Best regards.
On 15/07/11 20:23, Markus Jelsma wrote:
Yes. Do you intend to use trunk in production?
Hi C.B.,
Quite a few things here
On Fri, Jul 15, 2011 at 5:19 PM, Cam Bazz camb...@gmail.com wrote:
Hello,
Finally I got a working build environment, and I am doing some
modifications and playing around.
Good to hear. Although it is off topic, can you share with us any hurdles you
overcame?
Are you adding this to nutch-site.xml within your webapp or just in your root
Nutch installation? It needs to be included in your webapp version of
nutch-site.xml. In my experience this was a small point of confusion at
first.
On Fri, Jul 15, 2011 at 7:03 PM, Chip Calhoun ccalh...@aip.org wrote:
I'm definitely changing the file in my webapp. I can tell I'm doing that much
right because it makes a noticeable change to the function of my web app;
unfortunately, the change is that it seems to break everything.
I've tried playing with the actual value for this, but with no success. In
As a resource it would be wise to have a look at the list archives for an
exact answer to this. Take a look at your catalina.out logs for more verbose
info on where the error is.
It has been a while since I have configured this now, sorry I can't be of
more help in giving a definite answer.
On
Hi C.B.,
I'm in the process of overhauling PluginCentral on the wiki and have opened
a wiki page for Plugin Gotchas [1]. Would it be possible to ask you to edit it
and define your understanding of the problem more specifically, please? There
is also an interesting page here [2], which you may or may
Hi,
Do we have any suggestions to demystify this? I intend to look into webgraph
in more detail soon, as I wish to get a much more detailed picture of its
functionality for link analysis purposes.
On Wed, Jul 13, 2011 at 9:25 AM, Nutch User - 1 nutch.use...@gmail.comwrote:
Does anyone know how
Success! I'm posting this not because I need further help, but in case someone
with a similar issue finds this in the list archives.
First: I now know that if I make no changes to nutch-site.xml, Nutch will
expect my crawl directory to be C:\Apache\Tomcat-5.5\crawl. So now I know
that much.
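For anyone following along: the webapp reads its index location from its own nutch-site.xml. A minimal sketch of the property involved, assuming it is `searcher.dir` (the property the Nutch search webapp of this era used for the crawl directory), with the path taken from the message above:

```xml
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>C:\Apache\Tomcat-5.5\crawl</value>
  </property>
</configuration>
```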
Hello,
In my quest to create a custom parser, I have modified ParseImpl to
hold another ParseText called features, such as:
public ParseImpl(String text, String features, ParseData data) {
this(new ParseText(text), new ParseText(features), data, true);
}
public ParseImpl(ParseText
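To show how the extra field might be threaded through, here is a self-contained sketch of the delegating-constructor pattern described above. ParseText and ParseData below are minimal stand-ins, not the real Nutch classes, and ParseImplSketch is a hypothetical name:

```java
// Minimal stand-ins for the Nutch classes; not the real implementations.
class ParseText {
    private final String text;
    ParseText(String text) { this.text = text; }
    String getText() { return text; }
}

class ParseData { }

// Sketch of a ParseImpl extended with a second ParseText for "features".
public class ParseImplSketch {
    private final ParseText text;
    private final ParseText features;
    private final ParseData data;

    public ParseImplSketch(String text, String features, ParseData data) {
        // Delegate to the ParseText-based constructor, as in the post above.
        this(new ParseText(text), new ParseText(features), data);
    }

    public ParseImplSketch(ParseText text, ParseText features, ParseData data) {
        this.text = text;
        this.features = features;
        this.data = data;
    }

    public String getFeatures() { return features.getText(); }

    public static void main(String[] args) {
        ParseImplSketch p = new ParseImplSketch("body text", "f1 f2", new ParseData());
        System.out.println(p.getFeatures()); // f1 f2
    }
}
```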
I'm running Nutch 1.3 on 64-bit Ubuntu; the commands and
relevant output follow.
--
llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed
Injector: starting at 2011-07-15 18:32:10
fetch, then parse.
Thanks
Done, but now I get additional errors:
---
llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
updatedb /home/llist/nutchData/crawl/crawldb
-dir /home/llist/nutchData/crawl/segments/20110716105826
CrawlDb update: starting at 2011-07-16 11:03:56
CrawlDb update: db: