RE: problem compiling plugin

2011-07-15 Thread lewis.mcgibb...@gmail.com
It looks like you don't have specifics set within your build.xml. The error log would also suggest this. Can you please post the lines causing the error? -Original Message- From: Cam Bazz Sent: 14/07/2011, 6:19 PM To: user@nutch.apache.org Subject: problem compiling plugin Hello, I am

Re: problem compiling plugin

2011-07-15 Thread Cam Bazz
Hello Lewis, I have solved this problem by putting ivy.jar where the Ant-related jars are on my system: /usr/share/lib/ant on Ubuntu. I think we might want to add this to the documentation for building plugins. The current problem is that since Lucene is gone in 1.3, I need a new solr based

Re: RSS feed parsing on Nutch 1.3

2011-07-15 Thread Julien Nioche
Have seen the problem on a feed. Opened issue https://issues.apache.org/jira/browse/NUTCH-1053 Thanks for reporting it On 13 July 2011 17:15, Julien Nioche lists.digitalpeb...@gmail.com wrote: I expected Tika (or whatever rss parser) to directly crawl links from the original rss structure,

Re: Recrawling with Solr backend

2011-07-15 Thread Chris Alexander
Hi Lewis, Sorry for the delay in responding - that clears those questions up, thanks. For now we are working on a script to hopefully minimise the impact of the writes to the Solr index. We are also baking in deletions through the use of a Solr query and splitting separate domains out into their

Fetched pages has no content

2011-07-15 Thread Anders Rask
Hi! We are using Nutch to crawl a bunch of websites and index them to Solr. At the moment we are in the process of upgrading from Nutch 1.1 to Nutch 1.3 and at the same time going from one server to two servers. Unfortunately we are stuck with a problem which we haven't seen in the old

DOMBuiler.endElement fails

2011-07-15 Thread Markus Jelsma
Hi, With the Boilerpipe patch enabled I get an exception in DOMBuilder.endElement when parsing certain pages. Looking at the pages at random, it seems the problem is limited to sites with frames. Commenting out the two lines of code in the method `fixes` the problem it looks like everything

Re: Integrating Solr 1.4.0 and Nutch 1.2

2011-07-15 Thread Markus Jelsma
Can you dump the tail of your hadoop.log? Nutch 1.2 and Solr 1.4.x should work with each other. On Friday 15 July 2011 09:46:52 Yusniel Hidalgo Delgado wrote: Hello, I'm trying to integrate Solr 1.4.0 and Nutch 1.2 following the RunningNutchAndSolr wiki page. When I run the command line

Re: DOMBuiler.endElement fails

2011-07-15 Thread Markus Jelsma
Well, disabling the code isn't a good idea as everything gets messed up. I've wrapped the pop in another isEmpty check and it's fixed now. The remaining question is why this only seems to happen when Boilerpipe parses pages with frames. Thanks On Friday 15 July 2011 15:23:18 Markus
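
The kind of guard described above can be sketched as follows. This is a minimal illustration, not the actual Nutch patch: it assumes the builder tracks open elements on a java.util.Stack, and the class and method names (GuardedPop, closeElement) are hypothetical. The point is only that pop() is skipped when the stack is already empty, so an unbalanced end tag no longer throws EmptyStackException.

```java
import java.util.Stack;

public class GuardedPop {
    // Hypothetical helper, not Nutch code: pop the element being closed only
    // if one is open, then return the new current element (null if none).
    static <T> T closeElement(Stack<T> open) {
        if (!open.isEmpty()) {
            open.pop();
        }
        return open.isEmpty() ? null : open.peek();
    }

    public static void main(String[] args) {
        Stack<String> open = new Stack<>();
        open.push("html");
        open.push("frameset");
        System.out.println(closeElement(open)); // frameset closed, html is current
        System.out.println(closeElement(open)); // html closed, nothing left open
        System.out.println(closeElement(open)); // unbalanced end tag, no exception
    }
}
```

Without the first isEmpty check, the third call would throw, which matches the symptom of an exception on certain pages only.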

what does the parse command does

2011-07-15 Thread Cam Bazz
Hello, Finally I got a working build environment, and I am doing some modifications and playing around. I also got my first plugin to build, and am almost done with my custom parser. I have my custom plugin and the method public ParseResult filter(Content content, ParseResult parseResult,

RE: Deploying the web application in Nutch 1.2

2011-07-15 Thread Chip Calhoun
You've gotten me very close to a breakthrough. I've started over, and I've found that if I don't make any edits to nutch-site.xml, I get a working Nutch web app; I have no index and all of my searches fail, but I have Nutch. When I add my crawl location to nutch-site.xml and restart Tomcat,

Nutch 2.0 and Solr

2011-07-15 Thread Yusniel Hidalgo Delgado
Hello again, will it be possible to integrate Nutch 2.0 and Solr? Best Regards. Y.H

Re: Nutch 2.0 and Solr

2011-07-15 Thread Markus Jelsma
Yes. Do you intend to use trunk in production? Hello again, will it be possible to integrate Nutch 2.0 and Solr? Best Regards. Y.H

Re: Nutch 2.0 and Solr

2011-07-15 Thread Yusniel Hidalgo Delgado
No, not in a production environment yet. I am thinking of starting a new project based on Nutch 2.0 and Solr, and I will need this information. Thanks for your quick reply. Best regards. On 15/07/11 20:23, Markus Jelsma wrote: Yes. Do you intend to use trunk in production? Hello again, it will

Re: what does the parse command does

2011-07-15 Thread lewis john mcgibbney
Hi C.B., Quite a few things here. On Fri, Jul 15, 2011 at 5:19 PM, Cam Bazz camb...@gmail.com wrote: Hello, Finally I got a working build environment, and I am doing some modifications and playing around. Good to hear; although it is off topic, can you share any hurdles you overcame with us

Re: Deploying the web application in Nutch 1.2

2011-07-15 Thread lewis john mcgibbney
Are you adding this to nutch-site.xml within your webapp or just in your root Nutch installation? This needs to be included in your webapp version of nutch-site.xml. In my experience this was a small case of confusion at first. On Fri, Jul 15, 2011 at 7:03 PM, Chip Calhoun ccalh...@aip.org wrote:

RE: Deploying the web application in Nutch 1.2

2011-07-15 Thread Chip Calhoun
I'm definitely changing the file in my webapp. I can tell I'm doing that much right because it makes a noticeable change to the function of my web app; unfortunately, the change is that it seems to break everything. I've tried playing with the actual value for this, but with no success. In

Re: Deploying the web application in Nutch 1.2

2011-07-15 Thread lewis john mcgibbney
As a resource it would be wise to have a look at the list archives for an exact answer to this. Take a look at your catalina.out logs for more verbose info on where the error is. It has been a while since I have configured this now, sorry I can't be of more help in giving a definite answer. On

Re: problem compiling plugin

2011-07-15 Thread lewis john mcgibbney
Hi C.B., I'm in the process of overhauling PluginCentral on the wiki and have opened a wiki page for Plugin Gotchas [1]. Would it be possible to ask you to edit it and describe your understanding of the problem more specifically, please? There is also an interesting page here [2], which you may or may

Re: LinkRank scores

2011-07-15 Thread lewis john mcgibbney
Hi, Do we have any suggestions to demystify this? I intend to look into webgraph in more detail soon as I wish to get a much more detailed picture of its functionality for link analysis purposes. On Wed, Jul 13, 2011 at 9:25 AM, Nutch User - 1 nutch.use...@gmail.com wrote: Does anyone know how

RE: Deploying the web application in Nutch 1.2

2011-07-15 Thread Chip Calhoun
Success! I'm posting this not because I need further help, but in case someone with a similar issue finds this in the list archives. First: I now know that if I make no changes to nutch-site.xml, Nutch will expect my crawl directory to be C:\Apache\Tomcat-5.5\crawl . So now I know that much.

modifying parse implementation

2011-07-15 Thread Cam Bazz
Hello, In my quest to create a custom parser, I have modified ParseImpl to hold another ParseText called features, such as: public ParseImpl(String text, String features, ParseData data) { this(new ParseText(text), new ParseText(features), data, true); } public ParseImpl(ParseText

skipping invalid segments nutch 1.3

2011-07-15 Thread Leo Subscriptions
I'm running Nutch 1.3 on 64-bit Ubuntu; the commands and relevant output follow. -- llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed Injector: starting at 2011-07-15 18:32:10

Re: skipping invalid segments nutch 1.3

2011-07-15 Thread Markus Jelsma
fetch, then parse. I'm running nutch 1.3 on 64 bit Ubuntu, following are the commands and relevant output. -- llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed Injector:

Thanks

2011-07-15 Thread Joye
Thanks

Re: skipping invalid segments nutch 1.3

2011-07-15 Thread Leo Subscriptions
Done, but now get additional errors: --- llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb -dir /home/llist/nutchData/crawl/segments/20110716105826 CrawlDb update: starting at 2011-07-16 11:03:56 CrawlDb update: db: