Problem when using updatedb

2010-03-30 Thread hareesh
Hi, I am having a problem while updating the crawldb from the crawled segments. When I run the command it takes too long, and after 1200 seconds the crawldb update fails, so I am unable to do incremental crawling. Is this a bug? Please respond. Thanks in advance.
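
A failure at a round-number wall clock like 1200 seconds usually points at the Hadoop task timeout rather than a Nutch bug: a map or reduce task that reports no progress within mapred.task.timeout milliseconds gets killed. As a sketch, assuming the failure here really is that timeout and that overrides live in conf/nutch-site.xml (use mapred-site.xml on a cluster), raising it looks like:

    <property>
      <name>mapred.task.timeout</name>
      <!-- milliseconds; 1800000 = 30 minutes (Hadoop's default is 600000) -->
      <value>1800000</value>
    </property>

For reference, the update step itself is invoked as bin/nutch updatedb <crawldb> <segment>, e.g. bin/nutch updatedb crawl/crawldb crawl/segments/20100330123456 (that segment name is made up for illustration).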

Registration is now open for Apache Lucene EuroCon - Prague, Czech Republic, 18-21 May, 2010.

2010-03-30 Thread Grant Ingersoll
(Sorry for the mis-post yesterday; this is what should have been sent.) Registration is now open for Apache Lucene EuroCon - Prague, Czech Republic, 18-21 May, 2010. To sign up, please visit: http://lucene-eurocon.org/register.html. Sponsored by Lucid Imagination; all net proceeds benefit the A

Crawl yahoo search result page

2010-03-30 Thread Kim Theng Chong
Hi all, Can Nutch crawl a Yahoo search result page? E.g.: http://search.yahoo.com/search?rd=&fp_ip=my&p=ontology&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-892 (put as seed URL). I was not able to fetch the results on this page. Can someone guide me on this? Thank you. Best regards, Kim

RE: Crawl yahoo search result page

2010-03-30 Thread Devang Shah
No - you won't be able to crawl this page. Nutch follows the robots directives of the domain - see http://search.yahoo.com/robots.txt. -Devang. -Original Message- From: Kim Theng Chong [mailto:kimthe...@yahoo.com] Sent: Tuesday, March 30, 2010 10:00 PM To: nutch-user@lucene.apache.org S
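
The directives are public, so this is easy to verify yourself; a quick check from the shell (assuming curl is installed) is:

    curl -s http://search.yahoo.com/robots.txt

Any path disallowed there for your user agent will be skipped by Nutch's default robots handling.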

Re: Crawl yahoo search result page

2010-03-30 Thread Kim Theng Chong
Hi Devang, Thank you so much for your reply. =) Have a nice day. Best regards, Kim From: Devang Shah To: nutch-user@lucene.apache.org Sent: Wed, March 31, 2010 11:28:50 AM Subject: RE: Crawl yahoo search result page No - you won't be able to crawl this page

Re: Crawl yahoo search result page

2010-03-30 Thread prashant ullegaddi
You can use the BOSS API (http://developer.yahoo.com/search/boss/) if you just want the search results. On Wed, Mar 31, 2010 at 9:22 AM, Kim Theng Chong wrote: > Hi Devang, > > Thank you so much for your reply. =) > > Have a nice day. > > Best regards, > Kim > > > > > _
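
As a rough sketch of what a BOSS web-search call looked like at the time (confirm the endpoint and parameters against the docs linked above; YOUR_APP_ID is a placeholder for the key you receive on registration):

    curl 'http://boss.yahooapis.com/ysearch/web/v1/ontology?appid=YOUR_APP_ID&format=xml&count=10'

This returns structured results you can parse directly, instead of scraping the HTML result page.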

Problem with writing index

2010-03-30 Thread hareesh
I was trying a crawl with 200 seeds. In previous cases it created the index without any problem; now when I start the crawl it shows the following exception at depth 2: attempt_201003301923_0007_m_00_0: Aborting with 100 hung threads. Task attempt_201003301923_0007_m_04_0 failed
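
"Aborting with 100 hung threads" comes from the Nutch fetcher when fetch threads stop making progress before the task deadline. A hedged starting point (the property names are from nutch-default.xml; the values below are only illustrative) is to lower the thread count and tighten the per-connection timeout in conf/nutch-site.xml:

    <property>
      <name>fetcher.threads.fetch</name>
      <!-- fewer threads than the 100 shown in the log -->
      <value>50</value>
    </property>
    <property>
      <name>http.timeout</name>
      <!-- milliseconds per HTTP connection -->
      <value>10000</value>
    </property>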

current leaseholder is trying to recreate file.

2010-03-30 Thread hareesh
Does anyone have insight into the following message? attempt_201003301923_0007_m_00_0: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4998 attempt_201003301923_0007_m_00_0: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4998 attempt_201003301923_0007_m_00_0: Aborting with
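
The subject-line error ("current leaseholder is trying to recreate file") is an HDFS message: a retried task attempt tried to create an output file while the lease held by an earlier, hung attempt was still active. One commonly suggested mitigation, offered here as an assumption about this setup rather than a confirmed fix, is to disable speculative execution so two attempts of the same task never write concurrently:

    <property>
      <name>mapred.map.tasks.speculative.execution</name>
      <value>false</value>
    </property>
    <property>
      <name>mapred.reduce.tasks.speculative.execution</name>
      <value>false</value>
    </property>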