Re: Nutch crawl problem

2008-01-06 Thread Otis Gospodnetic
…[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Sunday, January 6, 2008 8:30:38 PM
Subject: Re: Nutch crawl problem

Why can I crawl http://game.search.com but not http://www.search.com? My conf/crawl-urlfilter is:

# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip imag…
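For context: the stock Nutch 0.9 conf/crawl-urlfilter.txt looks roughly like the sketch below (an approximation of the default template, with MY.DOMAIN.NAME being the placeholder Nutch ships). The last two rules are the usual culprit when one host crawls and another does not: any URL that no "+" pattern accepts is dropped.

# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept hosts in MY.DOMAIN.NAME -- replace with your own domain,
# e.g. +^http://([a-z0-9]*\.)*search.com/ to accept both hosts above
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.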

Re: Nutch crawl problem

2008-01-06 Thread jibjoice
…and some hosts I can't crawl because of the error "Generator: 0 records selected for fetching, exiting ...". I set the same config for every host. Why?
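A quick way to tell whether the seed list or the URL filters are at fault is to inject a single known-good URL and run generate by hand. A minimal sketch; the urls/ and crawl/ paths are illustrative, not fixed Nutch names:

# one URL per line in the seed file
mkdir -p urls
echo "http://www.search.com/" > urls/seed.txt
bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments

If generate still reports "0 records selected", the URL is being rejected by the filters (see conf/crawl-urlfilter.txt above) rather than missing from the seed list.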

Re: Nutch crawl problem

2008-01-02 Thread jibjoice
…FAILED
- Task Id : task_0004_r_000001_1, Status : FAILED
- map 100% reduce 8%
- map 100% reduce 0%
- Task Id : task_0004_r_000000_2, Status : FAILED

Now I use hadoop-0.12.2, nutch-0.9 and Java jdk1.6.0. Why? I have not been able to solve this for a month.
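The client-side report above only carries the task IDs; the actual exception is in the per-task logs on whichever tasktracker ran the failed attempt. A sketch for digging it out, assuming the default log location of that Hadoop generation ($HADOOP_HOME/logs is the default hadoop.log.dir, and the per-task file names varied across early releases, so both paths below are assumptions):

# run on the tasktracker node that executed the failed attempt
ls $HADOOP_HOME/logs/userlogs/task_0004_r_000001_1/
# read whatever log files are present (stderr/syslog/part-* depending on release)
cat $HADOOP_HOME/logs/userlogs/task_0004_r_000001_1/*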

Re: Nutch crawl problem

2008-01-02 Thread jibjoice
…Status : FAILED. Now I use hadoop-0.12.2, nutch-0.9 and Java jdk1.6.0. Why? I have not been able to solve this for a month.

Re: Nutch crawl problem

2007-12-25 Thread pvvpr
Status : FAILED
task_0025_m_000001_1: Error running child
task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_1: at …

Re: Nutch crawl problem

2007-12-24 Thread jibjoice
…(MapTask.java:175)
task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000001_2, Status : FAILED

Re: Nutch crawl problem

2007-12-20 Thread jibjoice
…(MapTask.java:175)
task_0025_m_000001_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000000_2, Status : FAILED
task_0025_m_000000_2: Error running child
task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000000_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000000_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000000_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- map 100% reduce 100%
- Task Id : task_0025_m_000001_3, Status : FAILED
- Task Id : task_0025_m_000000_3, Status : FAILED
(attempts task_0025_m_000001_3 and task_0025_m_000000_3 fail with the identical ArrayIndexOutOfBoundsException trace)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)

How do I solve it?
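The trace shows the crash is in the dedup step (DeleteDuplicates reading a Lucene index via MultiReader), and, as pvvpr notes later in the thread, it happens when the indexes are empty because nothing was fetched. One way to pin down which step breaks is to run the phases of bin/nutch crawl individually instead of the one-shot command. A sketch using the standard Nutch 0.9 sub-commands; all the crawl/* paths are illustrative:

bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
s=`ls -d crawl/segments/2* | tail -1`   # the segment generate just created
bin/nutch fetch $s
bin/nutch updatedb crawl/crawldb $s
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
bin/nutch dedup crawl/indexes

If fetch never runs because generate selected 0 records, dedup has nothing to read and dies with exactly this kind of error; fixing the URL filters fixes the dedup crash too.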

Re: Nutch crawl problem

2007-12-20 Thread pvvpr
I think you need to check the conf/crawl-urlfilter.txt file.

On Thursday 20 December 2007 04:55, jibjoice wrote:
> Please help me to solve it.
>
> jibjoice wrote:
> > Where should I fix this? Why did it generate 0 records?
> >
> > pvvpr wrote:
> > > Basically your indexes are empty since no URLs were…
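To check the filter file directly, a small throwaway harness can run a URL through the configured filter chain. A sketch against the Nutch 0.9 API (FilterCheck is a hypothetical class name, not part of Nutch; note that the one-shot crawl command loads crawl-urlfilter.txt via conf/crawl-tool.xml, so the plain configuration below may read regex-urlfilter.txt instead unless urlfilter.regex.file points at the crawl file):

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.net.URLFilters;
import org.apache.nutch.util.NutchConfiguration;

// Hypothetical checker: prints ACCEPTED/REJECTED for each URL argument.
public class FilterCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create(); // reads conf/nutch-*.xml
    URLFilters filters = new URLFilters(conf);        // loads the urlfilter plugins
    for (String url : args) {
      // URLFilters.filter() returns the URL if every filter accepts it, null otherwise
      System.out.println((filters.filter(url) == null ? "REJECTED " : "ACCEPTED ") + url);
    }
  }
}

Run it with the Nutch jars and conf/ on the classpath, e.g. java -cp "conf:nutch-0.9.jar:lib/*" FilterCheck http://www.search.com/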

Re: Nutch crawl problem

2007-12-20 Thread jibjoice
…next(MapTask.java:157)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000000_2: at …

Re: Nutch crawl problem

2007-12-18 Thread jibjoice
task_0025_m_000001_3: Error running child
task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000001_3: at org.apache.nutch.inde…

Re: Nutch crawl problem

2007-12-18 Thread pvvpr
Basically your indexes are empty since no URLs were generated and fetched. See this:

> - Generator: 0 records selected for fetching, exiting ...
> - Stopping at depth=0 - no more URLs to fetch.
> - No URLs to fetch - check your seed list and URL filters.
> - crawl finished: crawled when…
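The crawldb itself will confirm this before anything is fetched: the readdb tool prints counts per status. A sketch with illustrative paths:

bin/nutch readdb crawl/crawldb -stats
# expect TOTAL urls > 0 after inject; if it is 0, the seeds never
# made it past the URL filters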

Re: Nutch crawl problem

2007-12-18 Thread jibjoice
…MultiReader.java:113)
task_0025_m_000000_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000000_3: at org.apache.hadoo…

Nutch crawl problem

2007-12-13 Thread jibjoice
…how do I solve it?