To: hadoop-user@lucene.apache.org
Sent: Sunday, January 6, 2008 8:30:38 PM
Subject: Re: Nutch crawl problem
Why can I crawl http://game.search.com but not http://www.search.com? My
conf/crawl-urlfilter.txt is:

# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip image and other suffixes we can't yet parse
[...]

And some hosts I can't crawl at all because of the error "Generator: 0 records
selected for fetching, exiting ...". I set the same config for every host. Why?
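For comparison, here is a minimal sketch of a conf/crawl-urlfilter.txt that
would accept both hosts (search.com stands in for the real target domain, and
the suffix-skipping rules are elided). Note that the rules are tried in order
and the first matching rule wins, so an earlier "-" rule can reject
www.search.com even if a later "+" rule would have accepted it:

# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip URLs containing certain characters, as these are probably queries
-[?*!@=]
# accept any host under search.com, e.g. game.search.com and www.search.com
+^http://([a-z0-9]*\.)*search.com/
# skip everything else
-.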
- Task Id : task_0004_r_000001_1, Status : FAILED
- map 100% reduce 8%
- map 100% reduce 0%
- Task Id : task_0004_r_000000_2, Status : FAILED

Now I use hadoop-0.12.2, nutch-0.9 and Java JDK 1.6.0. Why does this happen? I
have not been able to solve it for a month.
> - Task Id : task_0025_m_000001_1, Status : FAILED
> task_0025_m_000001_1: - Error running child
> task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1
> task_0025_m_000001_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
> task_0025_m_000001_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
> task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> task_0025_m_000001_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
> - map 100% reduce 100%
> [the same ArrayIndexOutOfBoundsException and stack trace repeated for task
> attempts task_0025_m_000001_2, task_0025_m_000000_2, task_0025_m_000001_3
> and task_0025_m_000000_3, each ending in Status : FAILED]
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
> at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
> How do I solve it?
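The ArrayIndexOutOfBoundsException: -1 in MultiReader.isDeleted is consistent
with the dedup step being run over empty or missing indexes, which in turn
follows from the "0 records selected for fetching" problem above. A quick way
to confirm, assuming the crawl output lives under crawl/ (adjust the paths to
your setup):

bin/nutch readdb crawl/crawldb -stats   # how many URLs did the crawl db end up with?
bin/hadoop dfs -ls crawl/indexes        # are there any index parts for dedup to read?

If the crawl db is empty or crawl/indexes does not exist, fix the seed list
and URL filters first; the dedup failure should go away once something is
actually fetched and indexed.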
I think you need to check the conf/crawl-urlfilter.txt file.
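If your build includes the URLFilterChecker tool, you can test exactly what
the configured filters do with the two URLs from this thread; a line echoed
back with a leading "+" means accepted, "-" means rejected:

echo "http://www.search.com/" | bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
echo "http://game.search.com/" | bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined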
On Thursday 20 December 2007 04:55, jibjoice wrote:
> Please, help me solve it.
>
> jibjoice wrote:
> > Where should I fix this? Why did it generate 0 records?
> >
> > pvvpr wrote:
> > > basically your indexes are empty since no URLs were generated and
> > > fetched.
Basically your indexes are empty since no URLs were generated and fetched. See
this:
> > - Generator: 0 records selected for fetching, exiting ...
> > - Stopping at depth=0 - no more URLs to fetch.
> > - No URLs to fetch - check your seed list and URL filters.
> > - crawl finished: crawled
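When the generator selects 0 records, the usual sequence to re-check, sketched
here with assumed paths (urls/ for the seed directory, crawl/ for the output),
is:

mkdir urls
echo "http://www.search.com/" > urls/seeds.txt    # seed list; the file name is arbitrary
bin/hadoop dfs -put urls urls                     # only needed when running on DFS
bin/nutch inject crawl/crawldb urls               # load the seeds into the crawl db
bin/nutch generate crawl/crawldb crawl/segments   # should now select records to fetch

If generate still reports 0 records after a fresh inject, the seeds are being
rejected by the URL filters (crawl-urlfilter.txt when using bin/nutch crawl,
regex-urlfilter.txt for the standalone commands), which takes you back to the
advice above.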