It failed again with HSQLDB 2.2.8 after two hours of crawling. Should I go back to
Nutch 1.5 or 1.6? Nutch 2.1 seems to have too many issues. What a pity.

console:
Skipping http://blog.sina.com.cn/s/blog_blog_557f024c010.html; different batch id (null)
Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0008
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:171)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

hadoop.log
2013-01-04 02:42:53,292 INFO  parse.ParserJob - Skipping http://blog.sina.com.cn/s/blog_70b99cd80102ebqv.html; different batch id (null)
2013-01-04 02:43:07,412 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-01-04 02:43:07,436 WARN  mapred.LocalJobRunner - job_local_0008
java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
    at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
    at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
    at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
    at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
    at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
    ... 6 more
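
"String data, right truncation" generally means that some string value in the batch is longer than the declared length of its column. As a rough way to narrow down which column is involved, here is a minimal sketch, assuming the VARCHAR widths from the `\d webpage` output quoted further down in this thread. The `find_overflows` helper and the sample row are hypothetical, made up for illustration; they are not part of Nutch or Gora. Note also that Python's `len()` counts code points, which may not match exactly how the database measures length.

```python
# Declared VARCHAR widths, copied from the `\d webpage` output in this thread.
VARCHAR_WIDTHS = {
    "ID": 767,
    "TEXT": 16777216,
    "TYP": 32,
    "BASEURL": 767,
    "TITLE": 2048,
    "REPRURL": 767,
}


def find_overflows(row, widths=VARCHAR_WIDTHS):
    """Return (column, actual_length, declared_width) for each oversized value.

    `row` is a hypothetical dict mapping column names to Python strings;
    columns missing from the row are skipped.
    """
    overflows = []
    for column, width in widths.items():
        value = row.get(column)
        if value is not None and len(value) > width:
            overflows.append((column, len(value), width))
    return overflows


# Made-up row: a page whose title is longer than the 2048-character column.
sample = {
    "ID": "http://example.invalid/some-page.html",
    "TITLE": "x" * 3000,
    "TYP": "text/html",
}
print(find_overflows(sample))  # [('TITLE', 3000, 2048)]
```

If a check like this points at one column, that column's length in gora-sql-mapping.xml would be the first thing to look at.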








At 2013-01-03 21:52:35,"Renato Marroquín Mogrovejo" 
<[email protected]> wrote:
>Hi Rui,
>
>The way this works is that Nutch uses the gora-sql-mapping.xml file to
>automatically create the necessary tables and then uses them. Anyway,
>I think you are hitting [1], which means you could try changing
>the gora-sql-mapping.xml file to what has been discussed on JIRA and
>then let us know the result so we can narrow it down.
>Thanks!
>
>
>Renato M.
>
>[1] https://issues.apache.org/jira/browse/GORA-24
>
>2013/1/3 高睿 <[email protected]>:
>> BTW, could you please share the schema of the webpage table, or its creation
>> script?
>> It seems the table auto-generated by Nutch 2.1 has problems.
>>
>> At 2013-01-03 21:43:26,"高睿" <[email protected]> wrote:
>>
>> I'm using this command:
>> bin/nutch crawl urls -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 1000
>> I guess the exception occurs when it tries to store a web page in HSQLDB. I tried
>> increasing the column size, but it failed again. Here's the schema in HSQLDB:
>> sql> \d webpage
>> NAME               DATATYPE     WIDTH  NO-NULLS  PRECISION  SCALE
>> -----------------  --------  --------  --------  ---------  -----
>> ID                 VARCHAR        767  *               767
>> HEADERS            BLOB      16777216             16777216
>> TEXT               VARCHAR   16777216             16777216
>> STATUS             INTEGER         11                   32
>> MARKERS            BLOB      16777216             16777216
>> PARSESTATUS        BLOB      16777216             16777216
>> MODIFIEDTIME       BIGINT          20                   64
>> SCORE              DOUBLE          23                   64
>> TYP                VARCHAR         32                   32
>> BASEURL            VARCHAR        767                  767
>> CONTENT            BLOB      16777216             16777216
>> TITLE              VARCHAR       2048                 2048
>> REPRURL            VARCHAR        767                  767
>> FETCHINTERVAL      INTEGER         11                   32
>> PREVFETCHTIME      BIGINT          20                   64
>> INLINKS            BLOB      16777216             16777216
>> PREVSIGNATURE      BLOB      16777216             16777216
>> OUTLINKS           BLOB      16777216             16777216
>> FETCHTIME          BIGINT          20                   64
>> RETRIESSINCEFETCH  INTEGER         11                   32
>> PROTOCOLSTATUS     BLOB      16777216             16777216
>> SIGNATURE          BLOB      16777216             16777216
>> METADATA           BLOB      16777216             16777216
>>
>> At 2013-01-03 21:06:04,"Lewis John Mcgibbney" <[email protected]> 
>> wrote:
>>>Hi Rui,
>>>
>>>The gora-sql backend is not stable so please do not be surprised if things
>>>do not work flawlessly.
>>>
>>>I would urge you to have a look at the gora-sql-mapping.xml file [0] and
>>>check the respective field values for the columns you are attempting to map.
>>>
>>>This aside, I would use the following SQL Store implementations if I were
>>>going to use this backend:
>>>
>>>HSQLDB - 2.2.8
>>>MySQL - 5.1.18
>>>
>>>At which stage (in your Nutch processes) does this exception occur?
>>>
>>>Lewis
>>>
>>>[0]
>>>http://svn.apache.org/repos/asf/nutch/branches/2.x/conf/gora-sql-mapping.xml
>>>
>>>On Thu, Jan 3, 2013 at 9:34 AM, 高睿 <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can't run Nutch 2.1 with MySQL. I then tried HSQLDB, and it failed again. So
>>>> which database are you using for Nutch 2.1? I have spent too much time on this
>>>> and cannot make it work.
>>>>
>>>> 2013-01-03 16:12:06,812 WARN  mapred.FileOutputCommitter - Output path is
>>>> null in cleanup
>>>> 2013-01-03 16:12:06,835 WARN  mapred.LocalJobRunner - job_local_0008
>>>> java.io.IOException: java.sql.BatchUpdateException: data exception: string
>>>> data, right truncation
>>>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>>>>         at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>>>>         at
>>>> org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>>>>         at
>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>>         at
>>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>>> Caused by: java.sql.BatchUpdateException: data exception: string data,
>>>> right truncation
>>>>         at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown
>>>> Source)
>>>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>>>>         ... 6 more
>>>>
>>>> Regards,
>>>> Rui
>>>>
>>>
>>>
>>>
>>>--
>>>*Lewis*
>>
>>
>>
