Hi Rui,

Yes, you are completely free to move to the 1.x trunk. Admittedly, it is
more stable.

I would advise you to keep the fetcher running (on one fetch task) for
less than 2 hours (around an hour, or even less if possible). This will
prevent you from losing too much data, time and effort should a fetch go
bad.
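One rough way to act on this is to size `-topN` (the per-round URL limit in the crawl command quoted below) so that a single fetch round fits inside the time budget. This is a back-of-the-envelope sketch, not anything Nutch provides: `plan_rounds` is a hypothetical helper, and the fetch rate is an assumption you would have to measure on your own crawl.

```python
import math


def plan_rounds(total_urls: int, pages_per_minute: float, max_minutes: float):
    """Hypothetical helper: return (top_n, rounds) so each fetch round
    should finish within max_minutes.

    Assumes a roughly constant fetch rate measured on your own crawl;
    politeness delays and slow hosts make real rates uneven.
    """
    if pages_per_minute <= 0 or max_minutes <= 0:
        raise ValueError("rate and time budget must be positive")
    # Largest number of URLs one round can fetch within the time budget.
    top_n = max(1, int(pages_per_minute * max_minutes))
    # Rounds needed to work through a known number of URLs at that size.
    rounds = math.ceil(total_urls / top_n)
    return top_n, rounds
```

For example, at a measured 50 pages per minute and a 60-minute budget, `plan_rounds(10000, 50, 60)` gives `(3000, 4)`: a `-topN` of 3000 and four rounds. Since real rates vary, leaving some headroom below the budget is sensible.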

Lewis

On Thu, Jan 3, 2013 at 8:31 PM, 高睿 <[email protected]> wrote:

> It failed again with HSQLDB 2.2.8 after 2 hours of crawling. Should I go
> back to Nutch 1.5 or 1.6? There seem to be too many issues in Nutch 2.1.
> What a pity.
>
> console:
> Skipping http://blog.sina.com.cn/s/blog_blog_557f024c010.html; different batch id (null)
> Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0008
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
>     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:171)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>
> hadoop.log
> 2013-01-04 02:42:53,292 INFO  parse.ParserJob - Skipping http://blog.sina.com.cn/s/blog_70b99cd80102ebqv.html; different batch id (null)
> 2013-01-04 02:43:07,412 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-01-04 02:43:07,436 WARN  mapred.LocalJobRunner - job_local_0008
> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>     at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>     at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
>     at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>     ... 6 more
>
> At 2013-01-03 21:52:35, "Renato Marroquín Mogrovejo" <[email protected]> wrote:
> >Hi Rui,
> >
> >The way this works is that Nutch uses the gora-sql-mapping.xml file to
> >automatically create the necessary tables and then uses them. Anyway,
> >IMHO you are hitting [1], which means you could try changing the
> >gora-sql-mapping.xml file as discussed on JIRA and then let us know,
> >so we can narrow it down.
> >Thanks!
> >
> >
> >Renato M.
> >
> >[1] https://issues.apache.org/jira/browse/GORA-24
> >
> >2013/1/3 高睿 <[email protected]>:
> >> BTW, could you please share the schema of the webpage table, or its
> >> creation script? It seems the table auto-generated by Nutch 2.1 has
> >> problems.
> >>
> >> At 2013-01-03 21:43:26,"高睿" <[email protected]> wrote:
> >>
> >> I'm using this command:
> >> bin/nutch crawl urls -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 1000
> >> I guess the exception occurs when it tries to store a webpage in
> >> HSQLDB. I tried increasing the column size, but it failed again.
> >> Here's the HSQLDB schema:
> >> sql> \d webpage
> >> NAME               DATATYPE     WIDTH  NO-NULLS  PRECISION  SCALE
> >> -----------------  --------  --------  --------  ---------  -----
> >> ID                 VARCHAR        767  *               767
> >> HEADERS            BLOB      16777216             16777216
> >> TEXT               VARCHAR   16777216             16777216
> >> STATUS             INTEGER         11                   32
> >> MARKERS            BLOB      16777216             16777216
> >> PARSESTATUS        BLOB      16777216             16777216
> >> MODIFIEDTIME       BIGINT          20                   64
> >> SCORE              DOUBLE          23                   64
> >> TYP                VARCHAR         32                   32
> >> BASEURL            VARCHAR        767                  767
> >> CONTENT            BLOB      16777216             16777216
> >> TITLE              VARCHAR       2048                 2048
> >> REPRURL            VARCHAR        767                  767
> >> FETCHINTERVAL      INTEGER         11                   32
> >> PREVFETCHTIME      BIGINT          20                   64
> >> INLINKS            BLOB      16777216             16777216
> >> PREVSIGNATURE      BLOB      16777216             16777216
> >> OUTLINKS           BLOB      16777216             16777216
> >> FETCHTIME          BIGINT          20                   64
> >> RETRIESSINCEFETCH  INTEGER         11                   32
> >> PROTOCOLSTATUS     BLOB      16777216             16777216
> >> SIGNATURE          BLOB      16777216             16777216
> >> METADATA           BLOB      16777216             16777216
> >>
> >> At 2013-01-03 21:06:04, "Lewis John Mcgibbney" <[email protected]> wrote:
> >>>Hi Rui,
> >>>
> >>>The gora-sql backend is not stable, so please do not be surprised if
> >>>things do not work flawlessly.
> >>>
> >>>I would urge you to have a look at the gora-sql-mapping.xml file [0] and
> >>>check the respective field values for the columns you are attempting to
> >>>map.
> >>>
> >>>That aside, if I were going to use this backend, I would use the
> >>>following SQL store implementations:
> >>>
> >>>HSQLDB - 2.2.8
> >>>MySQL - 5.1.18
> >>>
> >>>At which stage (of your Nutch process) does this exception occur?
> >>>
> >>>Lewis
> >>>
> >>>[0]
> >>>http://svn.apache.org/repos/asf/nutch/branches/2.x/conf/gora-sql-mapping.xml
> >>>
> >>>On Thu, Jan 3, 2013 at 9:34 AM, 高睿 <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I couldn't get Nutch 2.1 running with MySQL, so I tried HSQLDB, and it
> >>>> failed again. Which database are you using with Nutch 2.1? I have spent
> >>>> too much time on this and cannot make it work.
> >>>>
> >>>> 2013-01-03 16:12:06,812 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> >>>> 2013-01-03 16:12:06,835 WARN  mapred.LocalJobRunner - job_local_0008
> >>>> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
> >>>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
> >>>>         at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
> >>>>         at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
> >>>>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
> >>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
> >>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> >>>> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
> >>>>         at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
> >>>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
> >>>>         ... 6 more
> >>>>
> >>>> Regards,
> >>>> Rui
> >>>>
> >>>
> >>>
> >>>
> >>>--
> >>>*Lewis*
> >>
> >>
> >>
>
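The "data exception: string data, right truncation" in the quoted logs is the standard SQL error for writing a string longer than the VARCHAR column it targets. A minimal diagnostic sketch, assuming the VARCHAR widths from the `\d webpage` output above and assuming you can get the failing records into plain Python dicts; `overflowing_columns` is a hypothetical helper, not part of Nutch or Gora.

```python
# VARCHAR widths copied from the "\d webpage" output in this thread
# (assumption: your table matches that schema).
VARCHAR_WIDTHS = {
    "ID": 767,
    "TEXT": 16777216,
    "TYP": 32,
    "BASEURL": 767,
    "TITLE": 2048,
    "REPRURL": 767,
}


def overflowing_columns(row: dict) -> dict:
    """Hypothetical helper: return {column: (length, width)} for every
    string value that is too long for its VARCHAR column."""
    problems = {}
    for column, value in row.items():
        width = VARCHAR_WIDTHS.get(column.upper())
        if width is None or value is None:
            continue  # not a VARCHAR column in this schema, or no value
        if len(value) > width:
            problems[column.upper()] = (len(value), width)
    return problems
```

Running it over a few of the records the job was trying to write would show which column overflows (for instance, a 40-character value in TYP, whose width is only 32), and therefore which field to revisit in gora-sql-mapping.xml.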



-- 
*Lewis*
