There was a discussion about this a few months back, but I am not aware of
the exact root cause behind it.
See
http://lucene.472066.n3.nabble.com/Nutch-2-1-different-batch-id-null-td4040592.html
http://lucene.472066.n3.nabble.com/Re-nutch-2-1-with-mysql-different-batch-id-null-td4058698.html

There is a Jira issue tracking this:
https://issues.apache.org/jira/browse/NUTCH-1567



On Thu, Jun 13, 2013 at 2:11 PM, Weder Carlos Vieira <weder.vie...@gmail.com
> wrote:

> mhmmm got it...
>
> Tejas, can you please explain why, when I put some URLs inside urls/seed.txt,
> many pages under those URLs aren't parsed?
>
> Example:
> Skipping http://wiki.creativecommons.org/Integrate; different batch id (null)
> Skipping http://wiki.creativecommons.org/LRMI; different batch id (null)
> Skipping http://wiki.creativecommons.org/Marking; different batch id (null)
>
> These pages are examples of many other pages that aren't parsed.
> Like these, there are many other pages that I want to be read and recorded
> in the database.
>
>
> Thanks again.
>
>
>
> On Thu, Jun 13, 2013 at 6:04 PM, Tejas Patil <tejas.patil...@gmail.com
> >wrote:
>
> > Those are all images, which won't get parsed by Nutch.
> >
> >
> > On Thu, Jun 13, 2013 at 1:33 PM, Weder Carlos Vieira <
> > weder.vie...@gmail.com
> > > wrote:
> >
> > >
> > > I extracted one row of the URLs returned...
> > >
> > > It is attached in Excel format.
> > >
> > >
> > >
> >
>
