Re: Signature == null ?

2011-11-22 Thread Markus Jelsma
On Tuesday 22 November 2011 12:13:51 Lewis John Mcgibbney wrote: > Hi Markus, > > Just so I am understanding here, are the problems you've highlighted > acceptable considering what we know about Nutch behaviour? > > We know the problem with parse-feed, parser time-outs, can you explain what > y

Re: Signature == null ?

2011-11-22 Thread Lewis John Mcgibbney
Hi Markus, Just so I am understanding here, are the problems you've highlighted acceptable considering what we know about Nutch behaviour? We know about the problems with parse-feed and parser time-outs; can you explain what you mean by bad PDFs? Thank you. On Mon, Nov 21, 2011 at 8:01 PM, Markus Jelsma

Re: Signature == null ?

2011-11-21 Thread Markus Jelsma
I can't dump the DB right now since it's far too large for a single node, but from log output I can see that these records without a signature were not parsable with Tika, such as RSS feeds, bad PDFs or timed-out parses. > > On 15/11/2011 20:33, Markus Jelsma wrote: > > > It's back again! Last tr

Re: Signature == null ?

2011-11-15 Thread Markus Jelsma
> On 15/11/2011 20:33, Markus Jelsma wrote: > > It's back again! Last try if someone has a pointer for this. > > Cheers > > > >> After some DB updates, they're gone! Anyone recognizes this phenomenon? > >> > >> On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote: > >>> On Tuesday 08 Novembe

Re: Signature == null ?

2011-11-15 Thread Andrzej Bialecki
On 15/11/2011 20:33, Markus Jelsma wrote: It's back again! Last try if someone has a pointer for this. Cheers After some DB updates, they're gone! Anyone recognizes this phenomenon? On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote: On Tuesday 08 November 2011 11:15:37 Markus Jelsma wr

Re: Signature == null ?

2011-11-15 Thread Markus Jelsma
It's back again! Last try if someone has a pointer for this. Cheers > After some DB updates, they're gone! Anyone recognizes this phenomenon? > > On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote: > > On Tuesday 08 November 2011 11:15:37 Markus Jelsma wrote: > > > Hi guys, > > > > > > I'v

Re: Signature == null ?

2011-11-10 Thread Markus Jelsma
After some DB updates, they're gone! Anyone recognizes this phenomenon? On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote: > On Tuesday 08 November 2011 11:15:37 Markus Jelsma wrote: > > Hi guys, > > > > I've a M/R job selecting only DB_FETCHED and DB_NOTMODIFIED records and > > their sign

Re: Signature == null ?

2011-11-08 Thread Markus Jelsma
On Tuesday 08 November 2011 11:15:37 Markus Jelsma wrote: > Hi guys, > > I've a M/R job selecting only DB_FETCHED and DB_NOTMODIFIED records and > their signatures. I had to add a sanity check on signature to avoid a NPE. > I had the assumption any record with such DB_ status has to have a > sig

Signature == null ?

2011-11-08 Thread Markus Jelsma
Hi guys, I've a M/R job selecting only DB_FETCHED and DB_NOTMODIFIED records and their signatures. I had to add a sanity check on signature to avoid an NPE. I had the assumption any record with such DB_ status has to have a signature, right? Why does roughly 0.0001625% of my records exist without
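
The sanity check described in the original post can be sketched roughly as follows. This is a plain-Java illustration only, not the poster's job and not Nutch code: `SignatureCheck`, `DbRecord`, and the string status values are hypothetical stand-ins for a real CrawlDb record, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.List;

final class SignatureCheck {

    /** Hypothetical stand-in for a CrawlDb entry; not the Nutch CrawlDatum class. */
    static final class DbRecord {
        final String url;
        final String status;    // assumed to be a plain string such as "DB_FETCHED"
        final byte[] signature; // may be null, e.g. when the document could not be parsed

        DbRecord(String url, String status, byte[] signature) {
            this.url = url;
            this.status = status;
            this.signature = signature;
        }
    }

    /** True for the two statuses the job in the post selects. */
    static boolean isSelected(DbRecord r) {
        return "DB_FETCHED".equals(r.status) || "DB_NOTMODIFIED".equals(r.status);
    }

    /** Keeps selected records, skipping those with a null signature to avoid an NPE later. */
    static List<DbRecord> withSignature(List<DbRecord> records) {
        List<DbRecord> out = new ArrayList<>();
        for (DbRecord r : records) {
            if (isSelected(r) && r.signature != null) {
                out.add(r);
            }
        }
        return out;
    }

    /** Counts selected records that unexpectedly lack a signature. */
    static int countMissing(List<DbRecord> records) {
        int missing = 0;
        for (DbRecord r : records) {
            if (isSelected(r) && r.signature == null) {
                missing++;
            }
        }
        return missing;
    }
}
```

Counting the skipped records separately, rather than dropping them silently, makes it possible to see how many records are affected and whether that number changes after a DB update.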