On Tuesday 22 November 2011 12:13:51 Lewis John Mcgibbney wrote:
Hi Markus,
Just so I am understanding here, are the problems you've highlighted
acceptable considering what we know about Nutch behaviour?
We know about the problems with parse-feed and parser time-outs; can you
explain what you mean by bad PDFs?
Thank you
On Mon, Nov 21, 2011 at 8:01 PM, Markus Jelsma wrote:
I can't dump the DB right now since it's far too large for a single node, but
from the log output I can see that these records without a signature were not
parsable with Tika, such as RSS feeds, bad PDFs or timed-out parses.
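
A minimal sketch of the kind of sanity check discussed in this thread: select only DB_FETCHED and DB_NOTMODIFIED records and skip those without a signature instead of dereferencing a missing value. The Record class, the function names and the sample URLs are hypothetical stand-ins for illustration, not the Nutch API, and the original job is a Hadoop M/R job rather than plain Python.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

# Status names are taken from the thread; everything else here is hypothetical.
SELECTED_STATUSES = {"DB_FETCHED", "DB_NOTMODIFIED"}


@dataclass
class Record:
    """Hypothetical stand-in for a crawl DB entry (not the Nutch API)."""
    url: str
    status: str
    signature: Optional[bytes] = None


def select_signatures(records: Iterable[Record]) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, signature) for selected statuses, skipping missing signatures."""
    for rec in records:
        if rec.status not in SELECTED_STATUSES:
            continue
        if rec.signature is None:
            # Sanity check: for example, a failed or timed-out parse left no signature.
            continue
        yield rec.url, rec.signature


def count_missing(records: Iterable[Record]) -> int:
    """Count selected-status records that carry no signature."""
    return sum(
        1
        for rec in records
        if rec.status in SELECTED_STATUSES and rec.signature is None
    )


# Made-up sample data for illustration only.
sample = [
    Record("http://example.com/a", "DB_FETCHED", b"\x01\x02"),
    Record("http://example.com/feed.rss", "DB_FETCHED", None),
    Record("http://example.com/b", "DB_NOTMODIFIED", b"\x03"),
    Record("http://example.com/c", "DB_UNFETCHED", None),
]
selected = list(select_signatures(sample))
missing = count_missing(sample)
```

Counting the skipped records separately, rather than silently dropping them, makes it easier to compare the number against what the parse logs report.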
On 15/11/2011 20:33, Markus Jelsma wrote:
It's back again! Last try if someone has a pointer for this.
Cheers
After some DB updates, they're gone! Does anyone recognize this phenomenon?
On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote:
On Tuesday 08 November 2011 11:15:37 Markus Jelsma wrote:
Hi guys,
I have an M/R job selecting only DB_FETCHED and DB_NOTMODIFIED records and their
signatures. I had to add a sanity check on the signature to avoid an NPE. I had
assumed that any record with such a DB_ status has to have a signature, right?
Why does roughly 0.0001625% of my records exist without