-----Original message-----
> From: Matthias Paul <[email protected]>
> Sent: Wed 06-Jun-2012 11:49
> To: [email protected]
> Subject: Re: Linkdb empty
>
> Both db.ignore.internal.links and db.ignore.external.links are true in my
> case.
> Since I crawl only one domain, I suppose setting
> db.ignore.external.links to true is a good idea.
>
> So db.ignore.internal.links should be false?

Yes.

>
> From what I understand db.ignore.external.links is a setting for the
> crawldb, while db.ignore.internal.links is a setting for the linkdb?

Almost correct. db.ignore.external.links is not used in the crawldb but in the parse job, whose output is the input to the crawldb.

>
>
>
> On Wed, Jun 6, 2012 at 10:02 AM, Markus Jelsma
> <[email protected]> wrote:
> >
> > -----Original message-----
> >> From: Matthias Paul <[email protected]>
> >> Sent: Wed 06-Jun-2012 09:47
> >> To: [email protected]
> >> Subject: Linkdb empty
> >>
> >> Hi all,
> >
> > hi
> >
> >>
> >> I noticed that my linkdb is always empty although I use the generated
> >> segments from the last crawl for the generation of the linkdb.
> >
> > Check the db.ignore.* settings in your config.
> >
> >> Do I have to keep more segments?
> >> As I use Solr for indexing, I only keep the segments from the last crawl.
> >
> > You can discard the segments if you never do a full reindex.
> >
> >>
> >> Thanks
> >> Matthias
> >>
>
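
For reference, the settings discussed in this thread could be written as overrides like the following. This is only a minimal sketch, assuming a single-domain crawl and that the overrides live in conf/nutch-site.xml; the file location and the comment are assumptions, while the two property names and values are the ones from the thread.

```xml
<!-- conf/nutch-site.xml (assumed location): illustrative overrides for a single-domain crawl -->
<configuration>
  <!-- false, so that links between pages of the same host are kept for the linkdb -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
  <!-- true, so that links pointing outside the crawled domain are ignored -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```

After changing db.ignore.internal.links, the linkdb presumably has to be rebuilt from the retained segments (for example with `bin/nutch invertlinks <linkdb> -dir <segments_dir>`, where both paths are placeholders) before any inlinks show up.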

