Hi Issei,

The "Keep unreachable documents, forever" setting means that no hop count
dependency information is kept for any crawl done while that setting is in
place.  As a result, when links or documents change, the system has no way
to recompute hop counts accurately.  This setting is appropriate if you want
your crawl to be as fast as possible and never expect to use hop count
filtering for the job in question.

The "Keep unreachable documents, for now" setting means that enough
information is kept that, if you decide to put a hop count filter in place
later, it will still work properly.
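
To make the dependency idea more concrete, here is a rough conceptual sketch
in Python.  It is not ManifoldCF code: the link graph, the document names and
the hop_counts function are all made up for illustration, and it assumes a hop
count is simply the fewest links from a seed to a document.

```python
from collections import deque

def hop_counts(links, seeds):
    """Breadth-first search: fewest link hops from any seed to each document."""
    counts = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, ()):
            if target not in counts:
                # First time we reach this document, so this is its shortest path.
                counts[target] = counts[doc] + 1
                queue.append(target)
    return counts

# Hypothetical link graph: seed -> a -> b, and seed -> c -> d -> b
links = {"seed": ["a", "c"], "a": ["b"], "c": ["d"], "d": ["b"]}
before = hop_counts(links, ["seed"])  # b is reached through a

# The link a -> b disappears, so b's old hop count is no longer valid.
links["a"] = []
after = hop_counts(links, ["seed"])   # b is now reached through c and d

print(before["b"], after["b"])
```

The sketch recomputes everything from scratch after the link change.  Hop
count dependency information is what lets a crawler know that b's value relied
on the a -> b link, so that it can be corrected without that full
recomputation; without it, stored hop counts can silently become stale.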

Hope that helps.

Karl


On Thu, Nov 7, 2019 at 11:01 AM Issei Nishigata <duo.2...@gmail.com> wrote:

> Hi All,
>
>
> I use MCF 2.12, and I am confused about the specification of the HopFilters
> "Keep unreachable documents" settings.
>
> My understanding is that the "Keep unreachable documents, for now" and "Keep
> unreachable documents, forever" HopFilter settings
> take effect when a HopCount is specified.
>
> For example, suppose I crawl all the data with an empty HopCount value the
> first time, and the second time
> set HopCount to 0 with "Keep unreachable documents, for
> now". I expect that only the first layer of the directory
> will be crawled, and that the second and deeper layers, which are not crawled,
> will not be deleted from the index.
>
> However, when I actually run with the above settings, the documents on the
> second layer are deleted from the index
> on the second and subsequent runs. It works the same way when using
> "Keep unreachable documents, forever".
>
> Is there anything wrong with my understanding? And does anyone know the
> difference between these two settings,
> "Keep unreachable documents, for now" and "Keep unreachable documents,
> forever"?
>
> If anyone knows the specs of these settings, it would be very
> helpful if you could share your advice.
> Any clue would be much appreciated.
>
>
> Sincerely,
> Issei Nishigata
>
>
