No, only the seed URLs get updated with that option.
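
To illustrate the behaviour discussed in this thread, here is a toy model in Python. It is only a sketch, not ManifoldCF code: the ToyCrawler class, its fields, and the page names "a", "b", "c" are all hypothetical. It also assumes that a document's links are only re-extracted when the document is new or has changed. Under those assumptions, hop counts are assessed when a document is added to the queue, so removing the limit and re-seeding does not bring back a page that was excluded earlier.

```python
from collections import deque


class ToyCrawler:
    """Toy model of enqueue-time hop filtering (not ManifoldCF's implementation)."""

    def __init__(self, links, seeds, max_hops=None):
        self.links = links            # page -> list of pages it links to
        self.seeds = list(seeds)
        self.max_hops = max_hops      # None means "no hop limit"
        self.hopcount = {}            # stands in for a stored hopcount table
        self.scanned = set()          # pages whose links were already extracted
        self.indexed = set()          # pages sent to the output

    def _in_scope(self, hops):
        return self.max_hops is None or hops <= self.max_hops

    def crawl(self, changed=()):
        # Re-seeding only puts the seed pages back on the queue.
        queue = deque()
        for seed in self.seeds:
            self.hopcount[seed] = 0
            queue.append(seed)
        visited = set()
        while queue:
            url = queue.popleft()
            if url in visited:
                continue
            visited.add(url)
            self.indexed.add(url)
            # Assumption: an unchanged, already-scanned page is not rescanned,
            # so its outgoing links are never reconsidered.
            if url in self.scanned and url not in changed:
                continue
            self.scanned.add(url)
            for child in self.links.get(url, []):
                hops = self.hopcount[url] + 1
                self.hopcount[child] = min(hops, self.hopcount.get(child, hops))
                # The hop limit is applied here, at enqueue time.
                if self._in_scope(self.hopcount[child]):
                    queue.append(child)
        return sorted(self.indexed)


links = {"a": ["b"], "b": ["c"]}
crawler = ToyCrawler(links, seeds=["a"], max_hops=1)
first = crawler.crawl()                       # "c" is 2 hops away: excluded
crawler.max_hops = None                       # remove the limit
second = crawler.crawl()                      # re-seed only: "c" still missing
third = crawler.crawl(changed={"a", "b"})     # forced rescan picks up "c"
print(first, second, third)
```

In this model, "c" only appears once the pages that link to it are rescanned for some other reason, which matches the explanation quoted below.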

On Tue, Sep 26, 2023 at 10:09 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:

> Thanks a lot for the explanation, Karl, really useful.
>
> I will wait for your reply at the end of the week, but I thought that the
> main purpose of the "Reset seeding" option was exactly that: to re-evaluate
> all pages, as in a fresh execution.
>
>
> On Tue, 26 Sept 2023 at 13:30, Karl Wright <daddy...@gmail.com> wrote:
>
>> Okay, that is good to know.
>> The hopcount assessment occurs when documents are added to the queue.
>> Hopcounts are stored for each document in the hopcount table.  So if you
>> change a hopcount limit, it is quite possible that nothing will change
>> unless documents that are at the previous hopcount limit are re-evaluated.
>> I believe there is no logic in ManifoldCF for that at this time, but I'd
>> have to review the codebase to be certain of that.
>>
>> What that means is that you can't increase the hopcount limit and expect
>> the next crawl to pick up the documents you excluded before with the
>> hopcount mechanism.  Only when the documents need to be rescanned for some
>> other reason would that happen as it stands now.  But I will get back to
>> you after a review at the end of the week.
>>
>> Karl
>>
>>
>> On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
>> marisol.redondo.gar...@gmail.com> wrote:
>>
>>> No, I haven't used this option. I have it configured as "Keep
>>> unreachable documents, for now". Is it also ignoring them because they
>>> were already kept? With this option, when are the documents that are
>>> unreachable "for now" converted to "forever"?
>>>
>>> The only solution I can think of is to create a new job with the exact
>>> same characteristics and run it.
>>>
>>> Regards and thanks
>>>    Marisol
>>>
>>>
>>>
>>> On Tue, 26 Sept 2023 at 12:35, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> If you ever set "Ignore unreachable documents forever" for the job, you
>>>> can't go back and stop ignoring them.  The data that the job would need to
>>>> have recorded for this is gone.  The only way to get it back is if you can
>>>> convince ManifoldCF to recrawl all documents in the job.
>>>>
>>>>
>>>> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
>>>> marisol.redondo.gar...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi, I have a problem with documents being out of scope.
>>>>>
>>>>> I changed the Maximum hop count for type "redirect" in one of my jobs
>>>>> to 5 and saw that the job was not processing some pages because of
>>>>> that, so I removed the value to get them injected into the output
>>>>> connector (Solr connector).
>>>>> After that, the same pages are still out of scope, as if the limit
>>>>> were set to 1, and they are not indexed.
>>>>>
>>>>> I have tried "Reset seeding", thinking that maybe the pages needed to
>>>>> be checked again, but I still have the same problem. I don't think the
>>>>> problem is with the output, but I have also used the options "Re-index
>>>>> all associated documents" and "Remove all associated records", with the
>>>>> same result.
>>>>> I don't want to clear the history in the repository, which is a
>>>>> website connector, as I don't want to lose all the history.
>>>>>
>>>>> Is this a bug in Manifold? Is there any option to fix this issue?
>>>>>
>>>>> I'm using Manifold version 2.24.
>>>>>
>>>>> Thanks
>>>>>     Marisol
>>>>>
>>>>>
