Thanks a lot for the explanation, Karl, really useful.

I will wait for your reply at the end of the week, but I thought that was
the main purpose of the "Reset seeding" option: to re-evaluate all pages,
as in a fresh run.
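
To check that I follow the explanation quoted below, here is a minimal toy
model in Python of how I understand it. This is not ManifoldCF code: the
function crawl, the rescan flag and the links/hopcounts/indexed structures
are all hypothetical names of mine, and the rule that an already processed
document is skipped unless something forces a rescan is my assumption based
on your description.

```python
from collections import deque

def crawl(links, seeds, max_hops, hopcounts, indexed, rescan=False):
    """Toy crawl: the hop limit is checked only when a link target is queued.

    hopcounts and indexed persist between runs, like stored job state.
    max_hops=None means no limit is configured.
    """
    queue = deque(seeds)
    for seed in seeds:
        hopcounts[seed] = 0
    visited = set()
    while queue:
        doc = queue.popleft()
        if doc in visited:
            continue
        visited.add(doc)
        if doc in indexed and not rescan:
            # Assumption: an unchanged document is not rescanned, so its
            # outgoing links are never re-evaluated against the new limit.
            continue
        indexed.add(doc)
        for target in links.get(doc, []):
            hops = hopcounts[doc] + 1
            hopcounts[target] = min(hops, hopcounts.get(target, hops))
            if max_hops is None or hopcounts[target] <= max_hops:
                queue.append(target)
    return indexed

# Hypothetical site: a links to b, b links to c.
links = {"a": ["b"], "b": ["c"]}
hopcounts, indexed = {}, set()

first = set(crawl(links, ["a"], 1, hopcounts, indexed))      # limit of 1
second = set(crawl(links, ["a"], None, hopcounts, indexed))  # limit removed
third = set(crawl(links, ["a"], None, hopcounts, indexed, rescan=True))
print(sorted(first), sorted(second), sorted(third))
```

In this model, c is left out by the first run, stays out after the limit is
removed, and only appears once every document is rescanned, which matches
what I am seeing.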


On Tue, 26 Sept 2023 at 13:30, Karl Wright <daddy...@gmail.com> wrote:

> Okay, that is good to know.
> The hopcount assessment occurs when documents are added to the queue.
> Hopcounts are stored for each document in the hopcount table.  So if you
> change a hopcount limit, it is quite possible that nothing will change
> unless documents that are at the previous hopcount limit are re-evaluated.
> I believe there is no logic in ManifoldCF for that at this time, but I'd
> have to review the codebase to be certain of that.
>
> What that means is that you can't increase the hopcount limit and expect
> the next crawl to pick up the documents you excluded before with the
> hopcount mechanism.  Only when the documents need to be rescanned for some
> other reason would that happen as it stands now.  But I will get back to
> you after a review at the end of the week.
>
> Karl
>
>
> On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
> marisol.redondo.gar...@gmail.com> wrote:
>
>> No, I haven't used this option. I have it configured as "Keep
>> unreachable documents, for now". Is it also ignoring them because they
>> were already kept? And with this option, when are the documents that are
>> unreachable "for now" converted to "forever"?
>>
>> The only solution I can think of is to create a new job with exactly the
>> same characteristics and run it.
>>
>> Regards and thanks
>>    Marisol
>>
>>
>>
>> On Tue, 26 Sept 2023 at 12:35, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> If you ever set "Ignore unreachable documents forever" for the job, you
>>> can't go back and stop ignoring them.  The data that the job would need to
>>> have recorded for this is gone.  The only way to get it back is if you can
>>> convince ManifoldCF to recrawl all documents in the job.
>>>
>>>
>>> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
>>> marisol.redondo.gar...@gmail.com> wrote:
>>>
>>>>
>>>> Hi, I have a problem with documents being out of scope.
>>>>
>>>> I changed the Maximum hop count for type "redirect" in one of my jobs to
>>>> 5 and saw that the job was not processing some pages because of that, so
>>>> I removed the value to get them injected into the output connector (Solr
>>>> connector).
>>>> After that, the same pages are still out of scope, as if the limit had
>>>> been set to 1, and they are not indexed.
>>>>
>>>> I have tried "Reset seeding", thinking that maybe the pages needed to
>>>> be checked again, but I still have the same problem. I don't think the
>>>> problem is with the output, but I have also used the options "Re-index
>>>> all associated documents" and "Remove all associated records", with the
>>>> same result.
>>>> I don't want to clear the history in the repository, which is a
>>>> website connector, as I don't want to lose all the history.
>>>>
>>>> Is this a bug in Manifold? Is there any option to fix this issue?
>>>>
>>>> I'm using Manifold version 2.24.
>>>>
>>>> Thanks
>>>>     Marisol
>>>>
>>>>
