Thanks a lot for the explanation, Karl, it's really useful. I will wait for your reply at the end of the week, but I thought the main purpose of the "Reset seeding" option was exactly that: re-evaluating all pages, as in a fresh execution.
On Tue, 26 Sept 2023 at 13:30, Karl Wright <daddy...@gmail.com> wrote:
> Okay, that is good to know.
> The hopcount assessment occurs when documents are added to the queue.
> Hopcounts are stored for each document in the hopcount table. So if you
> change a hopcount limit, it is quite possible that nothing will change
> unless documents that are at the previous hopcount limit are re-evaluated.
> I believe there is no logic in ManifoldCF for that at this time, but I'd
> have to review the codebase to be certain of that.
>
> What that means is that you can't increase the hopcount limit and expect
> the next crawl to pick up the documents you excluded before with the
> hopcount mechanism. As it stands now, that would only happen when the
> documents need to be rescanned for some other reason. But I will get back
> to you after a review at the end of the week.
>
> Karl
>
>
> On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
> marisol.redondo.gar...@gmail.com> wrote:
>
>> No, I haven't used this option. I have it configured as "Keep
>> unreachable documents, for now". Is it also ignoring them because they
>> were already kept? With this option, when are the documents kept "for
>> now" converted to "forever"?
>>
>> The only solution I can think of is creating a new job with exactly the
>> same characteristics and running it.
>>
>> Regards and thanks
>> Marisol
>>
>>
>> On Tue, 26 Sept 2023 at 12:35, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> If you ever set "Ignore unreachable documents forever" for the job, you
>>> can't go back and stop ignoring them. The data that the job would need to
>>> have recorded for this is gone. The only way to get it back is if you can
>>> convince ManifoldCF to recrawl all documents in the job.
>>>
>>> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
>>> marisol.redondo.gar...@gmail.com> wrote:
>>>
>>>> Hi, I have a problem with documents out of scope.
>>>>
>>>> I changed the maximum hop count for type "redirect" in one of my jobs
>>>> to 5, and saw that the job was not processing some pages because of
>>>> that, so I removed the value to get them injected into the output
>>>> connector (Solr connector).
>>>> After that, the same pages are still out of scope, as if the limit had
>>>> been set to 1, and they are not indexed.
>>>>
>>>> I tried "Reset seeding", thinking that maybe the pages needed to be
>>>> checked again, but I still have the same problem. I don't think the
>>>> problem is with the output, but I also used the options "Re-index all
>>>> associated documents" and "Remove all associated records", with the
>>>> same result.
>>>> I don't want to clear the history in the repository connection (it's a
>>>> website connector), as I don't want to lose all the history.
>>>>
>>>> Is this a bug in Manifold? Is there any option to fix this issue?
>>>>
>>>> I'm using Manifold version 2.24.
>>>>
>>>> Thanks
>>>> Marisol
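Karl's description of hopcount behaviour can be illustrated with a toy model. This is a hypothetical sketch, not ManifoldCF code: the `crawl` function and the `store` dictionary (standing in for the per-document hopcount table) are assumptions made purely for illustration. The model checks the hop limit only when a document is first queued and keeps that verdict across runs, which shows why raising the limit alone does not bring back documents excluded on an earlier run, while starting from an empty store (similar in spirit to creating a new job) does.

```python
from collections import deque


def crawl(links, seed, max_hops, store):
    """Toy breadth-first crawl (hypothetical model, not ManifoldCF's API).

    The hop-limit check runs only when a document is first discovered;
    its (hopcount, in_scope) verdict is saved in `store` and reused on
    later runs instead of being re-checked against the current limit.
    """
    store.setdefault(seed, (0, True))
    queue = deque([seed])
    seen = {seed}   # avoid processing a document twice within one run
    fetched = []
    while queue:
        doc = queue.popleft()
        fetched.append(doc)
        hops, _ = store[doc]
        for child in links.get(doc, []):
            if child in seen:
                continue
            seen.add(child)
            if child not in store:
                # First discovery: assess the hopcount against the current limit.
                store[child] = (hops + 1, hops + 1 <= max_hops)
            # Later runs reuse the stored verdict, even if max_hops has changed.
            if store[child][1]:
                queue.append(child)
    return fetched


links = {"a": ["b"], "b": ["c"], "c": ["d"]}
store = {}
print(crawl(links, "a", 1, store))  # first run with a low limit
print(crawl(links, "a", 5, store))  # limit raised, but stored verdicts are reused
print(crawl(links, "a", 5, {}))     # fresh store: the new limit takes effect
```

In this model, the higher limit only takes effect once the stored verdicts are cleared or the documents are re-evaluated for some other reason, which matches the behaviour described in the thread.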