No, only the seed URLs get updated with that option.
On Tue, Sep 26, 2023 at 10:09 AM Marisol Redondo <marisol.redondo.gar...@gmail.com> wrote:
> Thanks a lot for the explanation, Karl, really useful.
>
> I will wait for your reply at the end of the week, but I thought that the
> main reason for the option "Reset seeding" was precisely that: re-evaluating
> all pages, as in a fresh execution.
>
>
> On Tue, 26 Sept 2023 at 13:30, Karl Wright <daddy...@gmail.com> wrote:
>
>> Okay, that is good to know.
>> The hopcount assessment occurs when documents are added to the queue.
>> Hopcounts are stored for each document in the hopcount table. So if you
>> change a hopcount limit, it is quite possible that nothing will change
>> unless documents that are at the previous hopcount limit are re-evaluated.
>> I believe there is no logic in ManifoldCF for that at this time, but I'd
>> have to review the codebase to be certain of that.
>>
>> What that means is that you can't increase the hopcount limit and expect
>> the next crawl to pick up the documents you excluded before with the
>> hopcount mechanism. As things stand now, that would only happen when the
>> documents need to be rescanned for some other reason. But I will get back
>> to you after a review at the end of the week.
>>
>> Karl
>>
>>
>> On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
>> marisol.redondo.gar...@gmail.com> wrote:
>>
>>> No, I haven't used that option; I have it configured as "Keep
>>> unreachable documents, for now". Is it also ignoring them because they
>>> were already kept? With this option, when are the documents kept "for now"
>>> converted to "forever"?
>>>
>>> The only solution I can think of is creating a new job with exactly the
>>> same characteristics and running it.
>>>
>>> Regards and thanks
>>> Marisol
>>>
>>>
>>>
>>> On Tue, 26 Sept 2023 at 12:35, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> If you ever set "Ignore unreachable documents forever" for the job, you
>>>> can't go back and stop ignoring them. The data that the job would need to
>>>> have recorded for this is gone. The only way to get it back is if you can
>>>> convince ManifoldCF to recrawl all documents in the job.
>>>>
>>>>
>>>> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
>>>> marisol.redondo.gar...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi, I have a problem with documents out of scope.
>>>>>
>>>>> I changed the maximum hop count for type "redirect" in one of my jobs to
>>>>> 5, and saw that the job was not processing some pages because of that, so
>>>>> I removed the value to get them injected into the output connector (Solr
>>>>> connector).
>>>>> After that, the same pages are still out of scope, as if the limit had
>>>>> been set to 1, and they are not indexed.
>>>>>
>>>>> I have tried "Reset seeding", thinking that maybe the pages needed to
>>>>> be checked again, but I still have the same problem. I don't think the
>>>>> problem is with the output, but I have also used the options "Re-index
>>>>> all associated documents" and "Remove all associated records", with the
>>>>> same result.
>>>>> I don't want to clear the history in the repository (it's a website
>>>>> connector), as I don't want to lose all the history.
>>>>>
>>>>> Is this a bug in Manifold? Is there any option to fix this issue?
>>>>>
>>>>> I'm using Manifold version 2.24.
>>>>>
>>>>> Thanks
>>>>> Marisol
>>>>>
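To make the behaviour Karl describes easier to follow, here is a toy model in Python. It is a sketch under stated assumptions, not ManifoldCF code or API. It assumes that hopcounts are assessed when a document is enqueued and stored per document, that a stored assessment is reused rather than recomputed when the limit changes, and that "Reset seeding" refreshes only the seed entries. It also assumes that a document's links are reassessed only when that document is rescanned. All names (`ToyHopcountQueue`, `seed`, `reset_seeding`, `rescan`) are hypothetical, and the model ignores link types such as "redirect".

```python
from dataclasses import dataclass, field

# Toy model only: NOT ManifoldCF's implementation. All names are hypothetical.


@dataclass
class ToyHopcountQueue:
    limit: int
    links: dict[str, list[str]]  # document -> documents it links to
    hopcount: dict[str, int] = field(default_factory=dict)  # stored at enqueue time
    queued: set[str] = field(default_factory=set)
    excluded: set[str] = field(default_factory=set)

    def _enqueue(self, doc: str, hops: int) -> None:
        # A document that already has a stored hopcount keeps it;
        # the earlier assessment is reused, not recomputed.
        if doc in self.hopcount:
            return
        self.hopcount[doc] = hops
        if hops > self.limit:
            self.excluded.add(doc)  # out of scope: its links are never followed
            return
        self.queued.add(doc)
        for child in self.links.get(doc, []):
            self._enqueue(child, hops + 1)

    def seed(self, seeds: list[str]) -> None:
        for s in seeds:
            self._enqueue(s, 0)

    def reset_seeding(self, seeds: list[str]) -> None:
        # Only the seed entries are refreshed; descendants keep their
        # stored hopcounts, so previously excluded pages stay excluded.
        for s in seeds:
            self.hopcount[s] = 0
            self.excluded.discard(s)
            self.queued.add(s)

    def rescan(self, doc: str) -> None:
        # Re-extract doc's links and reassess each child against the
        # *current* limit, forgetting any stale exclusion first.
        for child in self.links.get(doc, []):
            if child in self.excluded:
                self.excluded.discard(child)
                del self.hopcount[child]
            self._enqueue(child, self.hopcount[doc] + 1)


links = {"seed": ["a"], "a": ["b"], "b": ["c"]}
q = ToyHopcountQueue(limit=1, links=links)
q.seed(["seed"])            # "b" is 2 hops away, so it is excluded
q.limit = 5                 # raising the limit changes nothing by itself
q.reset_seeding(["seed"])   # still excluded: only the seed is refreshed
q.rescan("a")               # only a rescan of "a" brings "b" (and "c") back
print(sorted(q.queued), sorted(q.excluded))
```

In this model, raising the limit and resetting the seeds leave "b" excluded. Only rescanning the page that links to it pulls it back into scope, which matches Karl's point that the change takes effect only when documents are rescanned for some other reason.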