No, only the seed URLs get updated with that option.
On Tue, Sep 26, 2023 at 10:09 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:
> Thanks a lot for the explanation, Karl, really useful.
>
> I will wait for your reply at the end of the week, but I thought that the
> main reason
Thanks a lot for the explanation, Karl, really useful.
I will wait for your reply at the end of the week, but I thought that the
main purpose of the "Reset seeding" option was exactly that: reevaluating
all pages, as in a fresh execution.
On Tue, 26 Sept 2023 at 13:30, Karl Wright wrote:
>
Okay, that is good to know.
The hopcount assessment occurs when documents are added to the queue.
Hopcounts are stored for each document in the hopcount table. So if you
change a hopcount limit, it is quite possible that nothing will change
unless documents that are at the previous hopcount limit
No, I haven't used this option. I have the job configured as "Keep
unreachable documents, for now". Is it also ignoring them because they were
already kept? With this option, when are documents that are unreachable
"for now" converted to "forever"?
The only solution I can think of is creating a new job
If you ever set "Ignore unreachable documents forever" for the job, you
can't go back and stop ignoring them. The data that the job would need to
have recorded for this is gone. The only way to get it back is if you can
convince ManifoldCF to recrawl all documents in the job.
On Tue, Sep
Hi, I had a problem with documents being out of scope.
I changed the Maximum hop count for type "redirect" in one of my jobs to 5,
and saw that the job was not processing some pages because of that, so I
removed the value to get them injected into the output connector (Solr
connector).
After that, the same
Thanks a lot Karl!
I uploaded the SSL certificate and flagged it as “always trust”, and it works.
Mario
From: Karl Wright
Sent: Monday, September 25, 2023 20:41
To: user@manifoldcf.apache.org
Subject: Re: web crawler https
See this article: