Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
No, only the seed URLs get updated with that option.

On Tue, Sep 26, 2023 at 10:09 AM Marisol Redondo <marisol.redondo.gar...@gmail.com> wrote:
> Thanks a lot for the explanation, Karl, really useful.
>
> I will wait for your reply at the end of the week, but I thought that the
> main reason

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
Thanks a lot for the explanation, Karl, really useful. I will wait for your reply at the end of the week, but I thought the main purpose of the "Reset seeding" option was exactly that: to re-evaluate all pages, as in a fresh run.

On Tue, 26 Sept 2023 at 13:30, Karl Wright wrote:
>

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
Okay, that is good to know. The hopcount assessment occurs when documents are added to the queue. Hopcounts are stored for each document in the hopcount table. So if you change a hopcount limit, it is quite possible that nothing will change unless documents that are at the previous hopcount limit
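
To illustrate the point above, here is a toy model in Python of a hop count limit that is checked only when a document is added to the queue. It is not ManifoldCF code: the `process` function, the `hopcount_table` dict, and the link graph are all made up for illustration, and the sketch only assumes what the message states (the check happens at enqueue time, and hop counts are stored per document).

```python
def process(url, links, hopcount_table, queue, max_hops):
    """Follow the links of one document, checking the limit at enqueue time."""
    for target in links.get(url, []):
        hops = hopcount_table[url] + 1
        if hops > max_hops:
            continue  # rejected under the limit in force right now
        if target not in hopcount_table or hops < hopcount_table[target]:
            hopcount_table[target] = hops  # stored per document
            queue.append(target)


links = {"seed": ["a"], "a": ["b"]}  # hypothetical link graph
hopcount_table = {"seed": 0}
queue = []

# First crawl with a limit of 1: "b" would be 2 hops away and is rejected.
process("seed", links, hopcount_table, queue, max_hops=1)
process("a", links, hopcount_table, queue, max_hops=1)
rejected_at_limit_1 = "b" not in hopcount_table

# Raising the limit does nothing by itself; in this toy model "b" is only
# reconsidered when "a" (the document at the old limit) is processed again.
still_missing_after_change = "b" not in hopcount_table
process("a", links, hopcount_table, queue, max_hops=2)
```

In this sketch, `b` enters the table only after `a` is processed again under the new limit. Whether and when a real job revisits such documents depends on the job's settings, which the toy model does not cover.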

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
No, I haven't used this option. I have it configured as "Keep unreachable documents, for now". Is it also ignoring them because they were already kept? With this option, when are the documents kept "for now" converted to "forever"? The only solution I can think of is creating a new job

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
If you ever set "Ignore unreachable documents forever" for the job, you can't go back and stop ignoring them. The data that the job would need to have recorded for this is gone. The only way to get it back is if you can convince ManifoldCF to recrawl all documents in the job.

On Tue, Sep

Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
Hi, I had a problem with documents out of scope. I changed the Maximum hop count for type "redirect" in one of my jobs to 5 and saw that the job was not processing some pages because of that, so I removed the value to get them injected into the output connector (Solr connector). After that, the same

Re: web crawler https

2023-09-26 Thread Bisonti Mario
Thanks a lot, Karl! I uploaded the SSL certificate and set the “always trust” flag, and it works.

Mario

From: Karl Wright
Sent: Monday, 25 September 2023 20:41
To: user@manifoldcf.apache.org
Subject: Re: web crawler https

See this article: