Re: web crawler https

2023-09-26 Thread Bisonti Mario
Thanks a lot Karl!
I uploaded the SSL certificate, checked the "always trust" option, and it works.

Mario




Re: web crawler https

2023-09-25 Thread Karl Wright
See this article:

https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty

The ManifoldCF web crawler configuration allows you to drop certs into a local
trust store for the connection.  You need to either do that (adding
whatever certificate authority cert you think might be missing) or check
the "trust https" checkbox.
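
A minimal sketch of where this message comes from, assuming the connection's
trust store ends up with no trusted certificate entries. It uses only the
standard JDK classes; the class and method names (EmptyTrustStoreDemo,
emptyTrustStore, describeFailure) are made up for illustration, and this is
not ManifoldCF's own code path:

```java
import java.security.InvalidAlgorithmParameterException;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.cert.PKIXParameters;

public class EmptyTrustStoreDemo {

    // Builds an in-memory keystore that contains no entries at all.
    static KeyStore emptyTrustStore() throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // null stream = initialize an empty store
        return ks;
    }

    // Tries to derive PKIX validation parameters from the given trust store.
    // Returns the exception message if the store has no trust anchors,
    // or null if the parameters could be built.
    static String describeFailure(KeyStore trustStore) throws KeyStoreException {
        try {
            new PKIXParameters(trustStore);
            return null;
        } catch (InvalidAlgorithmParameterException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeFailure(emptyTrustStore()));
    }
}
```

On OpenJDK-based JVMs this should print the same "trustAnchors parameter must
be non-empty" text seen in the log, which is why adding a certificate to the
trust store makes the error go away.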

You can generally debug which certs a site might need by trying to fetch a
page with curl in verbose mode (curl -v).

Karl




web crawler https

2023-09-25 Thread Bisonti Mario
Hi,
I would like to try indexing an internal WordPress site.
I configured a Web repository connection and a job with seeds, but I always get:

WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption 
reported for job 1695649924581 connection 'Wp': IO exception 
(javax.net.ssl.SSLException)reading header: Unexpected error: 
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter 
must be non-empty

How can I solve this?
Thanks a lot
Mario