Re: RE : Contribution to ManifoldCF webcrawler

2023-09-26 Thread Karl Wright
Looks good!
I will try to get this merged today.
Karl


On Mon, Sep 25, 2023 at 8:14 AM Karl Wright wrote:

> Thanks.
> I will have a look at first opportunity.
> Karl


Re: RE : Contribution to ManifoldCF webcrawler

2023-09-25 Thread Karl Wright
Thanks.
I will have a look at first opportunity.
Karl


On Mon, Sep 25, 2023 at 7:00 AM Emeric Bernet-Rollande <
emeric.ber...@francelabs.com> wrote:

> Hi,
>
> I opened a pull request here:
> https://github.com/apache/manifoldcf/pull/149
>
> Regards,
>
> Emeric Bernet-Rollande
>
> France Labs – Your knowledge, now
> Datafari Enterprise Search – Discover our version 5
> www.datafari.com


RE : Contribution to ManifoldCF webcrawler

2023-09-25 Thread Emeric Bernet-Rollande
Hi,

I opened a pull request here:
https://github.com/apache/manifoldcf/pull/149

Regards,

Emeric Bernet-Rollande


From: Furkan KAMACI
Sent: Monday, 25 September 2023 09:28
To: dev@manifoldcf.apache.org
Cc: olivier.tav...@francelabs.com; France Labs
Subject: Re: Contribution to ManifoldCF webcrawler




Re: Contribution to ManifoldCF webcrawler

2023-09-25 Thread Furkan KAMACI
Hi Emeric,

First of all, thank you for your effort and suggestion. Do you have a Pull
Request for that improvement?

Kind regards,
Furkan Kamaci

On Mon, Sep 25, 2023 at 10:23 AM Emeric Bernet-Rollande <
emeric.ber...@francelabs.com> wrote:

> Hi Karl and all!
>
> I've been working on the MCF web crawler component for our Datafari
> project, and I have made some developments that might interest the MCF
> community.
>
> Currently, if a website redirects the user with a 301 or 302 status code
> and the "limit to seed" option is checked, the site the redirection points
> to won't be indexed. We added an option, "Force the inclusion of
> redirections", which overrides that checkbox when the crawl encounters a
> redirection.
>
> Would you be interested in receiving the patch for integration into
> ManifoldCF? The corresponding documentation can be found here:
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/1886879745/Web+Connectors
>
> Regards,
>
> Emeric Bernet-Rollande
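The redirect rule proposed in this thread can be summarized as a small inclusion decision. The Java sketch below is hypothetical: `RedirectInclusionSketch`, `Settings`, `shouldInclude`, and the exact-host matching used to model "limit to seed" are illustrative assumptions, not code from ManifoldCF's web connector or from PR #149.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the redirect rule discussed in the thread.
// None of these names come from ManifoldCF or from PR #149.
public final class RedirectInclusionSketch {

    // Hypothetical settings mirroring the two checkboxes discussed in the thread.
    record Settings(boolean limitToSeeds, boolean forceRedirectInclusion) {}

    // Returns true if a discovered URL may be queued for crawling.
    // Assumption: "limit to seed" is modelled as an exact host match against seed hosts.
    static boolean shouldInclude(String url, Set<String> seedHosts,
                                 boolean fromRedirect, Settings settings) {
        if (!settings.limitToSeeds()) {
            return true; // no host restriction configured
        }
        if (fromRedirect && settings.forceRedirectInclusion()) {
            return true; // proposed option: a 301/302 target bypasses the seed restriction
        }
        String host = URI.create(url).getHost();
        return host != null && seedHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> seeds = Set.of("example.com");
        String target = "https://www.example.org/home"; // hypothetical redirect target

        // Seed restriction only: the redirect target is dropped.
        System.out.println(shouldInclude(target, seeds, true, new Settings(true, false))); // prints false
        // With the proposed option: the redirect target is kept.
        System.out.println(shouldInclude(target, seeds, true, new Settings(true, true)));  // prints true
        // The option only applies to redirects, not to ordinary out-of-seed links.
        System.out.println(shouldInclude(target, seeds, false, new Settings(true, true))); // prints false
    }
}
```

In this sketch the override is checked only for URLs reached through a redirect, so ordinary links to hosts outside the seeds remain excluded, which matches the behavior described in the proposal.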