Re: RE : Contribution to ManifoldCF webcrawler
Looks good! I will try to get this merged today.

Karl
Re: RE : Contribution to ManifoldCF webcrawler
Thanks. I will have a look at first opportunity.

Karl
RE : Contribution to ManifoldCF webcrawler
Hi,

I opened a Pull Request here:
https://github.com/apache/manifoldcf/pull/149

Regards,

Emeric Bernet-Rollande
France Labs – Your knowledge, now
Datafari Enterprise Search – Discover our version 5
www.datafari.com
Re: Contribution to ManifoldCF webcrawler
Hi Emeric,

First of all, thank you for your effort and suggestion. Do you have a Pull Request for that improvement?

Kind regards,
Furkan Kamaci

On Mon, Sep 25, 2023 at 10:23 AM Emeric Bernet-Rollande <emeric.ber...@francelabs.com> wrote:

> Hi Karl and all!
>
> I've been working on the MCF webcrawler component for our Datafari
> project, and I made some changes that might interest the MCF community.
>
> Currently, if a website redirects the user with a 301 or 302 status code
> and the "limit to seed" option is checked, the website that the
> redirection points to won't be indexed. We added an option, "Force the
> inclusion of redirections", which overrides that checkbox when the crawl
> encounters a redirection.
>
> Would you be interested in getting the patch to integrate it into
> ManifoldCF? The corresponding documentation can be found here:
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/1886879745/Web+Connectors
>
> Regards,
>
> Emeric Bernet-Rollande
> France Labs – Your knowledge, now
> Datafari Enterprise Search – Discover our version 5
> www.datafari.com
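
For readers following the thread, the behaviour described in the original message can be sketched roughly as below. This is only an illustration of the decision logic as worded in the email, not the code from the pull request or from the ManifoldCF web connector. The class name, method name, and parameters are all hypothetical, and the sketch assumes that "limit to seed" means "only include URLs whose host matches one of the seed hosts".

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; none of these names come from ManifoldCF.
final class RedirectInclusionSketch {

    /**
     * Decides whether a URL should be included in the crawl.
     *
     * @param target                 the URL being considered
     * @param seedHosts              lower-case host names of the seed URLs (assumed meaning of "limit to seed")
     * @param limitToSeeds           state of the "limit to seed" checkbox
     * @param reachedByRedirect      true if the URL was reached through a 301 or 302 response
     * @param forceRedirectInclusion state of the proposed "Force the inclusion of redirections" option
     */
    static boolean shouldInclude(URI target,
                                 Set<String> seedHosts,
                                 boolean limitToSeeds,
                                 boolean reachedByRedirect,
                                 boolean forceRedirectInclusion) {
        // Without the seed restriction, every URL is eligible.
        if (!limitToSeeds) {
            return true;
        }
        // A URL on one of the seed hosts always passes the restriction.
        String host = target.getHost();
        if (host != null && seedHosts.contains(host.toLowerCase(Locale.ROOT))) {
            return true;
        }
        // Off-seed URL: included only when it is a redirect target
        // and the override option is enabled.
        return reachedByRedirect && forceRedirectInclusion;
    }

    public static void main(String[] args) {
        Set<String> seeds = Set.of("example.com");
        URI redirected = URI.create("https://www.example.org/home");

        // Same redirect target, with the override option off and then on.
        System.out.println(shouldInclude(redirected, seeds, true, true, false));
        System.out.println(shouldInclude(redirected, seeds, true, true, true));
    }
}
```

In this sketch the override applies only to URLs actually reached through a redirection, so ordinary off-seed links stay excluded while "limit to seed" is checked.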