Hi Michael,
Yes, I tested this in debug mode and also tried one more scenario.
Whenever the seed URL is something like this:
https://www.abc.com/societybusiness/entrepreneurship/?lang=en
<https://www.rug.nl/society-business/centre-for-entrepreneurship/?lang=en>
the Java code of our web connector returns null from this function when
m.find() is executed. As a result the document identifier is null, which
leads to the "Illegal seed URL" error.
/** Check if the document identifier is legal.
*/
public boolean isDocumentLegal(String url)
{
  // First, verify that the url matches one of the patterns in the include list.
  int i = 0;
  while (i < includePatterns.size())
  {
    Pattern p = includePatterns.get(i);
    Matcher m = p.matcher(url);
    if (m.find())
      break;
    i++;
  }
  // ...
}
Whereas when the seed URL is something like this:
https://www.abc.com/societybusiness/entrepreneurship/
the same code passes without failing.
Can anybody help me understand why the same code behaves differently?
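To isolate the behavior outside the connector, a minimal standalone sketch like the one below may help. It repeats the find()-based loop from the snippet above against both URLs. The include patterns in it are hypothetical (the patterns configured in the actual job are not shown here), and the class and method names are made up for illustration. It only shows two ways an include pattern could accept the plain URL yet reject the one with the query string: a pattern anchored on a trailing slash, and a URL pasted as a pattern without escaping the "?".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of ManifoldCF.
class IncludePatternCheck {

    // Returns true if at least one pattern is found somewhere in the URL,
    // using Matcher.find() the same way as the loop in the snippet above.
    static boolean matchesAnyInclude(List<Pattern> includePatterns, String url) {
        for (Pattern p : includePatterns) {
            Matcher m = p.matcher(url);
            if (m.find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String plain = "https://www.abc.com/societybusiness/entrepreneurship/";
        String withQuery = plain + "?lang=en";

        // Assumption 1: an include pattern anchored on a trailing slash.
        List<Pattern> anchored = new ArrayList<>();
        anchored.add(Pattern.compile("^https://www\\.abc\\.com/.*/$"));

        // Assumption 2: the seed URL pasted as a regex without escaping.
        // Here "/?" means "optional slash", so the literal '?' never matches.
        List<Pattern> unescaped = new ArrayList<>();
        unescaped.add(Pattern.compile(withQuery));

        // Same URL, but quoted so every character is taken literally.
        List<Pattern> quoted = new ArrayList<>();
        quoted.add(Pattern.compile(Pattern.quote(withQuery)));

        System.out.println("anchored  / plain:     " + matchesAnyInclude(anchored, plain));      // true
        System.out.println("anchored  / withQuery: " + matchesAnyInclude(anchored, withQuery));  // false
        System.out.println("unescaped / withQuery: " + matchesAnyInclude(unescaped, withQuery)); // false
        System.out.println("quoted    / withQuery: " + matchesAnyInclude(quoted, withQuery));    // true
    }
}
```

If the job's real include patterns show the same results, the regular expressions are the likely cause rather than the seed URL itself.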
Thanks
Ritika
On Tue, May 5, 2020 at 6:09 PM Michael Cizmar <[email protected]>
wrote:
> Hi Ritika,
>
>
>
> There are several reasons that you could get that. Have you started
> ManifoldCF in debug mode? If so, what’s the output just before that
> statement in the logs?
>
>
>
> --
>
> Michael Cizmar
>
>
>
> *From: *ritika jain <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Tuesday, May 5, 2020 at 4:34 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *Illegal Seed URL
>
>
>
> Hi All,
>
>
>
> I am using ManifoldCF 2.14 with the Web crawler as the repository
> connector and Elasticsearch as the output. I have specified a seed URL
> that is valid, as it opens successfully in a browser.
>
> Say the URL is https://www.abc.com/societybusiness/entrepreneurship/?lang=en
> <https://www.rug.nl/society-business/centre-for-entrepreneurship/?lang=en>.
>
>
>
> The URL contains a ? query string.
>
> Am I doing anything wrong here?
>
>
>
> Thanks
>
> Ritika
>
>
>
>
>