Hi Olivier,
The HTML parser built into MCF is quite resilient against badly formed
HTML, but there are limits. Characters like "<" and ">" are used to denote
tags and thus they confuse the parser when they are present in unescaped
form. It may be possible, with a fair bit of work, to handle
Hi Karl,
Thanks for your answer.
Could you elaborate on your answer, please? Just to better understand: do you mean that
there is no chance that special characters could be escaped in the MCF code in
this case, i.e. the website itself needs to escape the special characters,
otherwise the extraction will
Hi Olivier,
You can create a ticket but I don't have a good solution for you in any
case.
Karl
On Thu, Nov 15, 2018 at 6:53 AM Olivier Tavard <
olivier.tav...@francelabs.com> wrote:
> Hi Karl,
>
> Do you think that I need to create a Jira issue for this bug, i.e.
> that the links
(1) I increased the retries so that they continue for at least 10 minutes.
(2) I handled the 503 response explicitly, with the same logic.
See: https://issues.apache.org/jira/browse/CONNECTORS-1556
Karl
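For readers following along, the two changes described above (retrying for a fixed time budget, and treating a 503 response the same way as an I/O failure) could be sketched roughly as follows. This is only an illustration, not the code committed for CONNECTORS-1556: the class RetrySketch, the Attempt interface, and the parameter names are all hypothetical, and the real connector works with HttpClient responses rather than bare status codes.

```java
import java.io.IOException;

/** Hypothetical sketch: retry a request until it succeeds or a time budget runs out. */
final class RetrySketch {

    /** One attempt at the request; returns the HTTP status code (assumed shape). */
    interface Attempt {
        int run() throws IOException;
    }

    /** Assumption: 503 (Service Unavailable) is treated as transient, like an IOException. */
    static boolean isTransient(int status) {
        return status == 503;
    }

    /**
     * Runs the attempt repeatedly until it returns a non-transient status,
     * or until maxWaitMillis has elapsed since the first attempt.
     */
    static int executeWithRetry(Attempt attempt, long maxWaitMillis, long sleepMillis)
            throws IOException, InterruptedException {
        final long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (true) {
            IOException failure = null;
            int status = -1;
            try {
                status = attempt.run();
                if (!isTransient(status)) {
                    return status; // success or a non-retryable error
                }
            } catch (IOException e) {
                failure = e; // same handling as a 503: wait and try again
            }
            if (System.nanoTime() >= deadline) {
                // Time budget exhausted: surface the last failure to the caller.
                if (failure != null) {
                    throw failure;
                }
                return status; // still 503
            }
            Thread.sleep(sleepMillis);
        }
    }
}
```

With this sketch, a budget of at least 10 minutes would correspond to passing 10 * 60 * 1000L as maxWaitMillis; the sleep interval between attempts is a free choice.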
On Thu, Nov 15, 2018 at 3:35 AM Bisonti Mario
wrote:
> Yes, Karl.
>
>
>
> Is it possible to apply the same
Hi Mario,
Here's the code:
>>
try {
  //System.out.println("About to do a content PUT");
  response = this.httpClient.execute(tikaHost, httpPut);
  //System.out.println("... content PUT succeeded");
} catch (IOException e) {