Re: HtmlParser removal in 3.x

Saurabh Kumar Tue, 25 Mar 2025 19:01:22 -0700

Please unsubscribe me from your mailing list

On Tue, 25 Mar 2025 at 21:50, David Pilato <[email protected]> wrote:


> Hey team
>
> The page
> https://tika.apache.org/3.1.0/formats.html#HyperText_Markup_Language
> mentions:
>
>
> The output from the HtmlParser class is guaranteed to be well-formed and
> valid XHTML, and various heuristics are used to prevent things like inline
> scripts from cluttering the extracted text content.
>
>
> But HtmlParser links to a nonexistent class:
> https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/HtmlParser.html
> Should it be
> https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/JSoupParser.html
> instead?
>
>
>
> David Pilato
> [email protected]
> 06 13 03 08 41
>
>
