Please unsubscribe me from your mailing list

On Tue, 25 Mar 2025 at 21:50, David Pilato <[email protected]> wrote:
> Hey team
>
> The page
> https://tika.apache.org/3.1.0/formats.html#HyperText_Markup_Language
> mentions:
>
> > The output from the HtmlParser class is guaranteed to be well-formed and
> > valid XHTML, and various heuristics are used to prevent things like inline
> > scripts from cluttering the extracted text content.
>
> But HtmlParser links to a non existing class:
> https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/HtmlParser.html
> Should it be
> https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/JSoupParser.html
> instead?
>
> David Pilato
> [email protected]
> 06 13 03 08 41
