Re: HtmlParser removal in 3.x

Tim Allison Wed, 26 Mar 2025 05:35:27 -0700

Let me know if that didn't fix it. Thank you, again, David!

On Wed, Mar 26, 2025 at 6:44 AM Tim Allison <[email protected]> wrote:


> Yes. Will fix shortly. Thank you!
>
> On Tue, Mar 25, 2025 at 4:50 PM David Pilato <[email protected]> wrote:
>
>> Hey team
>>
>> The page
>> https://tika.apache.org/3.1.0/formats.html#HyperText_Markup_Language
>> mentions:
>>
>>
>> The output from the HtmlParser class is guaranteed to be well-formed and
>> valid XHTML, and various heuristics are used to prevent things like inline
>> scripts from cluttering the extracted text content.
>>
>>
>> But HtmlParser links to a nonexistent class:
>> https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/HtmlParser.html
>> Should it be
>> https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/JSoupParser.html
>> instead?
>>
>>
>>
>> David Pilato
>> [email protected]
>> 06 13 03 08 41
>>
>>
