[
https://issues.apache.org/jira/browse/JAMES-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876717#comment-17876717
]
Benoit Tellier commented on JAMES-4061:
---------------------------------------
I first took a shot at JAMES-4061 but was quickly turned away by the complexity
of the task. Carrying over display context is not easy. I believe the overall
code would need deep changes.
I then decided to implement an easier version of JAMES-4061 which simply renders
the inner text within blockquotes recursively, then substitutes \n with \n>. This
worked quite well in practice, but it could easily be abused by nesting many
blockquotes: I managed to reach several minutes of compute time with a 1 MB
payload. This approach is thus open to abuse for DoS.
In this light, JAMES-4062 seems even more desirable: after all, plain text
rendering is not a prime mission of James, and specialized libraries might do it
better than us!
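For illustration only, here is a minimal Java sketch of the depth-counting idea
from the issue description, done in a single left-to-right pass so that the
work should no longer multiply with the nesting depth. It deliberately does not
use Jsoup or the actual JsoupHtmlTextExtractor code: the class name
BlockquoteTextSketch, the method toPlainText and the MAX_PREFIX_DEPTH cap are
all hypothetical, and the regex tag matcher is only a stand-in for a real HTML
parser.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the James implementation.
final class BlockquoteTextSketch {
    // Assumed cap: nesting beyond this depth no longer lengthens the '>' prefix.
    private static final int MAX_PREFIX_DEPTH = 32;

    // Naive tag matcher; possessive quantifiers (*+) avoid regex backtracking.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*+)[^<>]*+>");

    static String toPlainText(String html) {
        StringBuilder out = new StringBuilder();
        StringBuilder line = new StringBuilder();
        int depth = 0;   // number of currently open <blockquote> elements
        int cursor = 0;  // end of the previously consumed tag
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            // Text between the previous tag and this one belongs to the current line.
            line.append(html, cursor, tag.start());
            cursor = tag.end();
            boolean closing = !tag.group(1).isEmpty();
            String name = tag.group(2).toLowerCase(Locale.ROOT);
            switch (name) {
                case "blockquote":
                    // Emit pending text at the old depth before changing it.
                    flush(out, line, depth);
                    depth = closing ? Math.max(0, depth - 1) : depth + 1;
                    break;
                case "p":
                case "div":
                case "br":
                    flush(out, line, depth);
                    break;
                default:
                    // Inline or unknown tag: the tag is dropped, its text is kept.
                    break;
            }
        }
        line.append(html, cursor, html.length());
        flush(out, line, depth);
        return out.toString();
    }

    // Writes the pending line with its quote prefix; blank lines are skipped.
    private static void flush(StringBuilder out, StringBuilder line, int depth) {
        String text = line.toString().trim();
        line.setLength(0);
        if (text.isEmpty()) {
            return;
        }
        int shown = Math.min(depth, MAX_PREFIX_DEPTH);
        if (shown > 0) {
            out.append(">".repeat(shown)).append(' ');
        }
        out.append(text).append('\n');
    }
}
```

Fed the sample markup from the issue description, this sketch produces the
seven expected lines ("> abc" through "stu"). It does not decode HTML entities,
collapses consecutive line breaks, and ignores every other formatting concern,
so it only demonstrates the depth counter and the bounded prefix.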
> Html Text extractor needs to handle blockquote
> ----------------------------------------------
>
> Key: JAMES-4061
> URL: https://issues.apache.org/jira/browse/JAMES-4061
> Project: James Server
> Issue Type: Bug
> Components: JMAP
> Affects Versions: master
> Reporter: Benoit Tellier
> Assignee: Antoine Duprat
> Priority: Major
> Attachments: image-2024-08-22-14-54-37-915.png,
> image-2024-08-22-14-54-51-684.png, image-2024-08-22-14-55-01-317.png
>
>
> Following recent mailing list exchanges, Wojtek contacted me privately to
> point out the bad indentation of my inlined answers.
> The exchange:
> https://www.mail-archive.com/[email protected]/msg74362.html
> Setup: I used the Twake mail client throughout the discussion, which produces
> HTML and relies on James server JMAP code for generating the text/plain part.
> Wojtek favors reading text/plain when available.
> The full diagnostic is taken from a private conversation:
> h3. Diagnostic
> I bet it is the plain text projection of the email that got screwed up. The
> HTML version looks fine:
> !image-2024-08-22-14-54-37-915.png!
> This matches the output I see in my sent mails in Twake mail:
> !image-2024-08-22-14-54-51-684.png!
> However, the text/plain version is indeed missing one level:
> !image-2024-08-22-14-55-01-317.png!
> !image-2024-08-22-14-55-01-317.png!
> What we have
> >> Your initial concern
> > My initial answer
> Your answer
> My answer to your answer
> What we should have
> >>> Your initial concern
> >> My initial answer
> > Your answer
> My answer to your answer
> Where it gets annoying is that our webmail (
> https://github.com/apache/james-project ) generates HTML output (WYSIWYG),
> the backend then extracts the text from the HTML in order to present a
> text/plain view of the message, and the <blockquote> tags are currently
> ignored.
> The component converting HTML to text needs to account for these blockquotes:
> it should keep track of the number of blockquotes enclosing the current
> context and follow each line break with the matching number of '>' markers.
> <blockquote><p>abc</p><p>def<br/>ghi</p><blockquote><p>jkl</p><p>mno<br/></p></blockquote><p>pqr</p></blockquote><p>stu</p>
> should be rendered as
> > abc
> > def
> > ghi
> >> jkl
> >> mno
> > pqr
> stu
> The involved component is a JMAP utility of Apache James:
> org.apache.james.jmap.utils.JsoupHtmlTextExtractor
--
This message was sent by Atlassian Jira
(v8.20.10#820010)