Benoit Tellier created JAMES-4062:
-------------------------------------
Summary: Experiment flexmark for HTML text extraction
Key: JAMES-4062
URL: https://issues.apache.org/jira/browse/JAMES-4062
Project: James Server
Issue Type: Improvement
Components: JMAP
Reporter: Benoit Tellier
Assignee: Antoine Duprat
The JMAP code currently relies on homegrown rendering code plugged onto an HTML
parser.
Though this code mostly works, it is not core to ASF James, and we regularly
miss some formatting options:
https://issues.apache.org/jira/browse/JAMES-4061 is a good example.
An alternative could be to rely on a battle-tested, general-purpose library,
e.g. https://github.com/vsch/flexmark-java and its flexmark-html2md-converter
module, as suggested privately by Wojtek.
Such a library would likely handle the corner cases without us having to think
about them.
We could also offer a JVM option for switching between flexmark and the current
jsoup implementation, which would stay the default while we experiment with the
flexmark option.
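
To make the JVM option idea concrete, here is a minimal sketch of how the
switch might look. Everything named here is an assumption made for
illustration: the TextExtractor interface, the TextExtractorSelector class and
the james.jmap.html.extractor property do not come from the James code base.
The two implementations are passed in as parameters, so the sketch only shows
the selection logic, with jsoup as the default.

```java
/** Hypothetical abstraction over HTML-to-text extraction (not the actual James interface). */
interface TextExtractor {
    String toPlainText(String html);
}

/** Hypothetical helper that picks an implementation from a JVM system property. */
final class TextExtractorSelector {
    // Assumed property name, set with e.g. -Djames.jmap.html.extractor=flexmark
    static final String PROPERTY = "james.jmap.html.extractor";

    private TextExtractorSelector() {
    }

    /**
     * Returns the flexmark-based extractor only when explicitly requested.
     * An unset or unrecognised value falls back to the jsoup-based extractor,
     * so the current behaviour stays the default.
     */
    static TextExtractor select(TextExtractor jsoupBased, TextExtractor flexmarkBased) {
        String choice = System.getProperty(PROPERTY, "jsoup").trim();
        if ("flexmark".equalsIgnoreCase(choice)) {
            return flexmarkBased;
        }
        return jsoupBased;
    }
}
```

With this shape, the flexmark-backed implementation would presumably delegate
to the converter shipped in flexmark-html2md-converter, and dropping the
experiment later would only mean removing one branch.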