My goal, for every received mail, is to:
- get the text body with its HTML tags,
- send it to an existing REST web service that converts the mail body to PDF/A,
- archive the PDF and delete the mail.
I never imagined it would be such a pain! I tried some parts of your code, but it seems the tags are removed BEFORE the message reaches those parts. Who removes the tags? Camel? Some dependency? And why?

On 24 Nov 2021 at 14:56:42, Daniel Langevin <daniel.lange...@shq.gouv.qc.ca> wrote:

> Hi,
>
> It's a little more complicated than that.
>
> First, you have to know whether the mail body is a String, a multipart, or a nested message, because:
> - if the body is a String, the mail doesn't contain HTML content!
> - if the body is multipart, it can contain a nested message (a mail message as an attachment).
>
> I don't recommend working with HTML mail; it is very complicated.
>
> We have been reading e-mail with camel-mail (imaps) for 4 years now, and we gave up on HTML processing after 1 year.
>
> We changed our approach: we extract the original message first and process it like a file attachment.
> Then we check whether the body is a com.sun.mail.imap.IMAPNestedMessage, a java.lang.String, or a javax.mail.internet.MimeMultipart,
> and extract the TEXT of the body, because some mails only have HTML without a text alternative.
>
> Here is a small part taken from our e-mail processing, but enough to understand how it works:
>
> <choice>
>   <when><simple>${body} is "com.sun.mail.imap.IMAPNestedMessage"</simple> <!-- only a msg822 as attachment -->
>     <setHeader headerName="zzBody">
>       <groovy>
>         def MimeUtility = new javax.mail.internet.MimeUtility();
>         // result = request.body.getContent().getBodyPart(0).getContent()
>         result = "( Impossible d'extraire le Corps du message, voir pièce jointe msg822 )"
>       </groovy>
>     </setHeader>
>     <log message="Null BODY: "/>
>     <to uri="direct:sdiInsertBodyCourriel" />
>     <to uri="direct:sdiExtractMsg822" />
>   </when>
>
>   <when><simple>${body} is "java.lang.String"</simple>
>     <setHeader headerName="zzBody">
>       <groovy>
>         def MimeUtility = new javax.mail.internet.MimeUtility();
>         def Jsoup = new org.jsoup.Jsoup();
>         if ( request.getOriginalMessage().getContentType().toLowerCase().contains('html')) {
>           zresult = MimeUtility.decodeText(request.getOriginalMessage().getContent());
>           result = Jsoup.parse(zresult).wholeText(); // preserve CRLF
>         } else {
>           result = MimeUtility.decodeText(request.getOriginalMessage().getContent());
>         }
>       </groovy>
>     </setHeader>
>     <log message="Txt BODY: "/>
>     <to uri="direct:sdiInsertBodyCourriel" />
>   </when>
>
>   <when><simple>${body} is "javax.mail.internet.MimeMultipart"</simple>
>     <log message="BODY: Multipart"/>
>     <to uri="direct:sdiExtractPJ" />
>     <to uri="direct:sdiExtractBodyMPart" />
>     <to uri="direct:sdiInsertBodyCourriel" />
>     <to uri="direct:sdiExtractMsg822" />
>     <setHeader headerName="zzSwitchlirePj">
>       <groovy>
>         zPjTemp = new File( request.headers['zzTEMPPJ'] )
>         zzPj = zPjTemp.listFiles();
>         if ( zzPj.length > 0 ) { result="OUI"}
>         else { result="NON" }
>       </groovy>
>     </setHeader>
>     <log message="SwitchlirePj: ${header.zzSwitchlirePj} " />
>     <filter><simple>${header.zzSwitchlirePj} == "OUI" </simple>
>       <to uri="direct:sdiLirePjCourriel" />
>     </filter>
>   </when>
>
> <route id="rte.SDI.ExtractOriginal" streamCache="true">
>   <description>
>     ============================================================
>     Writes the complete original e-mail to a file
>     ============================================================
>   </description>
>   <from uri="direct:sdiExtractOriginal"/>
>   <setHeader headerName="zzoriginal">
>     <groovy>
>       def FileUtils = new org.apache.commons.io.FileUtils()
>       zRootTemp = new File( request.headers['zzTEMP'] )
>       z822Temp = new File( request.headers['zzTEMP822'] )
>       zPjTemp = new File( request.headers['zzTEMPPJ'] )
>       FileUtils.forceMkdir(z822Temp);
>       FileUtils.forceMkdir(zPjTemp);
>       FileUtils.cleanDirectory(z822Temp);
>       FileUtils.cleanDirectory(zPjTemp);
>       result ="\n- Extraction du Mail Originale \n"
>       zHeaders=""
>       crlf="\r\n"
>
>       headerEnum = request.getOriginalMessage().getAllHeaders();
>       while (headerEnum.hasMoreElements()) {
>         header = headerEnum.nextElement();
>         name = header.getName();
>         <!-- Anonymize security information: IP addresses of internal servers -->
>         value = header.getValue().replaceAll('10.100.','192.168.');
>         zHeaders = zHeaders + name +": " + value + crlf
>       }
>       mailfilename = request.headers['zzTEMP822']+"Courriel.eml"
>       mailoriginal = new File(mailfilename);
>       mailoriginal.write(zHeaders+crlf);
>       mailoriginal.append(request.getOriginalMessage().getContentStream().getText()+crlf);
>       result = result + mailfilename + "\n- Extraction du Mail Originale - Complété \n"
>     </groovy>
>   </setHeader>
>   <log message="${header.zzoriginal}" />
> </route>
>
> <route id="rte.SDI.ExtractBodyMPart" streamCache="true">
>   <from uri="direct:sdiExtractBodyMPart"/>
>   <log message="555 Extract BODY" />
>   <setHeader headerName="zzBody">
>     <groovy>
>       part = request.body.getBodyPart(0)
>       //part = request.getOriginalMessage().getContent().getBodyPart(0)
>       def Jsoup = new org.jsoup.Jsoup();
>       def wlist(org.jsoup.safety.Whitelist basic ) {
>         return new org.jsoup.safety.Whitelist()
>           .addTags(
>             "a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em",
>             "i", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong", "sub",
>             "sup", "u", "ul")
>           .addAttributes("a", "href")
>           .addAttributes("blockquote", "cite")
>           .addAttributes("q", "cite")
>           .addProtocols("a", "href", "ftp", "http", "https", "mailto")
>           .addProtocols("blockquote", "cite", "http", "https")
>           .addProtocols("cite", "cite", "http", "https")
>       }
>
>       if ( part.getContent().getClass().equals(com.sun.mail.imap.IMAPNestedMessage)) {
>         result = "( PAS DE CORPS DE MESSAGE / voir piece jointe ) ";
>       }
>       else if ( part.getContentType().toLowerCase().contains('html')) {
>         zbody = part.getContent();
>         zbody = Jsoup.clean(zbody,wlist()); // keep only certain basic tags
>         zbody = zbody.replaceAll("<a .*href=","")
>         zbody = zbody.replaceAll("></a>","")
>         result = Jsoup.parse(zbody).wholeText(); // preserve CRLF
>       }
>       else if ( part.getContent().getClass().equals(javax.mail.internet.MimeMultipart)) {
>         result = part.getContent().getBodyPart(0).getContent();
>       }
>       else {
>         result = part.getContent();
>       }
>     </groovy>
>   </setHeader>
> </route>
>
> <route id="rte.SDI.ExtractMsg822" streamCache="true">
>   <description>
>     =======================================================================
>     Extracts the messages/822 parts and writes them to a file.
>     These file(s) are later recorded in the DB as PJnn.eml
>     =======================================================================
>   </description>
>   <from uri="direct:sdiExtractMsg822"/>
>   <log message="777 Extract MSG822 ?" />
>   <setHeader headerName="zz822">
>     <groovy>
>       zresult = 0
>       if ( request.body.getClass().equals(com.sun.mail.imap.IMAPNestedMessage)) {
>         zHeaders=""
>         crlf="\r\n"
>         headerEnum = request.body.getAllHeaders();
>         while (headerEnum.hasMoreElements()) {
>           header = headerEnum.nextElement();
>           name = header.getName();
>           <!-- Anonymize security information: IP addresses of internal servers -->
>           value = header.getValue().replaceAll('10.100.','192.168.');
>           zHeaders = zHeaders + name +": " + value + "\r\n"
>         }
>         mail822filename = request.headers['zzTEMP822']+"PJ_Courriel_1.eml"
>         mailoriginal = new File(mail822filename);
>         mailoriginal.write(zHeaders+crlf);
>         mailoriginal.append(request.body.getContentStream().getText()+crlf);
>         zresult = zresult + 1;
>       } else {
>         mimeMultipart = request.body
>         partCount = mimeMultipart.getCount();
>         for (i = 0; i != mimeMultipart.getCount(); i++) {
>           ii = i;
>           bodyPart = mimeMultipart.getBodyPart(i);
>           if (bodyPart.isMimeType("message/rfc822")) {
>             zresult = zresult + 1;
>             mail822filename = request.headers['zzTEMP822']+"PJ_Courriel_" + zresult + ".eml"
>             bodyPart.getContent().writeTo(new FileOutputStream(new File(mail822filename)));
>           }
>         }
>         // if ( zresult > 1){ zresult = zresult - 1; } // restore the correct number of extracted attachments after the LOOP
>       }
>       result = zresult
>     </groovy>
>   </setHeader>
>   <log message="777 N822= ${header.zz822}" />
>   <setHeader headerName="zzLoop"> <constant>0</constant></setHeader>
>   <loop doWhile="true"><simple>${header.zzLoop} != ${header.zz822}</simple>
>     <log message="Loop ? ${header.zzLoop} = ${header.zz822}" />
>     <setHeader headerName="zzLoop"><simple>${header.zzLoop}++</simple></setHeader>
>     <setHeader headerName="zzPJ" ><groovy>new File(request.headers['zzTEMP822']+"PJ_Courriel_"+request.headers['zzLoop']+".eml").bytes</groovy></setHeader>
>     <setHeader headerName="utlRecCourrielPjContentType" ><constant>message/rfc822</constant></setHeader>
>     <setHeader headerName="utlNomPj" ><simple>PJ_Courriel_${header.utlNoSeqCommunication}_${header.zzLoop}.eml</simple></setHeader>
>     <setHeader headerName="zzTypeCode"><constant>REGUL</constant></setHeader>
>     <to uri="direct:sdiInsertPjCourriel" />
>   </loop>
> </route>
>
> I hope it can help you.
>
> Regards,
>
> Daniel Langevin
>
> -----Original Message-----
> From: Joël Guelluy <jguel...@skynet.be>
> Sent: November 24, 2021 06:23
> To: users@camel.apache.org
> Subject: Read HTML mails
>
> Hello,
>
> I am using camel-mail to poll a mailbox (HTML mails).
> I want to retrieve the HTML source of the mail, but I couldn't find how to do this.
>
> I tried exchange.getIn().getBody(), but it contains only text; the tags are lost.
> Do I have to configure or declare something to keep them?
>
> Can you help me?
>
> Thank you for any information.
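
A side note on the header anonymization step in the quoted routes: String.replaceAll treats its first argument as a regular expression, so each '.' in '10.100.' matches any character, not only a literal dot. The sketch below is plain Java (the same calls behave identically in Groovy) and compares that call with String.replace, which works on literal text. The class name, method names and the sample header value are made up for illustration; only the two prefixes come from the quoted script.

```java
public class HeaderAnonymizer {

    // Prefixes as they appear in the quoted script.
    private static final String INTERNAL_PREFIX = "10.100.";
    private static final String MASKED_PREFIX = "192.168.";

    /** Same call as in the quoted script: the pattern is a regex, so '.' matches any character. */
    public static String anonymizeWithRegex(String headerValue) {
        return headerValue.replaceAll("10.100.", MASKED_PREFIX);
    }

    /** Literal variant: String.replace does no regex interpretation. */
    public static String anonymizeLiteral(String headerValue) {
        return headerValue.replace(INTERNAL_PREFIX, MASKED_PREFIX);
    }

    public static void main(String[] args) {
        // Hypothetical header value: an internal IP plus an unrelated id
        // that happens to look like "10?100?" to the regex.
        String received = "from mx1 (10.100.4.7) id 1091003";
        System.out.println(anonymizeWithRegex(received));
        System.out.println(anonymizeLiteral(received));
    }
}
```

With this sample value, the regex variant also rewrites the unrelated id, while the literal variant only touches the IP prefix. Whether that matters depends on what your headers actually contain.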