My goal, for every received mail, is to:
- get the text body with its HTML tags
- send it to an existing REST web service that converts the mail body to PDF/A
- archive the PDF and delete the mail.

I never imagined it would be such a pain!
I tried parts of your code, but it seems the tags are removed BEFORE the
message even reaches those parts.

Who removes the tags? Camel? Some dependency?
And why?
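
For reference, "getting the body with its HTML tags" comes down to walking the MIME tree and picking the text/html part instead of its text/plain sibling. One possible explanation for the missing tags (an assumption, not confirmed in this thread) is that something upstream takes the first text part of a multipart/alternative mail, which is normally the plain-text version. Below is a minimal sketch of the search in plain Java. `MimeNode` is a hypothetical stand-in for `javax.mail.Part`, so the example runs without the mail libraries; all names are illustrative.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class HtmlPartFinder {

    /** Hypothetical stand-in for javax.mail.Part: content is a String or a List of MimeNode. */
    static final class MimeNode {
        final String contentType;
        final Object content;

        MimeNode(String contentType, Object content) {
            this.contentType = contentType;
            this.content = content;
        }
    }

    /** Depth-first search for the first text/html leaf; empty if the mail has none. */
    static Optional<String> findHtml(MimeNode node) {
        String type = node.contentType.toLowerCase(Locale.ROOT);
        if (node.content instanceof String) {
            // Leaf part: keep it only when it is declared as HTML.
            return type.startsWith("text/html")
                    ? Optional.of((String) node.content)
                    : Optional.empty();
        }
        if (node.content instanceof List) {
            // Container part (multipart/*): look into each child in order.
            for (Object child : (List<?>) node.content) {
                Optional<String> html = findHtml((MimeNode) child);
                if (html.isPresent()) {
                    return html;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        MimeNode plain = new MimeNode("text/plain; charset=UTF-8", "Hello");
        MimeNode html = new MimeNode("text/html; charset=UTF-8", "<p>Hello</p>");
        MimeNode mail = new MimeNode("multipart/alternative", List.of(plain, html));

        System.out.println(findHtml(mail).orElse("(no HTML part)"));  // <p>Hello</p>
        System.out.println(findHtml(plain).orElse("(no HTML part)")); // (no HTML part)
    }
}
```

With the real mail API the same recursion would test the part's MIME type and descend into multipart content; the exact calls depend on the mail library version in use.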


On 24 Nov 2021 at 14:56:42, Daniel Langevin <daniel.lange...@shq.gouv.qc.ca> wrote:

> Hi,
> 
> It's a little more complicated than that.
> 
> First, you have to know whether the mail body is a String, a multipart, or a
> nested message, because:
>    if the body is a String, the mail is single-part and usually carries no
>    HTML alternative (the String branch below still checks the content type);
>    if the body is multipart, it can contain a NestedMessage (a mail message
>    sent as an attachment).
> 
> I don't recommend working with HTML mail; it is very complicated.
> 
> We have been reading e-mail with camel-mail (imaps) for 4 years now, and we
> gave up on HTML processing after 1 year.
> 
> We changed our approach: we extract the original message first and process
> it like a file attachment.
> Then we check whether the body is a com.sun.mail.imap.IMAPNestedMessage, a
> java.lang.String, or a javax.mail.internet.MimeMultipart,
> and extract the TEXT of the body, because some mails contain only HTML with no
> plain-text alternative.
> 
> Here is a small part taken from our email processing; it should be enough to
> understand how it works:
> 
> <choice>
> <when><simple>${body} is "com.sun.mail.imap.IMAPNestedMessage"</simple> <!-- only msg822 as attachment -->
>        <setHeader headerName="zzBody">
>        <groovy>
>             def MimeUtility     = new javax.mail.internet.MimeUtility();
>             // result              = request.body.getContent().getBodyPart(0).getContent()
>             result              = "( Impossible d'extraire le Corps du message, voir pièce jointe msg822 )"
>        </groovy>
>        </setHeader>
>        <log message="Null BODY: "/>
>        <to uri="direct:sdiInsertBodyCourriel" />
>        <to uri="direct:sdiExtractMsg822" />
> </when>
> 
> <when><simple>${body} is "java.lang.String"</simple>
>        <setHeader headerName="zzBody">
>        <groovy>
>            def MimeUtility = new javax.mail.internet.MimeUtility();
>            def Jsoup       = new org.jsoup.Jsoup();
>            if (request.getOriginalMessage().getContentType().toLowerCase().contains('html')) {
>               zresult = MimeUtility.decodeText(request.getOriginalMessage().getContent());
>               result  = Jsoup.parse(zresult).wholeText(); // preserve CRLF
>            } else {
>               result  = MimeUtility.decodeText(request.getOriginalMessage().getContent());
>            }
>        </groovy>
>        </setHeader>
>        <log message="Txt BODY: "/>
>        <to uri="direct:sdiInsertBodyCourriel" />
> </when>
> 
> <when><simple>${body} is "javax.mail.internet.MimeMultipart"</simple>
> 
>        <log message="BODY: Multipart"/>
> 
>        <to uri="direct:sdiExtractPJ" />
> 
>        <to uri="direct:sdiExtractBodyMPart" />
>        <to uri="direct:sdiInsertBodyCourriel" />
>        <to uri="direct:sdiExtractMsg822" />
>        <setHeader headerName="zzSwitchlirePj">
>           <groovy>
>                 zPjTemp       = new File( request.headers['zzTEMPPJ'] )
>                 zzPj          = zPjTemp.listFiles();
>                 if ( zzPj.length &gt; 0 ) { result="OUI"}
>                      else { result="NON" }
>           </groovy>
>        </setHeader>
>        <log message="SwitchlirePj: ${header.zzSwitchlirePj} " />
>        <filter><simple>${header.zzSwitchlirePj} == "OUI" </simple>
>                <to uri="direct:sdiLirePjCourriel" />
>        </filter>
> </when>
> 
> <route id="rte.SDI.ExtractOriginal" streamCache="true">
> <description>
> ============================================================
> Writes the complete original email to a file
> ============================================================
> </description>
> <from uri="direct:sdiExtractOriginal"/>
> <setHeader headerName="zzoriginal">
> <groovy>
> def FileUtils = new org.apache.commons.io.FileUtils()
> zRootTemp   = new File( request.headers['zzTEMP'] )
> z822Temp    = new File( request.headers['zzTEMP822'] )
> zPjTemp     = new File( request.headers['zzTEMPPJ'] )
> FileUtils.forceMkdir(z822Temp);
> FileUtils.forceMkdir(zPjTemp);
> FileUtils.cleanDirectory(z822Temp);
> FileUtils.cleanDirectory(zPjTemp);
> result ="\n- Extraction du Mail Originale \n"
> zHeaders=""
> crlf="\r\n"
> 
> headerEnum = request.getOriginalMessage().getAllHeaders();
>     while (headerEnum.hasMoreElements()) {
>       header = headerEnum.nextElement();
>       name = header.getName();
>       <!-- Anonymize security information: IP addresses of internal servers -->
>       value = header.getValue().replace('10.100.','192.168.');
>       zHeaders = zHeaders + name +": " + value + crlf
>     }
>     mailfilename = request.headers['zzTEMP822']+"Courriel.eml"
>     mailoriginal = new File(mailfilename);
>     mailoriginal.write(zHeaders+crlf);
>     
> mailoriginal.append(request.getOriginalMessage().getContentStream().getText()+crlf);
> 
> result = result + mailfilename + "\n- Extraction du Mail Originale - Complété \n"
> </groovy>
> </setHeader>
> <log message="${header.zzoriginal}" />
> </route>
> 
> 
> 
> 
> <route id="rte.SDI.ExtractBodyMPart" streamCache="true">
>     <from uri="direct:sdiExtractBodyMPart"/>
>      <log message="555 Extract BODY" />
>        <setHeader headerName="zzBody">
>        <groovy>
>           part            = request.body.getBodyPart(0)
>           //part            = request.getOriginalMessage().getContent().getBodyPart(0)
>           def Jsoup       = new org.jsoup.Jsoup();
>           def wlist(org.jsoup.safety.Whitelist basic ) {
>               return new org.jsoup.safety.Whitelist()
>                      .addTags(
>                               "a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em",
>                               "i", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong", "sub",
>                               "sup", "u", "ul")
>                      .addAttributes("a", "href")
>                      .addAttributes("blockquote", "cite")
>                      .addAttributes("q", "cite")
>                      .addProtocols("a", "href", "ftp", "http", "https", "mailto")
>                      .addProtocols("blockquote", "cite", "http", "https")
>                      .addProtocols("cite", "cite", "http", "https")
>               }
> 
>             if (part.getContent().getClass().equals(com.sun.mail.imap.IMAPNestedMessage)) {
>                  result = "( PAS DE CORPS DE MESSAGE / voir piece jointe ) ";
>             }
>             else if ( part.getContentType().toLowerCase().contains('html')) {
>                      zbody   = part.getContent();
>                      zbody   = Jsoup.clean(zbody,wlist()); // keep only certain basic tags
>                      zbody   = zbody.replaceAll("&lt;a .*href=","")
>                      zbody   = zbody.replaceAll("&gt;&lt;/a&gt;","")
>                      result  = Jsoup.parse(zbody).wholeText(); // preserve CRLF
>                     }
>             else if (part.getContent().getClass().equals(javax.mail.internet.MimeMultipart)) {
>                     result = part.getContent().getBodyPart(0).getContent();
>                     }
>             else {
>                  result = part.getContent();
>                  }
>        </groovy>
>        </setHeader>
> </route>
> 
> <route id="rte.SDI.ExtractMsg822" streamCache="true">
> <description>
> =======================================================================
> Extracts the message/rfc822 parts and writes them to a file.
> The file(s) are later stored in the database as PJnn.eml
> =======================================================================
> </description>
> <from uri="direct:sdiExtractMsg822"/>
> <log message="777 Extract MSG822 ?" />
> <setHeader headerName="zz822">
> <groovy>
>     zresult = 0
>     if ( request.body.getClass().equals(com.sun.mail.imap.IMAPNestedMessage)) {
>        zHeaders=""
>        crlf="\r\n"
>        headerEnum = request.body.getAllHeaders();
>        while (headerEnum.hasMoreElements()) {
>          header = headerEnum.nextElement();
>          name = header.getName();
>          <!-- Anonymize security information: IP addresses of internal servers -->
>          value = header.getValue().replace('10.100.','192.168.');
>          zHeaders = zHeaders + name +": " + value + "\r\n"
>        }
>        mail822filename = request.headers['zzTEMP822']+"PJ_Courriel_1.eml"
>        mailoriginal = new File(mail822filename);
>        mailoriginal.write(zHeaders+crlf);
>        mailoriginal.append(request.body.getContentStream().getText()+crlf);
>        zresult = zresult + 1;
>     }  else {
>             mimeMultipart = request.body
>             partCount = mimeMultipart.getCount();
>             for (i = 0; i != mimeMultipart.getCount(); i++) {
>                 ii = i;
>                 bodyPart = mimeMultipart.getBodyPart(i);
>                 if (bodyPart.isMimeType("message/rfc822")) {
>                     zresult = zresult + 1;
>                     mail822filename = request.headers['zzTEMP822']+"PJ_Courriel_" + zresult + ".eml"
>                     bodyPart.getContent().writeTo(new FileOutputStream(new File(mail822filename)));
>                 }
>             }
>           //  if ( zresult &gt; 1){  zresult = zresult - 1; } // restore the correct count of extracted attachments after the loop
>        }
>    result = zresult
> </groovy>
> </setHeader>
> <log message="777 N822= ${header.zz822}" />
> <setHeader headerName="zzLoop"> <constant>0</constant></setHeader>
> <loop doWhile="true"><simple>${header.zzLoop} != ${header.zz822}</simple>
> <log message="Loop ? ${header.zzLoop} = ${header.zz822}" />
>     <setHeader headerName="zzLoop"><simple>${header.zzLoop}++</simple></setHeader>
>     <setHeader headerName="zzPJ"><groovy>new File(request.headers['zzTEMP822']+"PJ_Courriel_"+request.headers['zzLoop']+".eml").bytes</groovy></setHeader>
>     <setHeader headerName="utlRecCourrielPjContentType"><constant>message/rfc822</constant></setHeader>
>     <setHeader headerName="utlNomPj"><simple>PJ_Courriel_${header.utlNoSeqCommunication}_${header.zzLoop}.eml</simple></setHeader>
>     <setHeader headerName="zzTypeCode"><constant>REGUL</constant></setHeader>
>     <to uri="direct:sdiInsertPjCourriel" />
> </loop>
> </route>
> 
> I hope this helps.
> 
> 
> Regards
> 
> 
> 
> 
> Daniel Langevin
> 
> 
> -----Original Message-----
> From: Joël Guelluy <jguel...@skynet.be>
> Sent: 24 November 2021 06:23
> To: users@camel.apache.org
> Subject: Read HTML mails
> 
> Hello,
> 
> I am using camel-mail to poll a mailbox (HTML mails).
> I want to retrieve the HTML source of each mail, but I haven't found how to
> do this...
> 
> I tried exchange.getIn().getBody(), but it contains only text; the tags are lost.
> Do I have to configure or declare something to keep them?
> 
> Can you help me?
> 
> Thank you for any information...
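
A side note on the header anonymization in the quoted routes, which rewrites the internal `10.100.` prefix to `192.168.`: in a Java or Groovy regular expression an unescaped dot matches any character, and a plain substring match would also hit an address such as `210.100.4.12`. Below is a minimal sketch of a stricter variant, in plain Java with no Camel or mail dependency. The class and method names are hypothetical and not taken from the routes.

```java
import java.util.regex.Pattern;

final class HeaderAnonymizer {

    // "10.100." with escaped dots, matched only where it starts an address,
    // i.e. when it is not preceded by a digit or a dot.
    private static final Pattern INTERNAL_PREFIX =
            Pattern.compile("(?<![\\d.])10\\.100\\.");

    /** Replaces the internal prefix in a header value and leaves everything else untouched. */
    static String anonymize(String headerValue) {
        return INTERNAL_PREFIX.matcher(headerValue).replaceAll("192.168.");
    }

    public static void main(String[] args) {
        // Internal address: the prefix is rewritten.
        System.out.println(anonymize("from mx1 ([10.100.4.12]) by relay"));
        // External address that merely contains "10.100.": unchanged.
        System.out.println(anonymize("from ext ([210.100.4.12]) by relay"));
    }
}
```

Whether the extra boundary check is worth it depends on which addresses can actually appear in the headers being archived.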
