[ 
https://issues.apache.org/jira/browse/TIKA-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900088#comment-15900088
 ] 

Tim Allison commented on TIKA-1879:
-----------------------------------

For "from", I assumed a single sender (which isn't always the case with "on 
behalf of" and/or "via"), and I created separate fields for Exchange-format 
email addresses, e.g.
"/o=ExchangeLabs/ou=Exchange Administrative Group 
(FYDIBOHF23SPDLT)/cn=Recipients/cn=polyspot1.onmicrosoft.com-50609-Some-One"

was mapped to:
message_from_o=ExchangeLabs,
message_from_ou=Exchange Administrative Group (FY...),
message_from_cn=polyspot1....
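
As a rough illustration of that mapping, here is a minimal Python sketch (not the Tika code itself). The function name parse_exchange_address and the prefix parameter are hypothetical, and it assumes that when a key such as cn repeats, the last value is the one kept, which is what the example mapping suggests.

```python
import re


def parse_exchange_address(address, prefix="message_from"):
    """Split an Exchange-style address into {prefix}_{key} fields.

    Hypothetical helper for illustration only; not part of Tika.
    """
    fields = {}
    # Each component has the form /key=value; values may contain spaces
    # and parentheses but are assumed not to contain "/".
    for key, value in re.findall(r"/([A-Za-z]+)=([^/]*)", address):
        # Collapse whitespace introduced by line wrapping.
        value = " ".join(value.split())
        # "cn" occurs twice (cn=Recipients, then the mailbox); a later
        # occurrence overwrites an earlier one, so the mailbox is kept.
        fields[f"{prefix}_{key.lower()}"] = value
    return fields
```

Calling it on the address above would yield the message_from_o, message_from_ou and message_from_cn fields; a plain SMTP address contains no /key=value components and yields an empty dict.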

However, this won't map neatly to handling the "to" fields.  One unsatisfactory 
option is to keep three parallel arrays of names, smtpemails and exchangeemails, 
with an empty cell in smtpemails wherever the address is Exchange-formatted, 
and vice versa.  A cleaner option would be to have a single pair of parallel 
arrays, name[] and email[], where email[] would hold the literal email 
value, whether it is smtp or exchange; users would then have to parse an 
Exchange email address themselves if they wanted to differentiate _o, _ou and _cn.
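
To make the two layouts concrete, here is a small Python sketch. All names in it (is_exchange_address, to_three_arrays, to_name_email_arrays) are hypothetical, and the check for a leading "/o=" is only an assumed heuristic for spotting Exchange-formatted addresses.

```python
def is_exchange_address(email):
    # Assumed heuristic: Exchange-formatted addresses start with "/o=".
    return email.lower().startswith("/o=")


def to_three_arrays(recipients):
    """Option 1: names, smtpemails and exchangeemails, padded with ""."""
    names, smtp_emails, exchange_emails = [], [], []
    for name, email in recipients:
        names.append(name)
        if is_exchange_address(email):
            smtp_emails.append("")
            exchange_emails.append(email)
        else:
            smtp_emails.append(email)
            exchange_emails.append("")
    return names, smtp_emails, exchange_emails


def to_name_email_arrays(recipients):
    """Option 2: one name[]/email[] pair holding the literal address."""
    names = [name for name, _ in recipients]
    emails = [email for _, email in recipients]
    return names, emails
```

With the second layout, index i of name[] always lines up with index i of email[], and a consumer who cares about the Exchange components can test each entry and parse it on demand.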

[~mcaruanagalizia] and [~lfcnassif], any recommendations?

> Extract recipient information in MSG files with more granularity
> ----------------------------------------------------------------
>
>                 Key: TIKA-1879
>                 URL: https://issues.apache.org/jira/browse/TIKA-1879
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> As proposed in the parent task, it might be nice to have a parallel array for 
> recipient name/recipient email for TO, CC and BCC.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
