sebbASF opened a new issue #513: Bug: cluster generator fails to parse 
non-ascii subject and sender
URL: https://github.com/apache/incubator-ponymail/issues/513
 
 
   The cluster generator reads fields of the msg on the assumption that they are always strings.
   
   However, that is not the case if non-ASCII characters have been used.
   
   In such cases, code such as msg.get('subject') will return an email.header.Header object.
   This causes code such as bytes(subject, encoding = 'ascii') to fail with
   
   TypeError: encoding without a string argument
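
A minimal sketch of the failure, using only the standard library. The message bytes and address are made up for illustration, and the sketch assumes the default (compat32) email policy and a header that carries raw, unencoded 8-bit bytes; an RFC 2047 encoded-word subject would come back as a plain string instead.

```python
import email
from email.header import Header

# Hypothetical message whose Subject holds raw (unencoded) UTF-8 bytes.
raw = b"Subject: caf\xc3\xa9\nFrom: someone@example.org\n\nbody\n"
msg = email.message_from_bytes(raw)

subject = msg.get('subject')
# With the default policy, a header containing raw 8-bit bytes is
# returned as a Header object rather than a str.
is_header = isinstance(subject, Header)

try:
    # The same kind of call the issue describes.
    bytes(subject, encoding='ascii')
    error = None
except TypeError as e:
    error = str(e)

print(is_header, error)
```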
   
   In turn, this causes the archiver to revert to a very basic fallback mid:
   
           mid = hashlib.sha224(str("%s-%s" % (lid, msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid if lid else "none")
   
   **Unless archived-at is defined, this will be constant for a given list id**
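
The collision can be sketched as follows. The helper name `fallback_mid`, the list id, and the use of `None` to stand in for an undefined archived-at are assumptions made for illustration; only the hashing expression is taken from the fallback quoted above.

```python
import hashlib

def fallback_mid(lid, archived_at):
    # Same expression as the fallback mid, with the metadata value passed in.
    return hashlib.sha224(str("%s-%s" % (lid, archived_at)).encode('utf-8')).hexdigest() + "@" + (lid if lid else "none")

lid = "<dev.example.org>"  # hypothetical list id

# Assumption: two different messages, both without an archived-at value.
mid_a = fallback_mid(lid, None)
mid_b = fallback_mid(lid, None)

# Both messages end up with the same mid, so one would overwrite the other.
collision = (mid_a == mid_b)
print(collision)
```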
   
   This is relatively easy to fix; the generator should use the msg_metadata dict which the archiver has already set up.
   
   HOWEVER, to ensure that existing Permalinks can still be regenerated, any fix MUST be implemented as a new generator type, with a new syntax (i.e. change the 'r' prefix).
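
One possible shape for such a generator is sketched below. It is not the Pony Mail implementation: the function name `sketch_generator`, the choice of fields, the SHA-256 digest, and the placeholder `'x'` prefix are all assumptions, as is the premise that msg_metadata maps lower-case header names to already-decoded strings (or None).

```python
import hashlib

def sketch_generator(msg_metadata, body, lid):
    """Hypothetical generator that hashes only str values from msg_metadata."""
    # Assumption: these keys exist in msg_metadata and hold str or None.
    parts = [
        lid or "",
        msg_metadata.get('message-id') or "",
        msg_metadata.get('subject') or "",
        msg_metadata.get('from') or "",
    ]
    digest = hashlib.sha256()
    for part in parts:
        # UTF-8 handles non-ASCII text that the 'ascii' codec would reject.
        digest.update(part.encode('utf-8'))
        digest.update(b"\0")  # separator so adjacent fields cannot run together
    digest.update(body if isinstance(body, bytes) else body.encode('utf-8'))
    # Placeholder prefix, deliberately different from the existing 'r'.
    return "x" + digest.hexdigest()

meta = {'message-id': '<1@example.org>', 'subject': 'café', 'from': 'José <j@example.org>'}
mid = sketch_generator(meta, "body text", "<dev.example.org>")
print(mid)
```

Because the prefix differs, ids produced by the existing generator stay distinguishable and can still be regenerated with the old code path.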
   
   There are probably some other changes that need to be made to the cluster generator.
   For example, Message-Id should be canonicalised.
   
   Note that the fallback mid cannot be changed, as that would affect all the generators.

