The problem appears with UserTextFrame and, when the option --strict is on, 
also with LyricsFrame and CommentFrame. I had a hard time finding the 
cause. Here is the explanation, taking CommentFrame as the example.
Actually, the error is not raised by encode() but by decode(), which seems to be 
called implicitly when the message passed to printMsg is formatted (printMsg 
being a function whose advantage over print I don't see) in eyeD3, line 995:
    printMsg("%s: [Description: %s] [Lang: %s]\n%s" %\
                     (boldText("Comment"), cDesc, cLang,
                      cText.encode(ENCODING,"replace")));
with printMsg(s) = sys.stdout.write(s + '\n').

The problem is linked to cDesc. The strings cDesc and cText are set as Unicode 
strings in frames.py, line 1076:
    self.description = unicode(d, id3EncodingToString(self.encoding));
    self.comment = unicode(c, id3EncodingToString(self.encoding));
but then,
    if not strictID3():
        self.description = cleanNulls(self.description)
        self.comment = cleanNulls(self.comment)
with cleanNulls(s) = "/".join([x for x in s.split('\x00') if x]), which does 
not always return a Unicode string: for an empty description, the join runs 
over an empty list and gives the byte string ''. Therefore, with the option 
--strict, when printing, cDesc is a Unicode string but 
cText.encode(ENCODING,"replace") is a byte string. A sample command showing 
the error is
    >>> print "%s %s" %(u'', (u'é').encode("utf-8","replace"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
whereas
    >>> print "%s %s" %('', (u'é').encode("utf-8","replace"))
     é
    >>> (u'é').encode("utf-8","replace") # returns a byte string
    '\xc3\xa9'
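
To make the cleanNulls() step concrete, here is a minimal Python 3 sketch of the expression quoted above. The name clean_nulls is mine, not eyeD3's, and in Python 3 every str is already Unicode, so the sketch only shows what the function does to the text, not the type problem. The empty input is the interesting case: in Python 2 the result type for an empty list comes from the byte-string literal "/".

```python
def clean_nulls(s):
    """Drop NUL separators and join the non-empty pieces with '/'."""
    return "/".join(x for x in s.split("\x00") if x)


# A description padded with NUL bytes collapses to its non-empty parts.
print(repr(clean_nulls("first\x00\x00second\x00")))

# An empty description yields an empty string (no pieces to join).
print(repr(clean_nulls("")))
```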

In Python 2.x (3.x differs: str and bytes are separate types that are
never mixed implicitly), if at least one argument is a Unicode string,
the % formatting apparently converts the whole result to Unicode and
decodes every byte string with decode(), which by default uses the
'ascii' encoding, hence the UnicodeDecodeError.
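
The implicit conversion can be spelled out by hand. The following is a minimal Python 3 sketch, not eyeD3 code: implicit_decode is a hypothetical helper that does explicitly what Python 2 does behind the scenes, and the value of ENCODING is assumed for illustration.

```python
ENCODING = "utf-8"  # assumed value, for illustration only

# What cText.encode(ENCODING, "replace") produces: a byte string.
encoded_text = "é".encode(ENCODING, "replace")


def implicit_decode(data):
    """Mimic Python 2 promoting a byte string to Unicode with the ASCII codec."""
    return data.decode("ascii")


try:
    implicit_decode(encoded_text)
    outcome = "no error"
except UnicodeDecodeError as exc:
    # Same failure as in the traceback: byte 0xc3 is not valid ASCII.
    outcome = str(exc)

print(outcome)
```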

I see two (explainable ;) ) ways out of the bug: either modify
cleanNulls(s) to return a Unicode string (maybe contrary to the purpose
of cleanNulls(s), I don't know), or encode cDesc when printing
with cDesc.encode(ENCODING,"replace"), which is what the attached patch
does.
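
The idea behind the second option, sketched in Python 3 under stated assumptions: every text argument is encoded before the message is assembled, so no Unicode string is left to trigger a conversion. The function name format_comment, its parameters and the default encoding are hypothetical; the real patch only touches the printMsg call quoted above.

```python
def format_comment(desc, lang, text, encoding="utf-8"):
    """Build the comment line from byte strings only (hypothetical helper)."""
    # Encode each piece up front, with "replace" as in the original call.
    b_desc, b_lang, b_text = (
        s.encode(encoding, "replace") for s in (desc, lang, text)
    )
    return b"[Description: " + b_desc + b"] [Lang: " + b_lang + b"]\n" + b_text


# An empty description plus a non-ASCII comment, as in the sample command.
line = format_comment("", "eng", "é")
print(line)
```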

For UserTextFrame, the bug always appears because description is never
processed through cleanNulls(), regardless of --strict, which seems to be
another flaw compared to the behavior chosen for LyricsFrame and
CommentFrame.

** Patch added: "fixes the format conversion of the description tag in printed messages"
   https://bugs.launchpad.net/ubuntu/+source/eyed3/+bug/507132/+attachment/2803977/+files/description-unicode-conversion.patch

https://bugs.launchpad.net/bugs/507132

Title:
  eyeD3 doesn't parse certain id3 tags
