Public bug reported:

The following script works fine on 16.04 LTS:

#!/usr/bin/python3

import magic
import os

dir = "/usr/share/ca-certificates/mozilla"

mime = magic.open(magic.MAGIC_MIME)
mime.load()

for root, dirnames, filenames in os.walk(dir):
    for f in filenames:
        fn = os.path.join(root, f)
        print("%s: %s" % (fn, mime.file(fn)))

E.g.:
$ python3 /tmp/test.py
/usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt: text/plain; charset=us-ascii
...

(note the last filename before the ellipsis: it contains non-ASCII characters such as "ğ" and "ı")

But on 17.04, this happens:

$ python3 /tmp/test.py
/usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
/usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
Traceback (most recent call last):
  File "/home/ubuntu/test.py", line 15, in <module>
    print("%s: %s" % (fn, mime.file(fn)))
  File "/usr/lib/python3/dist-packages/magic.py", line 130, in file
    bi = bytes(filename, 'utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 69: surrogates not allowed
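
One plausible reading of the traceback, offered as an assumption rather than a diagnosis: on 17.04 the script may have run under a non-UTF-8 locale (for example C/POSIX), in which case Python 3.5 decodes file names as ASCII using the "surrogateescape" error handler. The UTF-8 bytes of "ğ" (0xC4 0x9F) then become the lone surrogates '\udcc4\udc9f' starting at position 69 of the path, and the strict bytes(filename, 'utf-8') call in magic.py rejects them. The stdlib-only sketch below reproduces that chain without libmagic; first_surrogate and strict_utf8_error are hypothetical helper names used only for illustration.

```python
# Sketch: reproduce the traceback's UnicodeEncodeError without libmagic.
# ASSUMPTION: the 17.04 run decoded file names as ASCII (e.g. C/POSIX locale)
# with the 'surrogateescape' handler that os.walk() uses for undecodable bytes.

PREFIX = "/usr/share/ca-certificates/mozilla/"
NAME = "EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt"

# The name is stored on disk as UTF-8 bytes.
raw = (PREFIX + NAME).encode("utf-8")

# ASCII + surrogateescape turns each non-ASCII byte 0xNN into U+DCNN,
# which is the kind of str os.walk() would hand to mime.file().
fn = raw.decode("ascii", "surrogateescape")


def first_surrogate(s):
    """Hypothetical helper: (index, char) of the first escaped byte, or None."""
    for i, ch in enumerate(s):
        if 0xDC80 <= ord(ch) <= 0xDCFF:
            return i, ch
    return None


def strict_utf8_error(s):
    """Hypothetical helper mimicking magic.py's bytes(filename, 'utf-8').

    Returns the UnicodeEncodeError message, or None if encoding succeeds.
    """
    try:
        bytes(s, "utf-8")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


print(first_surrogate(fn))    # (69, '\udcc4')
print(strict_utf8_error(fn))  # message ends with "surrogates not allowed"
print(fn.encode("utf-8", "surrogateescape") == raw)  # True: bytes recoverable
```

The exact error wording may differ between Python versions (newer interpreters may report the whole run of surrogates as a position range). Running python3 -c 'import sys; print(sys.getfilesystemencoding())' on both machines would show whether they differ; "ascii" on 17.04 and "utf-8" on 16.04 would support this reading.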

I'm guessing this is a change in python3 that python3-magic hasn't
accounted for, but I'm not sure. Adding a python3.5 task just in case.
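
If the locale explanation holds, one possible direction for python3-magic (a sketch, not the package's actual fix) is to convert str paths with os.fsencode() instead of bytes(filename, 'utf-8'). On POSIX, os.fsencode() uses the same filesystem encoding and "surrogateescape" handler that os.walk() used to decode the name, so it returns the original on-disk bytes; this assumes the rest of magic.py passes those bytes straight through to libmagic. coerce_path is a hypothetical name. On the user side, running the script under a UTF-8 locale such as C.UTF-8 would likely avoid the error as well.

```python
import os
import sys


def coerce_path(filename):
    """Hypothetical replacement for magic.py's bytes(filename, 'utf-8').

    os.fsencode() reverses os.fsdecode() (and hence the decoding done by
    os.walk()/os.listdir()), restoring the exact on-disk bytes even when
    they are not valid in the filesystem encoding. bytes pass through.
    """
    return os.fsencode(filename)


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    raw = b"/tmp/caf\xe9.crt"        # hypothetical Latin-1 name, invalid UTF-8
    name = os.fsdecode(raw)          # the kind of str os.walk() yields for it
    print(repr(name))
    print(coerce_path(name) == raw)  # True on POSIX
```

The design choice here is symmetry: whatever decoding produced the str, the matching os.fsencode() call undoes it, whereas a hard-coded strict UTF-8 encode only works when the name happened to decode cleanly.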

** Affects: file (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: python3.5 (Ubuntu)
     Importance: Undecided
         Status: New

** Also affects: python3.5 (Ubuntu)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to file in Ubuntu.
https://bugs.launchpad.net/bugs/1677244

Title:
  "UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in
  position 69: surrogates not allowed" with mime.file() on path from
  os.walk

Status in file package in Ubuntu:
  New
Status in python3.5 package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/file/+bug/1677244/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
