Ryan Smith wrote:

> Hello All,
>
> I am currently working on a small utility that finds any base64
> encoded strings in files and decodes them. I am having issues
> understanding how the Base64 module actually works. The regular
> expression that I am using correctly matches on the encoded strings. I
> simply want to be able to convert the match of the encoded ascii
> string to its decoded ascii equivalent. For example the base64
> encoded ascii string 'UwB5AHMAdABlAG0ALgBkAGwAbAA=' will decode to
> 'System.dll' if I use an online base64 decoder. However I get a
> completely different output when trying to codify this using python
> 3.6.5:
>
> >>> import base64
> >>> import binascii
>
> >>> test_str = 'UwB5AHMAdABlAG0ALgBkAGwAbAA='
> >>> base64.b64decode(test_str)
> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
>
> >>> temp = base64.b64decode(test_str)
> >>> binascii.b2a_base64(temp)
> b'UwB5AHMAdABlAG0ALgBkAGwAbAA=\n'
>
> I understand that when decoding and encoding you have to use bytes
> objects but what I don't understand is why I can't get the proper
> conversion of the original ascii string. Can someone please point me
> in the right direction?

Look closely at the odd-numbered bytes (the first, third, fifth, ...) in

> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'

or just do

>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[::2]
b'System.dll'

The even-numbered bytes are all NUL:

>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[1::2]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

This means that your byte string already *is* the original string,
encoded as UTF-16 (little-endian, without a byte order mark). You can
convert it into a string with

>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'.decode("utf-16")
'System.dll'

which will handle non-ASCII characters correctly, too.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
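
For the utility described in the question, the two steps from the reply (Base64-decode, then decode the resulting bytes as UTF-16) might be combined roughly as follows. This is only a sketch: the helper names looks_like_utf16_le and decode_match are made up for illustration, and the NUL-byte check is a heuristic that only recognises UTF-16 text limited to ASCII-range characters. It names the codec "utf-16-le" explicitly, since the sample bytes are little-endian and carry no byte order mark.

```python
import base64


def looks_like_utf16_le(raw: bytes) -> bool:
    """Heuristic (an assumption, not a general test): the data has an even
    length and every second byte is NUL, as in the sample from the thread."""
    return len(raw) > 0 and len(raw) % 2 == 0 and not any(raw[1::2])


def decode_match(encoded: str) -> str:
    """Hypothetical helper: Base64-decode a regex match and return text."""
    raw = base64.b64decode(encoded)
    if looks_like_utf16_le(raw):
        # Byte order is stated explicitly because there is no BOM.
        return raw.decode("utf-16-le")
    # Fall back to plain ASCII; undecodable bytes are replaced.
    return raw.decode("ascii", errors="replace")


print(decode_match("UwB5AHMAdABlAG0ALgBkAGwAbAA="))  # System.dll
```

If the matched strings can contain non-ASCII text, the heuristic will not fire, and it is probably safer to call raw.decode("utf-16-le") directly whenever the source of the data is known to use UTF-16.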