Hi Peter,

Thank you for the explanation! I have been banging my head against this for almost two days. I'm still getting familiar with all of the different encodings at play. As I currently understand it, Python 3 strings are Unicode, and UTF-8 is the default encoding used when a string is converted to bytes. I'm guessing that is why strings have to be converted to a bytes object in the first place. Again, thank you for the assistance!

Ryan

On Thu, Sep 13, 2018 at 2:57 AM, Peter Otten <__pete...@web.de> wrote:
> Ryan Smith wrote:
>
>> Hello All,
>>
>> I am currently working on a small utility that finds any base64
>> encoded strings in files and decodes them. I am having issue
>> understanding how the Base64 module actually works. The regular
>> expression that I am using correctly matches on the encoded strings. I
>> simply want to be able to convert the match of the encoded ascii
>> string to it's decoded ascii equivalent. For example the base64
>> encoded ascii string 'UwB5AHMAdABlAG0ALgBkAGwAbAA=' will decode to
>> 'System.dll' if I use an online base64 decoder. However I get a
>> completely different output when trying to codify this using python
>> 3.6.5:
>>
>> >>> import base64
>> >>> import binascii
>>
>> >>> test_str = 'UwB5AHMAdABlAG0ALgBkAGwAbAA='
>> >>> base64.b64decode(test_str)
>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
>>
>> >>> temp = base64.b64decode(test_str)
>> >>> binascii.b2a_base64(temp)
>> b'UwB5AHMAdABlAG0ALgBkAGwAbAA=\n'
>>
>> I understand that when decoding and encoding you have to use bytes
>> objects but what I don't understand is why I can't get the proper
>> conversion of the original ascii string. Can someone please point me
>> in the right direction?
>
> Look closely at the odd bytes in
>
>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
>
> or just do
>
> >>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[::2]
> b'System.dll'
>
> The even bytes are all NUL:
>
> >>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[1::2]
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>
> This means that your byte string already *is* the original string, encoded
> as UTF-16. You can convert it into a string with
>
> >>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'.decode("utf-16")
> 'System.dll'
>
> which will handle non-ascii characters correctly, too.
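
The two steps described in the quoted reply (base64 text to raw bytes, then raw bytes to a str) can be sketched in one place. This is only a minimal sketch under stated assumptions: the helper name decode_base64_text and its encoding parameter are hypothetical, chosen for illustration, and the default of "utf-16-le" assumes the data is little-endian UTF-16 with no byte order mark, which is what the NUL-after-every-character pattern above suggests. Naming the byte order explicitly avoids relying on whatever the plain "utf-16" codec picks when no byte order mark is present.

```python
import base64


def decode_base64_text(encoded, encoding="utf-16-le"):
    """Hypothetical helper: base64 text -> raw bytes -> str.

    The default encoding is an assumption that fits the example
    string from this thread, not a property of base64 itself.
    """
    raw = base64.b64decode(encoded)   # step 1: undo base64, result is bytes
    return raw.decode(encoding)       # step 2: interpret the bytes as text


test_str = "UwB5AHMAdABlAG0ALgBkAGwAbAA="

raw = base64.b64decode(test_str)
print(raw)    # b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'

text = decode_base64_text(test_str)
print(text)   # System.dll

# Going the other way: encode the text as UTF-16-LE first, then base64.
round_trip = base64.b64encode(text.encode("utf-16-le"))
print(round_trip)   # b'UwB5AHMAdABlAG0ALgBkAGwAbAA='
```

Base64 only describes how bytes are written as ASCII text; it says nothing about which text encoding produced those bytes, so the encoding has to be known or guessed separately. If a file mixes strings from different sources, passing a different encoding (for example "utf-8") to the helper may be needed for some matches.
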
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor