I agree with Doug. Base64 encoding is a one-to-one mapping: each distinct sequence of source bytes yields a distinct encoded string. Decoding is also a unique mapping.
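To make the round-trip claim concrete, here is a minimal sketch using Python's standard `base64` module. The helper name `roundtrip` and the sample inputs are only illustrative assumptions; nothing here is specific to any particular implementation discussed in this thread.

```python
import base64

def roundtrip(data: bytes) -> bytes:
    """Encode bytes to base64, then decode the result back to bytes."""
    return base64.b64decode(base64.b64encode(data))

# Illustrative inputs: empty, 1-3 bytes (covers every padding case),
# a non-ASCII character in UTF-8, and every possible byte value.
samples = [b"", b"a", b"ab", b"abc", "\u00e9".encode("utf-8"), bytes(range(256))]

# Decoding the encoded form gives back exactly the original bytes.
assert all(roundtrip(s) == s for s in samples)

# Distinct inputs produce distinct encoded strings.
encoded = [base64.b64encode(s) for s in samples]
assert len(set(encoded)) == len(samples)
```

The checks operate on bytes only; the encoder never looks at what the bytes mean.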
If the encoded string is “translated” in some way by additional processes, canonical or otherwise, then all bets are off. If you disagree, please offer an example or additional details of how 2 base64 strings might be equivalent.

Tex

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of J Decker via Unicode
Sent: Friday, October 12, 2018 9:29 AM
To: d...@ewellic.org
Cc: Unicode Discussion
Subject: Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode <unicode@unicode.org> wrote:

J Decker wrote:

>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
>
> False.
> Canonical translation may occur which the different base64 may be the
> same sort of string...

Base64 is a binary-to-text encoding. Neither encoding nor decoding should presume any special knowledge of the meaning of the binary data, or do anything extra based on that presumption.

Converting Unicode text to and from base64 should not perform any sort of Unicode normalization, convert between UTFs, insert or remove BOMs, etc. This is like saying that converting a JPEG image to and from base64 should not resize or rescale the image, change its color depth, convert it to another graphic format, etc.

So I'd say "true" to Roger's question.

On the first side (X to base64) definitely true. But there is potential that text resulting from some decoded buffer is translated, resulting in a 'congruent' string that's not exactly the same... and the base64 will be different. Comparing some base64 string with some other base64 string shows a binary difference, but may be still the 'same' string.

I touched on this a little bit in UTN #14, from the standpoint of trying to improve compression by normalizing the Unicode text first.

--
Doug Ewell | Thornton, CO, US | ewellic.org
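The two positions in this thread can be shown side by side with a short sketch. It assumes Python's standard `base64` and `unicodedata` modules and UTF-8 as the text encoding; the choice of "é" as the sample character and the variable names are illustrative assumptions only.

```python
import base64
import unicodedata

# The same abstract character in two canonically equivalent forms.
nfc = unicodedata.normalize("NFC", "\u00e9")  # precomposed U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")  # "e" followed by U+0301

# Base64 sees only bytes, so the two forms encode differently.
b64_nfc = base64.b64encode(nfc.encode("utf-8"))  # b"w6k="
b64_nfd = base64.b64encode(nfd.encode("utf-8"))  # b"ZcyB"
assert b64_nfc != b64_nfd

# Each one round-trips exactly; base64 itself changes nothing.
assert base64.b64decode(b64_nfc).decode("utf-8") == nfc
assert base64.b64decode(b64_nfd).decode("utf-8") == nfd

# Only a separate normalization step makes the decoded texts compare equal.
decoded_nfd = base64.b64decode(b64_nfd).decode("utf-8")
assert unicodedata.normalize("NFC", decoded_nfd) == nfc
```

Any difference between the two base64 strings comes from the text having been normalized outside of base64, not from the encoding or decoding step.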