I think the reverse also holds: decoding a Base64 entity does not guarantee that the result is valid text in any known encoding, so Unicode normalization cannot be applied to the output.
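As a rough illustration of that point, here is a minimal Python sketch. The byte values and the helper name `is_valid_utf8` are my own choices for the example, not anything defined by Base64 or MIME. It decodes a perfectly valid Base64 string whose payload is not well-formed UTF-8, so there is no text to normalize.

```python
import base64

# Arbitrary bytes chosen for illustration: 0xFF and 0xFE never occur
# in well-formed UTF-8, so this payload cannot be UTF-8 text.
payload = bytes([0xFF, 0xFE, 0x00, 0xC3])

encoded = base64.b64encode(payload)   # valid Base64 text
decoded = base64.b64decode(encoded)   # the same opaque bytes again


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decoded == payload)        # the decoder returned the bytes unchanged
print(is_valid_utf8(decoded))    # whether those bytes happen to be UTF-8 text
```

The decoder succeeds either way; whether the bytes are text is a question for the application that consumes them.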
Even if the result does represent text, nothing indicates that it is encoded in a Unicode encoding form (unless this is tagged separately, as in MIME). If you use Base64 to decode MIME content (e.g. for email), the Base64 decoding itself does not transform the encoding. The email parser then has to check that the text encoding is valid, at which point it may have to transform the text (possibly replacing invalid sequences or truncating it), and only then may it apply normalization to help render that text. These transforms are part of the MIME application and are independent of whether you used Base64 or any other binary encoding or transport syntax.

In other words: "If m is not equal to m', then t will not equal t'" is reversible, but nothing indicates that m or m', once Base64-decoded, are text. They are just opaque binary objects, which are equal in value exactly when their Base64 encodings t and t' are.

Note: some Base64 envelope formats (like MIME) allow multiple representations t and t' of the same message m, by adding padding or transport syntaxes such as line splitting (with variable line lengths). Base64 alone does not allow that variation (it normally uses a fixed alphabet), but there are variants whose decoders accept extended alphabets as binary equivalents.

So you may have two MIME-encoded texts with different encodings (Base64 or Quoted-Printable, with variable line lengths) that represent the same source binary object, and decoding these different encoded messages will yield the same binary object. This does not depend on Base64 itself but on the permissiveness/flexibility of the decoders for these envelope formats (which use **extensions** of Base64 specific to the envelope format).

On Fri, Oct 12, 2018 at 18:27, Doug Ewell via Unicode <unicode@unicode.org> wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur which the different base64 may be the
> > same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>
> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
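To make the round-trip question and the line-splitting remark concrete, here is a small Python sketch. The sample string, the 256-byte test message and the helper names `round_trip` and `strict_decode_fails` are assumptions made for the example. It uses Python's `base64.encodebytes()` as a stand-in for a MIME-style encoder that wraps lines, and `b64decode(..., validate=True)` as a stand-in for a strict decoder.

```python
import base64
import binascii
import unicodedata


def round_trip(m: bytes) -> bytes:
    """Encode m to Base64 and decode it again, with no other processing."""
    return base64.b64decode(base64.b64encode(m))


# The same text in two normalization forms is two different byte strings.
text = "caf\u00e9"
m_nfc = unicodedata.normalize("NFC", text).encode("utf-8")
m_nfd = unicodedata.normalize("NFD", text).encode("utf-8")

# Different inputs give different encodings, and each round trip is exact:
# Base64 neither normalizes nor converts anything.
print(m_nfc != m_nfd, base64.b64encode(m_nfc) != base64.b64encode(m_nfd))
print(round_trip(m_nfc) == m_nfc, round_trip(m_nfd) == m_nfd)

# MIME-style line splitting: encodebytes() inserts a newline after at most
# 76 output characters, so t and t_wrapped differ as strings.
m = bytes(range(256))
t = base64.b64encode(m)
t_wrapped = base64.encodebytes(m)

# A permissive decoder skips the line breaks and recovers the same bytes.
print(t != t_wrapped, base64.b64decode(t_wrapped) == m)


def strict_decode_fails(data: bytes) -> bool:
    """Hypothetical helper: True if a strict decoder rejects data."""
    try:
        base64.b64decode(data, validate=True)
        return False
    except binascii.Error:
        return True


# A strict decoder treats the line breaks as errors.
print(strict_decode_fails(t_wrapped))
```

Whether t and t_wrapped count as "the same" thus depends on how permissive the decoder is, not on the bytes they carry.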