RE: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

Tex via Unicode Fri, 12 Oct 2018 12:29:12 -0700

I agree with Doug. Base64 maps each distinct byte sequence in the source to a 
distinct string in the destination. Decoding is also a unique mapping.

If the encoded string is “translated” in some way by additional processes, 
canonical or otherwise, then all bets are off.

If you disagree, please offer an example or additional details of how 2 base64 
strings might be equivalent.

Tex

From: Unicode [mailto:[email protected]] On Behalf Of J Decker via 
Unicode
Sent: Friday, October 12, 2018 9:29 AM
To: [email protected]
Cc: Unicode Discussion
Subject: Re: Base64 encoding applied to different unicode texts always yields 
different base64 texts ... true or false?

On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode <[email protected]> 
wrote:

J Decker wrote:

>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
>
> False.
> Canonical translation may occur, in which case different base64 texts
> may represent the same sort of string...

Base64 is a binary-to-text encoding. Neither encoding nor decoding
should presume any special knowledge of the meaning of the binary data,
or do anything extra based on that presumption.

Converting Unicode text to and from base64 should not perform any sort
of Unicode normalization, convert between UTFs, insert or remove BOMs,
etc. This is like saying that converting a JPEG image to and from base64
should not resize or rescale the image, change its color depth, convert
it to another graphic format, etc.
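
A minimal Python sketch of that point, assuming the standard-library base64
module. The helper name roundtrip and the sample inputs are illustrative and
do not come from the thread:

```python
import base64


def roundtrip(data: bytes) -> bytes:
    """Encode bytes to base64 text and decode them back, with no other processing."""
    encoded = base64.b64encode(data)
    return base64.b64decode(encoded)


# The same character in two UTFs, plus a copy with a UTF-8 BOM and some
# arbitrary binary data. Base64 treats every one of these as opaque bytes.
samples = [
    "\u00c5".encode("utf-8"),
    "\u00c5".encode("utf-16-le"),
    b"\xef\xbb\xbf" + "\u00c5".encode("utf-8"),
    bytes(range(256)),
]

for m in samples:
    n = roundtrip(m)
    # No normalization, UTF conversion, or BOM handling happens on the way.
    assert n == m
```

Any change to the bytes would have to come from a separate step outside the
encoder and decoder, which is the situation discussed below.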

So I'd say "true" to Roger's question.

On the first side (X to base64) definitely true.

But text resulting from some decoded buffer may be translated along the way, 
producing a 'congruent' string that's not exactly the same, and the base64 of 
that string will be different.

Comparing one base64 string with another may show a binary difference even 
though both still represent the 'same' string.
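
A small Python sketch of this case, assuming UTF-8 as the byte encoding and
Unicode canonical equivalence (NFC vs. NFD) as the kind of 'translation'
meant here. The helper names b64_of_text and same_text are hypothetical:

```python
import base64
import unicodedata


def b64_of_text(text: str) -> str:
    """Base64 of the UTF-8 bytes of text (UTF-8 is an assumption of this sketch)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def same_text(b64_a: str, b64_b: str) -> bool:
    """Compare two base64 strings as text, after decoding and NFC normalization."""
    a = base64.b64decode(b64_a).decode("utf-8")
    b = base64.b64decode(b64_b).decode("utf-8")
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"     # e-acute as a single code point (NFC form)
decomposed = "e\u0301"  # 'e' followed by a combining acute accent (NFD form)

b64_composed = b64_of_text(composed)
b64_decomposed = b64_of_text(decomposed)

# The underlying bytes differ, so the base64 strings differ as well...
assert b64_composed != b64_decomposed
# ...yet the decoded texts are canonically equivalent.
assert same_text(b64_composed, b64_decomposed)
```

The difference already exists in the bytes before base64 is applied, so the
base64 step itself remains a one-to-one mapping.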

I touched on this a little bit in UTN #14, from the standpoint of trying
to improve compression by normalizing the Unicode text first.

--
Doug Ewell | Thornton, CO, US | ewellic.org
