Fwd: Re: Unicode, SMS and year 2012

anbu Fri, 27 Apr 2012 09:21:44 -0700

Further correction

I was assuming the design, given by the following EBNF, would help:


1(0|1){1(0|1)}(0|1)(0|1)(0|1)(0|1){0(0|1)}0(0|1)1(0|1)

The number of codes supported with a given number of bits (greater than
eight bits), n, is given by:
[2 ^ (n ÷ 2)] [n - 4]
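
As a minimal sketch, the corrected EBNF above can be transcribed into a
regular expression and used to check candidate bit strings. The names
CODE_PATTERN, is_valid_code and matching_lengths are hypothetical, and the
braces {...} are assumed to mean "zero or more repetitions", as in standard
EBNF. The sketch only tests membership and lengths; it does not check the
code-count expression.

```python
import re
from itertools import product

# Assumption: {...} in the EBNF means "zero or more repetitions".
# Hypothetical name; a direct transcription of the corrected grammar.
CODE_PATTERN = re.compile(r"1[01](?:1[01])*[01]{4}(?:0[01])*0[01]1[01]")


def is_valid_code(bits: str) -> bool:
    """Return True if the whole bit string is derivable from the grammar."""
    return CODE_PATTERN.fullmatch(bits) is not None


def matching_lengths(max_len: int) -> list[int]:
    """Brute-force every bit string up to max_len bits and collect the
    lengths for which at least one string matches the grammar."""
    lengths = set()
    for n in range(1, max_len + 1):
        for combo in product("01", repeat=n):
            if is_valid_code("".join(combo)):
                lengths.add(n)
                break  # one match is enough for this length
    return sorted(lengths)


print(matching_lengths(14))  # → [10, 12, 14]
```

The brute-force search is only practical for short lengths, but it is
enough to see that the shortest matching strings are ten bits long and
that no odd length matches.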

-------- Original Message --------
Subject: Fwd: Re: Unicode, SMS and year 2012
Date: Fri, 27 Apr 2012 08:14:13 -0400
From: <a...@peoplestring.com>
To: <or...@secarica.ro>, <unicode@unicode.org>

Please note the following corrections to the mail below:

The number of codes supported with a given number of bits, n, is given by:
[2 ^ (n ÷ 2)] [n - 4]

The total number of codes supported with a given number of bits, n,
together with all shorter lengths, is given by:
3 [2 ^ (n ÷ 2)] [n - 4] - 64
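
For quick experimentation, the two corrected expressions can be evaluated
directly. This is only a transcription of the expressions as written in
this mail, assuming n is even so that n ÷ 2 is exact; the function names
are hypothetical, and the sketch does not verify the expressions against
the EBNF.

```python
def codes_for_length(n: int) -> int:
    """Evaluate [2 ^ (n / 2)] [n - 4] exactly as stated in the mail."""
    if n % 2 != 0:
        raise ValueError("n is assumed to be even")
    return (2 ** (n // 2)) * (n - 4)


def codes_up_to_length(n: int) -> int:
    """Evaluate 3 [2 ^ (n / 2)] [n - 4] - 64 exactly as stated in the mail."""
    return 3 * codes_for_length(n) - 64


for n in (10, 12, 16):
    print(n, codes_for_length(n), codes_up_to_length(n))
```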

-------- Original Message --------
Subject: Re: Unicode, SMS and year 2012
Date: Fri, 27 Apr 2012 07:54:41 -0400
From: <a...@peoplestring.com>
To: <or...@secarica.ro>, <unicode@unicode.org>

Hi!

I also had the same questions.
In addition I had a few more questions, of which the one below is the most
significant:

What if one had to send a text in multiple scripts, like in the case of a
text and its translation in the same message?

I thought maybe a new transformation format or a new character encoding
would be suitable.
I am currently working on a new form of representation. This is how it
goes:

All the characters of the block "C0 Controls and Basic Latin" are
included with their design unaltered; that is, they are encoded in eight
bits (including the leading zero), in the form 0xxxxxxx.
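
A minimal sketch of this single-byte case follows. The function names are
hypothetical; the only thing taken from the description above is that a
Basic Latin character keeps its eight-bit form with the top bit set to 0.

```python
def is_single_byte_code(first_byte: int) -> bool:
    """True when the top bit is 0, i.e. the byte has the form 0xxxxxxx."""
    return first_byte & 0x80 == 0


def encode_basic_latin(ch: str) -> str:
    """Return the eight-bit string for a C0 control or Basic Latin character."""
    code_point = ord(ch)
    if code_point > 0x7F:
        raise ValueError("not in the C0 Controls and Basic Latin block")
    return format(code_point, "08b")


print(encode_basic_latin("A"))    # → 01000001
print(is_single_byte_code(0x41))  # → True
```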

All the other codes would necessarily be longer than eight bits.
I was assuming the design, given by the following EBNF, would help:

1(0|1){1(0|1)}(0|1)(0|1){0(0|1)}0(0|1)1(0|1)

Please note that this design produces codes whose number of bits are even
numbers greater than eight. That is, 10, 12, 14, 16, 18, 20, 22, ... and
so on.

The number of codes supported with a given number of bits, n, is given by:
[2 ^ (n ÷ 2)] [n - 4]

The total number of codes supported with a given number of bits, n,
together with all shorter lengths, is given by:
3 [2 ^ (n ÷ 2) - 1] [n - 4] + 74

Please note that the sign '^' means "raised to the power of", just as
in most computer applications.
Further, note that this design is still under development, so it may be
subject to minor corrections.

I chose to design codes whose number of bits are even numbers only, rather
than all integers, so that if a byte is corrupted, let's say due to a
network failure, somewhere among other bytes that conform to this
standard, only the part containing the corrupt byte and a few consecutive
bytes would be affected, keeping the effect of the byte loss minimal.

All the information given above in this mail is my intellectual property,
and my consent is to be sought before using it for any purpose.

Regards,

Anbu Kaveeswarar Selvaraju

On Fri, 27 Apr 2012 11:06:23 +0300, Cristian Secară
<or...@secarica.ro> wrote:
> A few years ago there was a discussion here about Unicode and SMS
> (Subject: Unicode, SMS, PDA/cellphones). Then and now the situation is
> the same, i.e. an SMS text message that uses characters from the GSM
> character set can include 160 characters per message (stream of 7 bit ×
> 160), whereas a message that uses anything else can include only 70
> characters per message (stream of UCS-2 16 bit × 70).
> 
> Although my language (Romanian) was and is affected by this
> discrepancy, at the time I was skeptical about the possibility of
> improving anything in this area, mostly because back then both the PC
> and mobile markets suffered from other language problems critical to me
> (like missing glyphs in fonts, or improper keyboard implementations).
> 
> Things have evolved and now the prospects are much better. Regarding
> SMS, at that time Richard Wordingham pointed out that SCSU might be a
> proper solution for SMS encoding [when it comes to non-GSM
> characters].
> 
> Recently I studied as many aspects as I could of SMS
> standardization, an effort I started approximately a year ago concerning
> SMS language discrimination, which arises purely from the difference in
> message length and cost for the same sentence written with diacritical
> marks (written correctly for that language) or without diacritical
> marks (written incorrectly for that language). Or, for the same reason,
> language discrimination between (say) a French message and (say) a
> Romanian message, both written correctly.
> 
> It turned out that they (ETSI & its groups) created a way to get around
> the 70-character limitation, namely the “National Language Single Shift”
> and “National Language Locking Shift” mechanisms. These are described in
> the 3GPP TS 23.038 standard and were introduced in Release 8. In short,
> the mechanism is a character substitution table, applied per character
> or per message, and defined per language.
> 
> Personally I find this to be a stone-age approach, which in my
> opinion does not work at all if I enter the message from my PC keyboard
> via the phone's PC application (because the language cannot always be
> predicted, especially if I am using dead keys). It is true that the
> actual SMS stream limit is not very generous, but I wonder whether SCSU
> would have been a better approach in terms of i18n. I also don't know
> whether SCSU requires a language to be declared beforehand, or whether
> it simply works out by itself the required window for each character.
> 
> SCSU appears to be OK for my language, or Hungarian, or
> Bulgarian, etc., but is it also OK for non-Latin and non-Cyrillic
> scripts? This versus the language shift mechanism, which is still 7
> bit. Release 10 of that standard includes language locking shift tables
> for Turkish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam,
> Oriya, Punjabi, Tamil, Telugu and Urdu.
> 
> Is there anyone with more experience with this?
> 
> Thank you,
> Cristi
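
The 160 versus 70 figures quoted above both describe the same payload,
since 7 × 160 and 16 × 70 are each 1120 bits, that is, 140 octets. A small
sketch of that arithmetic follows; the constant and function names are
hypothetical, and the sketch ignores any user-data header that would
reduce the available space.

```python
# Assumption: a single SMS carries 140 octets of user data, which is what
# both figures in the quoted message (7 x 160 and 16 x 70 bits) add up to.
PAYLOAD_BITS = 140 * 8


def chars_per_message(bits_per_char: int) -> int:
    """Number of fixed-width characters that fit in one message payload."""
    return PAYLOAD_BITS // bits_per_char


print(chars_per_message(7))   # GSM 7-bit alphabet → 160
print(chars_per_message(16))  # UCS-2 → 70
```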
