On Tue, Jul 21, 2015 at 12:46 David Starner <[email protected]> wrote:

On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht 
<[email protected]> wrote:
If the author really intends to deceive potential readers, he will succeed.
Possibly. Code is hard. But the Ogham space is not a real threat; it's easy to 
search for and obviously a deliberate attempt to confuse.

My concern is not about the Ogham space, but about the free use of non-ASCII 
characters in programming languages in general.
Just imagine: if you decide to open a door for public traffic in a busy city 
and put a security checkpoint at it, you wouldn't consider only how to check a 
single person; you'd have to consider how to check thousands of people within 
an hour, if you don't plan to close the door again.
Therefore, consider a huge software system developed in, let's say, Serbia or 
Russia, using Cyrillic names throughout for classes and variables:
int ци́фра = чита́ть(пе́речень);  return ци́фра;
It might be a valuable system with some unique features, and you want to 
evaluate the source code before you buy it. Or the community wants to adopt it 
as Open Source because it has some nice features.
Looking for a deliberate attempt to confuse within this code would be like 
looking for a needle in a haystack, since every line has non-ASCII characters in it.
Programming languages like JS should at least implement exclusion rules from 
the "Unicode Confusables Characters" list.
Have you looked at that list? 1 and l is one pair of confusables in that list, 
and while that is an incredibly classic confusable pair,
it's not one that's implementable in a programming language. а and a is another 
pair; but if you ban а, you've practically banned Cyrillic identifiers 
completely.
Of course, there are confusables within the ASCII range, but they have been 
well known for years and are thus more likely to be detected.
Regarding your other example, some compilers warn if you have an assignment 
within an if-clause.
I used the term "exclusion rules" to mean a ruleset based on the confusables list.
For example, the following code sequence
           int a;  {  int а;  a = 5;  }      (N.B. the second "а" is Cyrillic)
could be banned by a rule saying:
"It is not allowed to declare a variable that is DISTINCT from others (thus not 
hiding them) but CONFUSABLY SIMILAR to another variable in the same scope."
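To sketch what checking such a rule could look like (only a rough illustration 
in Python; a real checker would load the complete confusables.txt data from 
UTS #39 and apply the proper skeleton algorithm, while the mapping and the 
function names below are hand-picked for this example):

    # Illustrative subset of the confusables mapping (assumption: a real tool
    # would read the full list from UTS #39 confusables.txt).
    CONFUSABLE_MAP = {
        "\u0430": "a",  # CYRILLIC SMALL LETTER A  -> Latin "a"
        "\u043e": "o",  # CYRILLIC SMALL LETTER O  -> Latin "o"
        "\u0435": "e",  # CYRILLIC SMALL LETTER IE -> Latin "e"
        "\u0441": "c",  # CYRILLIC SMALL LETTER ES -> Latin "c"
    }

    def skeleton(name):
        """Map each character to its confusable prototype (simplified UTS #39 'skeleton')."""
        return "".join(CONFUSABLE_MAP.get(ch, ch) for ch in name)

    def violates_rule(new_name, names_in_scope):
        """True if new_name is distinct from, but confusably similar to,
        a name already visible in the same scope."""
        return any(new_name != other and skeleton(new_name) == skeleton(other)
                   for other in names_in_scope)

    # The sequence above: declaring Cyrillic "а" while Latin "a" is in scope.
    print(violates_rule("\u0430", {"a"}))   # True -> the compiler would reject it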
Another rule could require: "It is not allowed to mix two alphabets within one 
name."
This would not ban Cyrillic identifiers in general.
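A rough sketch of how a compiler or linter might check that second rule 
(again only an assumption, guessing the script of each letter from its Unicode 
character name rather than from real script property data):

    import unicodedata

    def mixes_latin_and_cyrillic(name):
        """True if a single identifier contains both Latin and Cyrillic letters."""
        scripts = set()
        for ch in name:
            if not ch.isalpha():
                continue                       # digits, "_", combining marks carry no script
            uname = unicodedata.name(ch, "")
            if uname.startswith("CYRILLIC"):
                scripts.add("Cyrillic")
            elif uname.startswith("LATIN"):
                scripts.add("Latin")
        return len(scripts) > 1

    print(mixes_latin_and_cyrillic("myb\u0430nk"))    # True: one Cyrillic "а" inside a Latin name
    print(mixes_latin_and_cyrillic("ци\u0301фра"))    # False: purely Cyrillic, still allowed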
Otherwise such programming languages ought to be black-listed.
Black-listed? By whom? If you wish to make sure a set of code you control does 
not use non-ASCII characters, most source-control systems will let you reject 
such files from being checked in. If you want to reject JavaScript altogether, 
that is also your freedom. But of all the attacks weighed against JavaScript, I 
seriously doubt that this is the one that will bring it down.
With "black-listed" I meant "known to be unsafe" in some way.
Just the same way as domain-registration authorities  would be  "known to be 
unsafe"   if they  accept or allow domain names
like    mybаnk.com   beside   mybank.com  where one has a Latin "a" and the 
other has a Cyrillic  "а"  in it,  thus ignoring the confusables list.
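Reusing the illustrative skeleton() function sketched above, both spellings 
collapse to the same string, which is exactly the signal a registry (or a 
compiler) could use to refuse the second name:

    print(skeleton("myb\u0430nk.com"))   # "mybank.com" (Cyrillic "а" folded to Latin "a")
    print(skeleton("mybank.com"))        # "mybank.com" (already pure Latin)
    # Equal skeletons but unequal strings: a confusable pair.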
BTW,  I don't want to attack JavaScript.  It's pretty.

The fathers of ALGOL and other early languages racked their brains to avoid 
ambiguous semantics caused by poor syntax rules.
Today, when Unicode supersedes ASCII in some contexts, the challenges are 
different, but no less important.

Albrecht.
