Re: A research idea for entering characters

2013-04-06 Thread Jon Hanna
On 04/06/2013 09:36 AM, William_J_G Overington wrote: > Text is for reading by humans. > > QR codes are for reading by computers. > > I wondered if it would be possible to have images that could be read by both > humans and computers. Sure. Just set the error-correction high, and write over the

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-30 Thread Jon Hanna
On 07/30/2012 02:12 PM, Doug Ewell wrote: > Please, no more conspiracy theories. Yes. If this goes on, I'll find it impossible to refrain from telling you all my theories about the ANSI-INCITS 154-1988 (R1999) keyboard. And nobody wants that.

Re: name change

2011-11-29 Thread Jon Hanna
On 2011-11-23 10:38, Jeremie Hornus wrote: I was thinking the ID being the code point value itself, and the "name" a human readable description of it. They are both IDs. One is from the range of numbers from 0 to 1114111 (10FFFF in base 16), the other is from the range of strings of characters e
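Both identifiers can be handled programmatically. The sketch below assumes Python and its standard `unicodedata` module; `ids_for` is a hypothetical helper that returns the numeric ID and the name ID, and it checks that the numeric range tops out at 0x10FFFF (1114111).

```python
import sys
import unicodedata
from typing import Tuple

# Code points run from 0 to 0x10FFFF (1114111 decimal).
assert sys.maxunicode == 0x10FFFF == 1114111


def ids_for(ch: str) -> Tuple[int, str]:
    """Return both identifiers of a character: its code point and its name."""
    return ord(ch), unicodedata.name(ch)


cp, name = ids_for("A")
print(f"U+{cp:04X} {name}")  # U+0041 LATIN CAPITAL LETTER A

# Each identifier leads back to the same character.
assert chr(cp) == unicodedata.lookup(name) == "A"
```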

Re: charset parameter in Google Groups

2010-06-30 Thread Jon Hanna
António MARTINS-Tuválkin wrote: If the EU can tell Britain that it can't sell eggs by the dozen any more, Yesterday I bought a dozen eggs (2 racks of 6, set 2×3) here in Portugal. This must be an incredibly new regulation. The Daily Mail isn't as easily available in Portugal. It's one of s

RE: outside decomposed, inside precomposed

2004-10-13 Thread Jon Hanna
sing to NFD would be quite unusual. > > BTW, this application supports import of UTF-8, but will not export > UTF-8. That's odd, isn't it? It'll only export UTF-16 (it's internal > storage form). Odd indeed. Regards, Jon Hanna <http://www.selkieweb.com/>

RE: bit notation in ISO-8859-x is wrong

2004-10-12 Thread Jon Hanna
> But for certain purposes e.g. historical astronomical > calculations (used > for establishing chronology from records of eclipses etc) the year > numbers used are effectively negative numbers (and zero) AD. The proleptic Gregorian calendar is more often used with the terms CE or EV - whether

RE: Saudi-Arabian Copyright sign

2004-09-19 Thread Jon Hanna
> > > For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif > > > > Looks like {U+062D, U+20DD} > > Yes, it does look like that. But it forms a separate entity, > just like its precedents COPYRIGHT SIGN or SOUND RECORDING > COPYRIGHT SIGN or REGISTERED. All of which were in existing sta

RE: Saudi-Arabian Copyright sign

2004-09-19 Thread Jon Hanna
> For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif Looks like {U+062D, U+20DD}

RE: Combining across markup? (Was: RE: sign for anti-neutrino - g ree k nu with diacritical line aboveworkaround ?)

2004-08-10 Thread Jon Hanna
s of XML replacing ">" followed by U+0338 COMBINING LONG SOLIDUS OVERLAY with U+226F would mean the document was no longer well-formed. So even without an explicit spec saying otherwise the above would be problematic. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment fail
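The well-formedness problem can be reproduced in a few lines. This is a minimal Python sketch (the language is an assumption; the post contains no code) that normalises an entire serialised document to NFC, which is exactly the operation warned against: U+003E followed by U+0338 canonically composes to U+226F NOT GREATER-THAN, so the ">" closing the start tag disappears.

```python
import unicodedata
import xml.etree.ElementTree as ET

# A text node that begins with U+0338 COMBINING LONG SOLIDUS OVERLAY,
# directly after the ">" closing the start tag. This is well-formed XML.
doc = "<a>\u0338</a>"
ET.fromstring(doc)

# Normalising the serialised document composes U+003E + U+0338 into
# U+226F NOT GREATER-THAN, which swallows the markup's ">".
nfc = unicodedata.normalize("NFC", doc)
assert nfc == "<a\u226f</a>"

try:
    ET.fromstring(nfc)
    well_formed = True
except ET.ParseError:
    well_formed = False

print(well_formed)  # False
```

Normalising text content before it is serialised, rather than normalising the markup afterwards, avoids this particular failure.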

Re: Combining across markup? (Was: RE: sign for anti-neutrino - g ree k nu with diacritical line aboveworkaround ?)

2004-08-10 Thread Jon Hanna
Quoting Philipp Reichmuth <[EMAIL PROTECTED]>: > Jon Hanna schrieb: > > The W3C Character Model does not, or will not since it's not yet a > > Recommendation, allow text nodes or attribute values to begin with > defective > > combining character sequences. >

Re: Combining across markup? (Was: RE: sign for anti-neutrino - g ree k nu with diacritical line aboveworkaround ?)

2004-08-10 Thread Jon Hanna
The W3C Character Model does not, or will not since it's not yet a Recommendation, allow text nodes or attribute values to begin with defective combining character sequences. -- Jon Hanna <http://www.hackcraft.net/> "What's a false move? Is it very different from a real one?"
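A check for the case described can be sketched as follows. The sketch assumes Python, and `starts_with_combining_mark` is a hypothetical helper. It uses general category M* (Mn, Mc, Me) as an approximation: the W3C Character Model uses its own definition of "composing characters", which this does not reproduce exactly.

```python
import unicodedata


def starts_with_combining_mark(text: str) -> bool:
    """Rough test: does text begin with a combining mark (general category M*)?

    Approximation only; the W3C Character Model's own definition of
    composing characters is not reproduced exactly here.
    """
    return bool(text) and unicodedata.category(text[0]).startswith("M")


print(starts_with_combining_mark("\u0301e"))  # True: leading COMBINING ACUTE ACCENT
print(starts_with_combining_mark("e\u0301"))  # False
```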

RE: Looking for transcription or transliteration standards latin- >arabic

2004-07-09 Thread Jon Hanna
t's going and the other where it's been. -- Jon Hanna <http://www.hackcraft.net/> "One of the few good things about modern times: If you die horribly on television, you will not have died in vain. You will have entertained us." - Kurt Vonnegut.

Re: Looking for transcription or transliteration standards latin- >arabic

2004-07-09 Thread Jon Hanna
y changed the spelling of their name. It's not even pronounced the same. They have a famous typewriter keyboard inventor in their line, but no famous composers. -- Jon Hanna <http://www.hackcraft.net/> "Write a wise saying and your name will live forever" - Anonymous

Re: alphabetic sorting of IPA and other derived letters

2004-07-08 Thread Jon Hanna
ment is in, but this can be changed through the dialog opened by the "Options..." button on the sort dialog. -- Jon Hanna <http://www.hackcraft.net/> "It is the most shattering experience of a young man's life when he awakes and quite reasonably says to himself, 'I will never play The Dane.'"

Re: Latin long vowels

2004-06-22 Thread Jon Hanna
BELOW AND MACRON U+1E7A LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS U+1E7B LATIN SMALL LETTER U WITH MACRON AND DIAERESIS > If so, would anyone know from where a Windows XP font containing these five > characters could be download? Arial Unicode has at least some of them. -- Jon Hanna <

Re: Proposal to encode dominoes and other game symbols

2004-06-02 Thread Jon Hanna
he next 5 years, as Fortune's demonstration that the Gods laugh at all plans. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: Proposal to encode dominoes and other game symbols

2004-06-02 Thread Jon Hanna
re as a bullet than as an inline symbol, and hence no more justified than OWL and TURKEY would be because of their similar use in O'Reilly Associates publications. But sure, go and look for examples (not in driver's testing materials - the point there is to represent what one

RE: Proposal to encode dominoes and other game symbols

2004-05-25 Thread Jon Hanna
se a deck with the Solitaire encryption algorithm. <http://www.schneier.com/solitaire.html> -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: Proposal to encode dominoes and other game symbols

2004-05-25 Thread Jon Hanna
ar game in > > high school and college in my days. > > "Trumps" in English. I suggest that 21 trumps be encoded, but not > named, because the correspondence of names to numbers is variable. Are they very variable? I can only think of the one substitution suggested by Cro

Re: Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Jon Hanna
Quoting "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>: > Jon Hanna scripsit: > > > [T]he default encoding on the server (which really should be utf-8 > > on www.unicode.org at this stage). > > Currently it is, but there are sticky issues: in particular, a defau

Re: Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Jon Hanna
Quoting Michael Everson <[EMAIL PROTECTED]>: > At 15:39 +0100 2004-05-21, Jon Hanna wrote: > > >Were the headers correct? > > It is plain text. HTTP has headers separate to the content (the headers come first and the content comes next). These headers can contain encodi
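The encoding information travels in the `charset` parameter of the Content-Type header. HTTP and MIME share that header syntax, so this Python sketch borrows the standard `email.message.Message` parser for convenience. `charset_from_content_type` is a hypothetical helper, and what to do when no charset is present is left to the caller.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str, default: Optional[str] = None) -> Optional[str]:
    """Return the (lower-cased) charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)


print(charset_from_content_type("text/plain; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/plain"))                 # None
```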

Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Jon Hanna
dea for drafts that are being edited, but it might be more appropriate once they are finalised. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-18 Thread Jon Hanna
would hence read as taking something from the top of a page or viewing area and moving it, in its entirety, to the bottom, not as something starting at the top and continuing towards the bottom. In summary, TTB, not T2B, please. -- Jon Hanna <http://www.hackcraft.net/> "…it has be

Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-17 Thread Jon Hanna
't composed of a BTT passage, a LTR passage and a TTB passage, but of a single passage which follows a path which changes through those three directions. Paths are not a plain text matter. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more wor

Re: any unicode conversion tools?

2004-05-07 Thread Jon Hanna
r way they want :) it's *just* about possible that the seven-octet sequence FE 80 80 80 80 80 AF would also be treated as U+002F SOLIDUS. [1] Indeed the format of UTF-8 would make it possible to unambiguously encode any value up to 0xFFFFFFFFF (2^36 - 1) but this exceeds the ISO 10646 codepoint space a
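The arithmetic behind the seven-octet example can be shown with a deliberately lenient decoder. In this Python sketch, `lenient_decode` is a hypothetical function that collects payload bits without rejecting overlong forms (the flaw being illustrated); Python's own UTF-8 codec, by contrast, refuses the sequence.

```python
def lenient_decode(seq: bytes) -> int:
    """Decode one sequence the way an over-lenient UTF-8 decoder might.

    Payload bits come from the lead byte plus 6 bits from each continuation
    byte; overlong forms and out-of-range values are NOT rejected.
    """
    n = len(seq)
    if n == 1:
        return seq[0]
    # The lead byte has n high 1-bits and then a 0; 0xFE (n = 7) has no payload bits.
    value = seq[0] & (0x7F >> n)
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError(f"0x{b:02X} is not a continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value


seven_octets = bytes([0xFE, 0x80, 0x80, 0x80, 0x80, 0x80, 0xAF])
print(hex(lenient_decode(seven_octets)))  # 0x2f, i.e. U+002F SOLIDUS

# A conforming decoder refuses it outright.
try:
    seven_octets.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)  # True
```

Six continuation bytes carry 6 × 6 = 36 bits, which is where the 2^36 - 1 ceiling for the seven-octet form comes from.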

RE: Just if and where is the then?

2004-05-06 Thread Jon Hanna
e > that has no awareness of a custom encoding to do what they want. If you think of the users of an encoding as a social network then we would expect something like Metcalfe's or Reed's law to affect it. The bigger the network the better off they'll be. Unicode ha

Re: Just if and where is the then?

2004-05-05 Thread Jon Hanna
pheme clusters correctly is a perfect subset of the work involved in developing and using a new 8-bit encoding. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: Just if and where is the then?

2004-05-05 Thread Jon Hanna
or a long time, never mind any other use of that encoding. Do you really think the same would be true of ISO 8859-17? -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: Just if and where is the then?

2004-05-05 Thread Jon Hanna
he fact that it was not practical to act as if we were at encoding year-zero - if we had then we probably wouldn't have precomposed characters for European languages, never mind any others) but those problems are considerably less than existed previously and ISO-8859-17+ is always going to be

RE: [OT] Even viruses are now i18n!

2004-04-23 Thread Jon Hanna
given that the goal is to infect as many machines as possible as quickly as possible, anything that gets more than 50% accuracy should be considered a successful approach in that context. If the authorities find the author I doubt the robustness of the content-language heuristic will be top

Re: ZX80 (was: Fixed Width Spaces (was: Printing and Displaying DependentVowels))

2004-04-01 Thread Jon Hanna
, and I'm looking forward to reading it. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

ZX80 (was: Fixed Width Spaces (was: Printing and Displaying DependentVowels))

2004-04-01 Thread Jon Hanna
ke a letter, Gosh, that brings me back. All those characters that were BASIC keywords compressed into one octet. How could we have neglected to encode such important legacy characters, this unnecessarily complicates round-trip conversion between ZX80s and Unicode. -- Jon Hanna <http://www.hackcraf

Re: [OT] C-sharp

2004-03-23 Thread Jon Hanna
a carriage return in its name :) -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: [OT] C-sharp

2004-03-23 Thread Jon Hanna
rectly when exchanged. If you really wanted to, you could use either the hash or sharp symbol in the extension with Win2K at least (just successfully tested this). File extensions are molehills that are frequently made into mountains. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: Irish dotless I

2004-03-19 Thread Jon Hanna
> Fine. I concede that this is the case. Therefore, let's change the > underlying > form of <0069> to a dotless "i" and let English speakers change it to a > dotted > "i" with the font. I am happy to inform you that "the underl

Re: Irish dotless I (was: Languages with letters that always take diacriticals

2004-03-19 Thread Jon Hanna
s in Turkic languages then you would have a point (and in certain circumstances so would the Irish i). Whether an Irish person writes an i without a dot, an English person writes it with a dot, or a 12 year old girl penning a valentine card writes it with a heart it is still the letter i. -- Jon

Re: help needed with adding new character

2004-03-18 Thread Jon Wilson
n a particular font (and perhaps colour). The specific latin font used to represent the CAPITAL A in the anarchy symbol is unimportant. Jon PS. Croquet challenge accepted - I have a set at home. I believe I get to choose time and location?

(no subject)

2004-03-18 Thread Jon Hanna
Quoting Marion Gunn <[EMAIL PROTECTED]>: how to guarantee continuance, > in the specific context of Irish text computing, of the traditional > restriction of the Irish diacritic dot (having only one single function in > Irish) to the consonants to which it belongs? A spell ch

Re: help needed with adding new character

2004-03-18 Thread Jon Wilson
[EMAIL PROTECTED] wrote: Jon Wilson scripsit: The character in question is a variant of "CIRCLED LATIN CAPITAL LETTER A", commonly referred to as the "Anarchy" symbol. The bars of the A are longer than normal, extending to touch or even overlap the circle. It's basi

help needed with adding new character

2004-03-18 Thread Jon Wilson
s area. In the spirit of anarchy, I am likely to pursue this application, whatever response I get! Equally in the spirit of anarchy, you are free to provide whatever comments and assistance you wish, on any of the above points. Thanks, Jon

Re: OT? Languages with letters that always take diacriticals

2004-03-16 Thread Jon Hanna
positions? I can only bring the language-independent ones to mind right now. There is a language-independent decomposition of LATIN CAPITAL LETTER I WITH DOT ABOVE to LATIN CAPITAL LETTER I and COMBINING DOT ABOVE. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that ha
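That decomposition is in the Unicode Character Database and can be checked directly. A short Python sketch using the standard `unicodedata` module:

```python
import unicodedata

dotted_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Canonical (untagged) decomposition from the Unicode Character Database.
print(unicodedata.decomposition(dotted_I))  # 0049 0307

# NFD applies it: LATIN CAPITAL LETTER I + COMBINING DOT ABOVE.
nfd = unicodedata.normalize("NFD", dotted_I)
assert nfd == "I\u0307"
```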

Re: OT? Languages with letters that always take diacriticals

2004-03-16 Thread Jon Hanna
gh it does run the risk of being confused with í. However I suspect that a large number are not "non-native", but were in fact created here. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

Re: %

2004-03-15 Thread Jon Hanna
an Or for that matter % since it isn't significant in HTML and can be safely placed straight into the source. -- Jon Hanna <http://www.hackcraft.net/> "…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt

RE: websites

2004-02-24 Thread Jon Hanna
elected that isn't normally in the list for UTF-16 (actually it referred to it as "Unicode" and "Unicode (Big Endian)" depending on which of the two pages I viewed). -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: websites

2004-02-23 Thread Jon Hanna
't include UTF-16. > > Maybe this browser is one of this very small minority which don't support > UTF-8 _and_ UTF-16 ? > Or it might just be that it's relatively hard to mis-identify UTF-16, and hence it doesn't need to be given as a user-override. Have you tested with i

Re: websites

2004-02-23 Thread Jon Hanna
; there are browsers out there that don't support anything except ISO 8859-1 and even a few that get downright confused by anything that isn't ASCII. Who knows, maybe there are even people using them! In any case, browsers that don't support UTF-8 and UTF-16 are now a very small minorit

Re: inconsistent behaviour in windows

2004-02-19 Thread Jon Hanna
e, a global memory handle or some other way of sharing data rather than passing the data directly as a parameter. Neither of these are ideal, if something better occurs to me I'll let you know. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: extracting code page of current locale

2004-02-12 Thread Jon Hanna
unix, nl_langinfo(CODESET) returns the code page of the locale set by > setlocale > I'm not sure, but GetLocaleInfo seems to allow you to obtain codepage info if you know the locale id. <http://msdn.microsoft.com/library/en-us/intl/nls_34rz.asp> -- Jon Hanna <http://www.hackcra
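For comparison, Python exposes the POSIX call but not the Windows one. This sketch assumes Python's `locale` module: it uses `nl_langinfo(CODESET)` where available and otherwise falls back to `locale.getpreferredencoding`, which is Python's portable helper, not `GetLocaleInfo` itself. The printed value depends entirely on the environment.

```python
import locale

try:
    # Adopt the locale from the user's environment (LANG, LC_ALL, ...).
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # the environment names a locale that isn't installed; keep the current one

if hasattr(locale, "nl_langinfo"):
    # POSIX: Python's wrapper around nl_langinfo(CODESET).
    codeset = locale.nl_langinfo(locale.CODESET)
else:
    # Windows: no nl_langinfo; use Python's portable helper instead.
    codeset = locale.getpreferredencoding(False)

print(codeset)  # depends on the environment
```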

Re: Astrological symbols

2004-02-05 Thread Jon Hanna
as well as hypothetical planets and a few other features which individual astrologers have invented symbols for). Though it has made me think that it would be nice to gloss U+260A ASCENDING NODE with "Dragon's Head" and U+260B DESCENDING NODE with "Dragon's

RE: Panther PUA behavior

2004-02-03 Thread Jon Hanna
UA assignments. > I think I may have dealt with bureaucracies using such a system in the past. It's all become clear now. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: [OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
Quoting Philippe Verdy <[EMAIL PROTECTED]>: > From: "Jon Hanna" <[EMAIL PROTECTED]> > > Quoting Marco Cimarosti <[EMAIL PROTECTED]>: > > > > > Jon Hanna wrote: > > > > I refuse to rename my UTF-81920! > > > > > >

[OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
Quoting Marco Cimarosti <[EMAIL PROTECTED]>: > Jon Hanna wrote: > > I refuse to rename my UTF-81920! > > Doug, Shlomi, there's a new one out there! > > Jon, would you mind describing it? There are two different UTF-81920s (the resultant ambiguity is very much i

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
> By the way, I don't think that there's an official reference that attributes > the acronym "UTF-9" to any of these encoding forms. I think that if "UTF-9" > is used it should be agreed by Unicode as being an official unique > representation. I ref

Re: Unicode forms for internal storage

2004-01-21 Thread Jon Hanna
rols forbidden in the 1.0 spec are allowed in the 1.1 spec if they appear as character references - so this no longer holds (unless you store them as references or otherwise escaped, which would bring its own issues). -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: Cuneiform Free Variation Selectors

2004-01-20 Thread Jon Hanna
it to be on the safe side. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: UTF8 locale & shell encoding

2004-01-16 Thread Jon Hanna
compiler and work identically with another compiler, even from the > same compiler provider. Please show how this is so beyond the names of the locales. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: UTF8 locale & shell encoding

2004-01-16 Thread Jon Hanna
> The windows name for "en_US.UTF8" is "English_United States.65001", ".65001" > will be UTF-8 in the default locale. > More on this at the MS documentation for setlocale <http://msdn.microsoft.com/library/en-us/vclib/html/_crt_setlocal

Re: UTF8 locale & shell encoding

2004-01-16 Thread Jon Hanna
"en_US.UTF8" is "English_United States.65001", ".65001" will be UTF-8 in the default locale. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: Klingon

2004-01-15 Thread Jon Hanna
is clunky, but not that clunky. No, the use of "ghoti" by Shaw was silly, the reference to it in the Klingon lexicon is funny (now if it was spelt "ghoti" but pronounced "fish" then it would be silly). -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: Klingon

2004-01-15 Thread Jon Hanna
locale-sensitive title-case operation for the Irish language would produce "Nathair" from "nAthair" although a deliberately "fuzzy" case-folding operation might. If Klingon isn't in the Latin script the joke about having the word "ghoti" for fish isn't as funny. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*
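A generic, locale-independent title-casing shows the problem. In Python, `str.title()` uppercases the first cased letter of each word and lowercases the rest, so Irish forms where a lowercase eclipsis prefix precedes a capital are flattened. This sketch only demonstrates the default behaviour and does not attempt an Irish-aware algorithm.

```python
# str.title() is locale-independent: it uppercases the first cased letter of
# each word and lowercases the rest, losing Irish eclipsis capitalisation.
for phrase in ["nAthair", "i nÉirinn"]:
    print(phrase, "->", phrase.title())
# nAthair -> Nathair
# i nÉirinn -> I Néirinn
```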

RE: Detecting encoding in Plain text

2004-01-12 Thread jon
mounts of Latin-1 - in particular files for which certain ASCII characters are given an application-specific meaning; for instance XML and HTML files, comma-delimited files, tab-delimited files, vCards and so on. It can be particularly reliable in cases where certain ASCII characters will alway

Re: Detecting encoding in Plain text

2004-01-08 Thread jon
in any other encoding others are more troublesome. If there is no source of encoding information (such as you get with XML declarations, HTTP headers and such), and even if there is, it may be best to offer your users the ability to select encodings (perhaps with the default choice based on loca
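The order of precedence suggested above can be sketched in Python. `guess_encoding` is a hypothetical helper: a user's explicit choice wins, then a byte order mark, then strict UTF-8 validation, and finally a fallback. The `latin-1` fallback is an assumption standing in for a locale-based default, chosen because it never fails to decode.

```python
import codecs
from typing import Optional

# UTF-32 BOMs must be tested before UTF-16: BOM_UTF32_LE begins with BOM_UTF16_LE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, user_choice: Optional[str] = None,
                   fallback: str = "latin-1") -> str:
    if user_choice:              # an explicit user selection always wins
        return user_choice
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")     # non-ASCII text rarely validates as UTF-8 by accident
        return "utf-8"
    except UnicodeDecodeError:
        return fallback          # assumption: stand-in for a locale-based default
```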

Re: Feedback Sought on Article Please

2004-01-07 Thread jon
ors > > of XML parsers and other software that process XML. > Murphy's law kicked in and I noticed a mistake just after asking for feedback, you must have tried the link while I was uploading the correction. It should work now. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Feedback Sought on Article Please

2004-01-07 Thread jon
before some poor soul reads it and gets misled. -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

Re: Name Mixup Behind Air France Groundings

2004-01-05 Thread jon
are nicknamed "Jackal", as far as I can make out there isn't a criminal or terrorist organisation in the world that doesn't have a member using that handle). Regards, Jon Hanna (neither the psychedelic counter-culture journalist, the Christian-rock journalist, nor the clas

Re: Aramaic unification and information retrieval

2003-12-24 Thread jon
n the guy who argued that the word "angel" was derived from the astrological use of the word "angle". -- Jon Hanna <http://www.hackcraft.net/>

Re: Unicode->ASCII approximate conversion

2003-12-19 Thread jon
void confusion, of course this wouldn't be possible with an existing normalisation API, though if the number of characters handled specially is small it would be possible to do that in a first pass. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>
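The two-pass approach mentioned above can be sketched in Python. The `SPECIAL` table is a hypothetical, illustrative set of first-pass mappings for letters that have no decomposition reducing them to ASCII. After that pass, NFKD splits off combining marks and compatibility forms, and anything still outside ASCII is dropped.

```python
import unicodedata

# Hypothetical first-pass table: letters with no decomposition to ASCII.
SPECIAL = {"ß": "ss", "æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "ł": "l", "Ł": "L"}


def to_ascii_approx(text: str) -> str:
    # First pass: the small set of characters handled specially.
    text = "".join(SPECIAL.get(ch, ch) for ch in text)
    # Second pass: é -> e + U+0301, U+FB01 LATIN SMALL LIGATURE FI -> "fi", etc.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop whatever is still outside ASCII (mostly combining marks).
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_approx("Łódź straße ﬁnal"))  # Lodz strasse final
```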

RE: [OT] CJK -> CJC (Re: Corea?)

2003-12-17 Thread jon
stians mean offence when they refer to Jesus through any of the countless transcriptions, spellings and pronunciations used in various languages. I think this is analogous to assuming that anyone dreaming of packing it all in and buying a villa in Provence similarly means no offence when expr

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread jon
> There's no reason to expect that there will be any 0307 whatever in > Turkish/Azeri texts: it's not a diacritic those languages use, AFAIK. There's no reason to expect that there won't be, particularly if they quote a piece in a language which does

RE: Case mapping of dotless lowercase letters

2003-12-16 Thread jon
don't assume you are where you appear to be. I like to summarise security advice thusly: "if you trust my advice on security you're starting with completely the wrong attitude" :) -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>

RE: Case mapping of dotless lowercase letters

2003-12-16 Thread jon
re passing UTF-8 around, but is this formalised yet? But yes, {U+0131}{U+0307} can look awfully similar to {U+0069}, I think {U+0069} {U+0307} would as well (and of course there are other opportunities for visual confusion unrelated to the U+0069 and U+0131). -- Jon Hanna
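Normalisation does not help with this kind of confusion, because the sequences are not canonically equivalent. A short Python check with the standard `unicodedata` module:

```python
import unicodedata

candidates = {
    "U+0069": "\u0069",                # i
    "U+0131 U+0307": "\u0131\u0307",   # dotless i + combining dot above
    "U+0069 U+0307": "\u0069\u0307",   # i + combining dot above
}

nfc_forms = {label: unicodedata.normalize("NFC", s) for label, s in candidates.items()}

# They may render alike, yet all three stay distinct after normalisation.
assert len(set(nfc_forms.values())) == 3
```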

Re: [OT] CJK -> CJC (Re: Corea?)

2003-12-15 Thread jon
o call themselves whatever they want, more troublesome would be if they wish to change their ISO 3166 codes. CR is taken and CP exceptionally reserved, so hopefully they'll remain static. Today's threads are putting me in a mood to re-read Cryptonomicon... -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>

Re: [OT] CJK -> CJC (Re: Corea?)

2003-12-15 Thread jon
Come to think of it, "Manchuquo" comes before "Nippon" and "Nihon" in just about any way you can think of Latinising it. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>

RE: Case mapping of dotless lowercase letters

2003-12-15 Thread jon
s, even though > it's rendered with a dot? Since i is soft-dotted presumably you'd take off a dot, and then put on a dot. Clear as mud! -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>
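The default (non-Turkish) full case mappings show the dot being added explicitly. Python's `str` methods apply the unconditional mappings from SpecialCasing.txt, so lowercasing U+0130 yields two code points. The Turkish-specific mapping of U+0130 to plain i is not applied, since these methods are locale-independent.

```python
capital_dotted = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
small_dotless = "\u0131"   # LATIN SMALL LETTER DOTLESS I

lower = capital_dotted.lower()
print([f"U+{ord(c):04X}" for c in lower])  # ['U+0069', 'U+0307']
print(small_dotless.upper())               # I
```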

[OT reversing letters to avoid offence] Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread jon
't just double sigels that have the second mirrored, but all double letters. FWIW not only are the sources I learnt this from not reliable on the history of the Futhark, being concerned only with the modern occult use, but they also claimed it was a purely aesthetic matter (and having experimented I agree it's prettier). -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>

Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread jon
have originally been written (I've heard of entire lines being mirrored, such as on the Franks Casket, but not individual characters) or if it was a post-war innovation to deliberately avoid writing SS. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for h

Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread jon
> strikes me as a matter of squeamishness more than respect for those who "Squeamishness" isn't quite the right word, and is belittling; I can understand why some people would want the symbol off their computer.

Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread jon
Quoting "Mark E. Shoulson" <[EMAIL PROTECTED]>: > However, now that you mention it, it is true that the stylized S used in > the abbreviation for the SS was actually required in all fonts by the > Nazi government, so by that reasoning it, at least, has some standing > for being encoded (though I c

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread jon
> But why on earth are we talking about mapping grapheme clusters to the PUA ?! It's valid, just don't expect, and hence don't plan for, anyone else following suit.

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread jon
grapheme cluster boundaries. > > This implies that end users should not require counts of code units or > code points. I don't think anyone argued against this being what *end* users require. Certainly for small values of "end" anyway. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>
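The different counts are easy to contrast. In this Python sketch, `counts` is a hypothetical helper. The grapheme figure is a crude approximation (code points that are not combining marks), not the UAX #29 segmentation algorithm, which Python's standard library does not provide.

```python
import unicodedata


def counts(s: str) -> dict:
    return {
        "utf8_code_units": len(s.encode("utf-8")),
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),
        # Crude approximation, NOT UAX #29: count code points that are not
        # combining marks (general category M*).
        "approx_graphemes": sum(1 for c in s if not unicodedata.category(c).startswith("M")),
    }


print(counts("e\u0301\U0001F600"))
# {'utf8_code_units': 7, 'utf16_code_units': 4, 'code_points': 3, 'approx_graphemes': 2}
```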

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread jon
some way defective" is actually a good way to put it methinks, they aren't illegal, and in some cases you can do things with them that are both reasonable and useful, but in other situations they may be problematic. -- Jon Hanna | Toys and books <http://www

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread jon
lt;=> e + ´ when compared to many higher-level string handling activities (regular expressions, bidirectional over-riding, and the subtler points of case operations). Even so, I think it's making those two levels meet that is the biggest stumbling block for beginners. -- Jon Hanna

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-10 Thread jon
> character > references as in: > const wchar_t c = L'\U000309'; \u must be followed by four hexadecimal digits, \U by eight. The biggest advantage of L'\u0309' over direct use of the combining character is you can read the thing (source is intended for human read
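Python string literals follow the same four-digit/eight-digit rule as the C and C++ escapes discussed here, so the difference can be demonstrated in a few lines. This is a Python illustration, not C; the six-digit escape quoted above is rejected when compiled.

```python
import unicodedata

# \u takes exactly four hex digits, \U exactly eight.
hook = "\u0309"
assert hook == "\U00000309"
print(unicodedata.name(hook))  # COMBINING HOOK ABOVE

# A six-digit \U escape is rejected at compile time.
try:
    compile(r"'\U000309'", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```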

Re: Text Editors and Canonical Equivalence (was Coloured diacriti cs)

2003-12-10 Thread jon
"default grapheme clusters" in Unicode. Functions which count either of these are perfectly conformant with Unicode, as long as they perform their task correctly. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie/>

Re: [OT]

2003-12-09 Thread jon
've already gone past the stage where you can't taste it (I understand heavily refrigerated beer is an American invention, and given the way American beer tastes this makes sense), soon it'll be served to you on a stick. I can't even remember if this thread was ever on topi

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
> > You might as well say that C code is not plain text because it too is > > subject to special canons of interpretation. > > C, C++ and Java source files are not plain text as well (they have their own C, C++ and Java source files are plain text. > "text/*" MIME type, which is NOT "text/plain"

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
, of these are examples of conformant behaviour. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie/>

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
lesheet. So it is not a problem that > there is a defective combining sequence, nor that the accent is not > combined with the e as it would be in NFC. Is that correct? You can, whether you should is another thing, and whether it would render correctly yet another. -- Jon Hann

Re: [OT]

2003-12-09 Thread jon
ined in China; he professed a belief that Guinness was why the Irish had thicker bones than the Chinese in his experience. There are considerably more doctors who would say that if you were going to drink a beer it should be stout, without going so far as to actually recommend it in and of itself.

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
can be performed through other means. <http://www.w3.org/TR/charmod/benoit.svg> is an SVG example. This seems a superior method for at least some of the use-cases cited anyway (I've missed some of this thread though). -- Jon Hanna | Toys and books &

Re: OT (was RE: MS Windows and Unicode 4.0 ?)

2003-12-04 Thread jon
> SIL's involvement in Bible translation is not always widely advertised > for various reasons: it is not the only work that SIL is involved in, > not all SIL projects involve Bible translation, and in some countries in > which SIL works the national agencies (government ministry, > university,...)

RE: UTF-16 inside UTF-8

2003-12-03 Thread jon
more straight-forward implementation that stored wchar_t characters). -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for sick children: | <http://santa.boards.ie/>
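Assuming the thread concerns the practice of encoding UTF-16 surrogates individually as three-byte UTF-8 sequences (the scheme documented as CESU-8), here is a minimal Python sketch. `to_cesu8` and `from_cesu8` are hypothetical helpers that rely on the standard `surrogatepass` error handler. A supplementary character takes six bytes this way, compared with four in UTF-8.

```python
def to_cesu8(s: str) -> bytes:
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair and encode each half separately.
            cp -= 0x10000
            high, low = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def from_cesu8(b: bytes) -> str:
    # Surrogates come back as lone code points; a UTF-16 round trip pairs them up.
    units = b.decode("utf-8", "surrogatepass")
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(to_cesu8("\U0001F600").hex(" "))  # ed a0 bd ed b8 80
```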

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread jon
ive property in section 4.2 of the standard. -- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for sick children: | <http://santa.boards.ie/>

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread jon
> Shouldn't it permit "assa" and "aßa" to co-exist? It isn't like ß is > canonically equivalent to ss (if I read the file aright, it isn't even > compatibility equivalent). It is a case-insensitive system. If it is a case-insensitive system then one should be able to safely treat Uppercase(x)
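The collision is easy to demonstrate. In this Python sketch, "aßa" and "assa" remain distinct even under NFKC, but full uppercasing and case folding map them to the same string. Plain lowercasing does not, which is one reason case-insensitive comparison is usually defined via case folding.

```python
import unicodedata

s1, s2 = "assa", "a\u00dfa"  # "aßa"

# Not equivalent even under NFKC: ß has no compatibility decomposition.
assert unicodedata.normalize("NFKC", s2) == s2

# Full case mapping and case folding make them collide.
assert s2.upper() == s1.upper() == "ASSA"
assert s2.casefold() == s1.casefold() == "assa"

# Simple lowercasing leaves them distinct.
assert s2.lower() != s1.lower()
```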

RE: Compression through normalization

2003-12-01 Thread jon
Quoting Philippe Verdy <[EMAIL PROTECTED]>: > [EMAIL PROTECTED] wrote: > > Further, a Unicode-aware algorithm would expect a choseong character to > > be followed by a jungseong and a jongseong to follow a jungseong, and > > could essentially perform the same benefits to compression that > > nor
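The regularity being exploited is the arithmetic Hangul composition defined in chapter 3 of the Unicode Standard, which NFC applies to conjoining jamo. This Python sketch (`compose_lvt` is a hypothetical helper with no range checking) shows three jamo code points becoming one syllable:

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose_lvt(l: str, v: str, t: str = "") -> str:
    """Arithmetic Hangul syllable composition (no validation of inputs)."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE if t else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


jamo = "\u1100\u1161\u11A8"  # choseong KIYEOK, jungseong A, jongseong KIYEOK
assert compose_lvt(*jamo) == unicodedata.normalize("NFC", jamo) == "\uAC01"
```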

Re: Compression through normalization

2003-12-01 Thread jon
ion system will not be applicable to all uses. Of course that answers the question "should we normalise?" with the question "should we have a compression scheme that isn't universally applicable?" -- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*

RE: Compression through normalization

2003-11-26 Thread jon
> The whole point of such a tool would be to send binary data on a transport > that > only allowed Unicode text. In practice, you'd also have to remap C0 and C1 > characters; but even then 0x00-0x1F -> U+0250-026F and 0x80-0x9F to > U+0270-U+028F > wouldn't be too complex. Unless you've added a Uni
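The remapping quoted above can be sketched as a pair of Python functions (`bytes_to_text` and `text_to_bytes` are hypothetical names). C0 bytes go to U+0250..U+026F and C1 bytes to U+0270..U+028F, and every other byte keeps its Latin-1 value. As an assumption, DEL (0x7F) is passed through unchanged, as in the quoted scheme. One caveat of this design: real IPA letters in that range become indistinguishable from remapped bytes.

```python
def bytes_to_text(data: bytes) -> str:
    out = []
    for b in data:
        if b <= 0x1F:
            out.append(chr(0x0250 + b))           # C0 controls -> U+0250..U+026F
        elif 0x80 <= b <= 0x9F:
            out.append(chr(0x0270 + (b - 0x80)))  # C1 controls -> U+0270..U+028F
        else:
            out.append(chr(b))                    # other bytes keep their Latin-1 value
    return "".join(out)


def text_to_bytes(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0x0250 <= cp <= 0x026F:
            out.append(cp - 0x0250)
        elif 0x0270 <= cp <= 0x028F:
            out.append(cp - 0x0270 + 0x80)
        elif cp <= 0xFF:
            out.append(cp)
        else:
            raise ValueError(f"U+{cp:04X} is not part of the mapping")
    return bytes(out)
```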

RE: Definitions

2003-11-26 Thread jon
> In all I would rather ban all defective sequences, by enforcing the W3C > character model. rect: by enforcing the use of full normalisation as defined in the W3C character model.

RE: Definitions

2003-11-26 Thread jon
Quoting Philippe Verdy <[EMAIL PROTECTED]>: > Peter Kirk [mailto:[EMAIL PROTECTED] writes: > > Why is this a problem? Quotes and ">" with combining marks are > > presumably not legal HTML or XML; > > You're wrong: it is legal in both HTML and XML. What is not specified > correctly is the behavio

RE: Compression through normalization

2003-11-26 Thread jon
> In the case of GIF versus JPG, which are usually regarded as "lossless" > versus "lossy", please note that there /is/ no "original", in the sense > of a stream of bytes. Why not? Because an image is not a stream of > bytes. Period. What is being compressed here is a rectangular array of > pixe
