On 04/06/2013 09:36 AM, William_J_G Overington wrote:
> Text is for reading by humans.
>
> QR codes are for reading by computers.
>
> I wondered if it would be possible to have images that could be read by both
> humans and computers.
Sure. Just set the error-correction high, and write over the
On 07/30/2012 02:12 PM, Doug Ewell wrote:
> Please, no more conspiracy theories.
Yes. If this goes on, I'll find it impossible to refrain from telling
you all my theories about the ANSI-INCITS 154-1988 (R1999) keyboard.
And nobody wants that.
On 2011-11-23 10:38, Jeremie Hornus wrote:
I was thinking the ID being the code point value itself, and the "name"
a human readable description of it.
They are both IDs. One is from the range of numbers from 0 to 1114111
(10FFFF base 16), the other is from the range of strings of characters
e
António MARTINS-Tuválkin wrote:
If the EU can tell Britain that it can't sell eggs by the dozen any
more,
Yesterday I bought a dozen eggs (2 racks of 6, set 2×3) here in Portugal.
This must be an incredibly new regulation.
The Daily Mail isn't as easily available in Portugal. It's one of
s
sing to NFD would be quite
unusual.
>
> BTW, this application supports import of UTF-8, but will not export
> UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal
> storage form).
Odd indeed.
Regards,
Jon Hanna
<http://www.selkieweb.com/>
> But for certain purposes e.g. historical astronomical calculations (used
> for establishing chronology from records of eclipses etc) the year
> numbers used are effectively negative numbers (and zero) AD.
The proleptic Gregorian calendar is more often used with the terms CE or EV
- whether
> > > For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif
> >
> > Looks like {U+062D, U+20DD}
>
> Yes, it does look like that. But it forms a separate entity,
> just like its precedents COPYRIGHT SIGN or SOUND RECORDING
> COPYRIGHT SIGN or REGISTERED.
All of which were in existing sta
> For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif
Looks like {U+062D, U+20DD}
s of XML replacing ≯ with U+226F would mean the document was
no longer well-formed.
So even without an explicit spec saying otherwise the above would be
problematic.
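
A minimal sketch of the composition in question, in Python (my choice of language for illustration; nothing in the thread specifies one). It assumes only the standard unicodedata module and shows that NFC folds a ">" followed by U+0338 into the single character U+226F, so a literal ">" that closed a tag would no longer be there:

```python
import unicodedata

# '>' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY, as could occur
# when a text node right after a tag starts with a combining character.
tag_end = ">\u0338"

# NFC composes the pair into U+226F NOT GREATER-THAN.
composed = unicodedata.normalize("NFC", tag_end)

is_single_char = len(composed) == 1
still_has_gt = ">" in composed

print(repr(composed), is_single_char, still_has_gt)
```

After normalisation there is no ">" left to close the tag, which is the well-formedness problem described above.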
--
Jon Hanna
<http://www.hackcraft.net/>
it has been truly said that hackers have even more words for
equipment fail
Quoting Philipp Reichmuth <[EMAIL PROTECTED]>:
> Jon Hanna schrieb:
> > The W3C Character Model does not, or will not since it's not yet a
> > Recommendation, allow text nodes or attribute values to begin with
> > defective combining character sequences.
>
The W3C Character Model does not, or will not since it's not yet a
Recommendation, allow text nodes or attribute values to begin with defective
combining character sequences.
--
Jon Hanna
<http://www.hackcraft.net/>
"What's a false move? Is it very different from a real one?"
t's going and the other where it's been.
--
Jon Hanna
<http://www.hackcraft.net/>
"One of the few good things about modern times:
If you die horribly on television, you will not have died in vain. You will
have entertained us." - Kurt Vonnegut.
y changed
the spelling of their name. It's not even pronounced the same. They have a
famous typewriter keyboard inventor in their line, but no famous composers.
--
Jon Hanna
<http://www.hackcraft.net/>
"Write a wise saying and your name will live forever" - Anonymous
ment
is in, but this can be changed through the dialog opened by the "Options..."
button on the sort dialog.
--
Jon Hanna
<http://www.hackcraft.net/>
"It is the most shattering experience of a young man's life when he awakes
and quite reasonably says to himself, 'I will never play The Dane.'"
BELOW AND MACRON
U+1E7A LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS
U+1E7B LATIN SMALL LETTER U WITH MACRON AND DIAERESIS
> If so, would anyone know from where a Windows XP font containing these five
> characters could be downloaded?
Arial Unicode has at least some of them.
--
Jon Hanna
<
he next 5 years, as Fortune's demonstration that the
Gods laugh at all plans.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
re as a bullet than as an inline
symbol, and hence no more justified than OWL and TURKEY would be because of
their similar use in O'Reilly Associates publications. But sure, go and look
for examples (not in driver's testing materials - the point there is to
represent what one
se a deck with the Solitaire
encryption algorithm. <http://www.schneier.com/solitaire.html>
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
ar game in
> > high school and college in my days.
>
> "Trumps" in English. I suggest that 21 trumps be encoded, but not
> named, because the correspondence of names to numbers is variable.
Are they very variable? I can only think of the one substitution suggested by
Cro
Quoting "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>:
> Jon Hanna scripsit:
>
> > [T]he default encoding on the server (which really should be utf-8
> > on www.unicode.org at this stage).
>
> Currently it is, but there are sticky issues: in particular, a defau
Quoting Michael Everson <[EMAIL PROTECTED]>:
> At 15:39 +0100 2004-05-21, Jon Hanna wrote:
>
> >Were the headers correct?
>
> It is plain text.
HTTP has headers separate to the content (the headers come first and the content
comes next). These headers can contain encodi
dea for drafts that are being
edited, but it might be more appropriate once they are finalised.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
would hence read as taking
something from the top of a page or viewing area and moving it, in its
entirety, to the bottom, not as something starting at the top and continuing
towards the bottom.
In summary, TTB, not T2B, please.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has be
't composed of a BTT
passage, a LTR passage and a TTB passage, but of a single passage which follows
a path which changes through those three directions.
Paths are not a plain text matter.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more wor
r way they want :) it's *just* about possible that the seven-octet
sequence FE 80 80 80 80 80 AF would also be treated as U+002F SOLIDUS.
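
For comparison, a small sketch of how a strict decoder treats such sequences. This uses Python's built-in UTF-8 codec purely as one example of a strict implementation; the helper name decodes_as_utf8 is hypothetical and other decoders may behave differently:

```python
def decodes_as_utf8(octets: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the octets."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The seven-octet sequence mentioned above.
seven_octet = bytes([0xFE, 0x80, 0x80, 0x80, 0x80, 0x80, 0xAF])
# The shorter, better-known two-octet overlong form of U+002F.
two_octet_overlong = bytes([0xC0, 0xAF])
# The only valid UTF-8 encoding of U+002F SOLIDUS.
plain_solidus = b"/"

print(decodes_as_utf8(seven_octet))
print(decodes_as_utf8(two_octet_overlong))
print(decodes_as_utf8(plain_solidus))
```

A decoder that rejects the first two and accepts only the last cannot be tricked into seeing a SOLIDUS that a byte-level filter missed.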
[1] Indeed the format of UTF-8 would make it possible to unambiguously encode
any value up to 0xFF but this exceeds the ISO 10646 codepoint space a
e
> that has no awareness of a custom encoding to do what they want.
If you think of the users of an encoding as a social network then we would
expect something like Metcalfe's or Reed's law to affect it. The bigger the
network the better off they'll be. Unicode ha
pheme clusters
correctly is a perfect subset of the work involved in developing and using a
new 8-bit encoding.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
or a long time, never mind any other use of that encoding.
Do you really think the same would be true of ISO 8859-17?
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
he fact that it was not practical to act as
if we were at encoding year-zero - if we had then we probably wouldn't have
precomposed characters for European languages, never mind any others) but those
problems are considerably less than existed previously and ISO-8859-17+ is
always going to be
given that the goal is to infect as many machines as
possible as quickly as possible, anything that gets more than 50% accuracy
should be considered a successful approach in that context.
If the authorities find the author I doubt the robustness of the
content-language heuristic will be top
, and I'm looking
forward to reading it.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
ke a letter,
Gosh, that brings me back. All those characters that were BASIC keywords
compressed into one octet. How could we have neglected to encode such important
legacy characters? This unnecessarily complicates round-trip conversion between
ZX80s and Unicode.
--
Jon Hanna
<http://www.hackcraf
a carriage return in its name :)
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
rectly when exchanged. If you really wanted to you could use either the
hash or sharp symbol in the extension with Win2K at least (just successfully
tested this).
File extensions are molehills that are frequently made into mountains.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
> Fine. I concede that this is the case. Therefore, let's change the
> underlying form of <0069> to a dotless "i" and let English speakers change
> it to a dotted "i" with the font.
I am happy to inform you that "the underl
s in Turkic languages then you would
have a point (and in certain circumstances so would the Irish i).
Whether an Irish person writes an i without a dot, an English person writes it
with a dot, or a 12 year old girl penning a valentine card writes it with a
heart it is still the letter i.
--
Jon
n a particular font (and perhaps colour). The specific latin font used
to represent the CAPITAL A in the anarchy symbol is unimportant.
Jon
PS. Croquet challenge accepted - I have a set at home. I believe I get
to choose time and location?
Quoting Marion Gunn <[EMAIL PROTECTED]>:
how to guarantee continuance,
> in the specific context of Irish text computing, of the traditional
> restriction of the Irish diacritic dot (having only one single function in
> Irish) to the consonants to which it belongs?
A spell ch
[EMAIL PROTECTED] wrote:
Jon Wilson scripsit:
The character in question is a variant of "CIRCLED LATIN CAPITAL LETTER
A", commonly referred to as the "Anarchy" symbol. The bars of the A are
longer than normal, extending to touch or even overlap the circle.
It's basi
s area.
In the spirit of anarchy, I am likely to pursue this application,
whatever response I get! Equally in the spirit of anarchy, you are free
to provide whatever comments and assistance you wish, on any of the
above points.
Thanks,
Jon
positions? I can only bring the
language-independent ones to mind right now.
There is a language-independent decomposition of LATIN CAPITAL LETTER I WITH DOT
ABOVE to LATIN CAPITAL LETTER I and COMBINING DOT ABOVE.
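
That decomposition can be checked with a short sketch, here in Python using the standard unicodedata module (the language is my choice for illustration only):

```python
import unicodedata

capital_i_dot = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Canonical decomposition is language-independent: no locale is consulted.
decomposed = unicodedata.normalize("NFD", capital_i_dot)
names = [unicodedata.name(ch) for ch in decomposed]

print(names)
```

No locale is set anywhere in the sketch, which is the sense in which the decomposition is language-independent.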
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that ha
gh it does run the risk of being confused with í. However I
suspect that a large number are not "non-native", but were in fact created
here.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
an
Or for that matter % since it isn't significant in HTML and can be safely placed
straight into the source.
--
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt
elected that isn't normally in the list for UTF-16 (actually it
referred to it as "Unicode" and "Unicode (Big Endian)" depending on which of
the two pages I viewed).
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
't include UTF-16.
>
> Maybe this browser is one of this very small minority which don't support
> UTF-8 _and_ UTF-16 ?
>
Or it might just be that it's relatively hard to mis-identify UTF-16, and hence
it doesn't need to be given as a user-override.
Have you tested with i
; there are browsers out there that don't support
anything except ISO 8859-1 and even a few that get downright confused by
anything that isn't ASCII. Who knows, maybe there are even people using them!
In any case, browsers that don't support UTF-8 and UTF-16 are now a very small
minorit
e, a global memory handle or some other way of
sharing data rather than passing the data directly as a parameter.
Neither of these is ideal; if something better occurs to me I'll let you know.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
unix, nl_langinfo(CODESET) returns the code page of the locale set by
> setlocale
>
I'm not sure, but GetLocaleInfo seems to allow you to obtain codepage info if
you know the locale id.
<http://msdn.microsoft.com/library/en-us/intl/nls_34rz.asp>
--
Jon Hanna
<http://www.hackcra
as well as hypothetical
planets and a few other features which individual astrologers have invented
symbols for). Though it has made me think that it would be nice to gloss U+260A
ASCENDING NODE with "Dragon's Head" and U+260B DESCENDING NODE with "Dragon's
UA assignments.
>
I think I may have dealt with bureaucracies using such a system in the past.
It's all become clear now.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
Quoting Philippe Verdy <[EMAIL PROTECTED]>:
> From: "Jon Hanna" <[EMAIL PROTECTED]>
> > Quoting Marco Cimarosti <[EMAIL PROTECTED]>:
> >
> > > Jon Hanna wrote:
> > > > I refuse to rename my UTF-81920!
> > >
> > >
Quoting Marco Cimarosti <[EMAIL PROTECTED]>:
> Jon Hanna wrote:
> > I refuse to rename my UTF-81920!
>
> Doug, Shlomi, there's a new one out there!
>
> Jon, would you mind describing it?
There are two different UTF-81920s (the resultant ambiguity is very much i
> By the way, I don't think that there's an official reference that attributes
> the acronym "UTF-9" to any of these encoding forms. I think that if "UTF-9"
> is used it should be agreed by Unicode as being an official unique
> representation.
I ref
rols forbidden in the 1.0
spec are allowed in the 1.1 spec if they appear as character references - so
this no longer holds (unless you store them as references or otherwise escaped,
which would bring its own issues).
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
it to be on the
safe side.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
compiler and work identically with another compiler, even from the
> same compiler provider.
Please show how this is so beyond the names of the locales.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
> The windows name for "en_US.UTF8" is "English_United States.65001", ".65001"
> will be UTF-8 in the default locale.
>
More on this at the MS documentation for setlocale
<http://msdn.microsoft.com/library/en-us/vclib/html/_crt_setlocal
"en_US.UTF8" is "English_United States.65001", ".65001"
will be UTF-8 in the default locale.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
is clunky, but not that clunky.
No, the use of "ghoti" by Shaw was silly, the reference to it in the Klingon
lexicon is funny (now if it was spelt "ghoti" but pronounced "fish" then it
would be silly).
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
locale-sensitive title-case operation for the Irish language would produce
"Nathair" from "nAthair" although a deliberately "fuzzy" case-folding operation
might.
If Klingon isn't in the Latin script the joke about having the word "ghoti" for
fish isn't as funny.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
mounts of Latin-1 - in particular files for which certain ASCII
characters are given an application-specific meaning; for instance XML and HTML
files, comma-delimited files, tab-delimited files, vCards and so on. It can be
particularly reliable in cases where certain ASCII characters will alway
in any other encoding others
are more troublesome.
If there is no source of encoding information (such as you get with xml
declarations, HTTP headers and such), and even if there is, it may be best to
offer your users the ability to select encodings (perhaps with the default
choice based on loca
ors
> > of XML parsers and other software that process XML.
>
Murphy's law kicked in and I noticed a mistake just after asking for feedback;
you must have tried the link while I was uploading the correction. It should
work now.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
before some poor soul reads it and gets misled.
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
are nicknamed "Jackal", as far as
I can make out there isn't a criminal or terrorist organisation in the world
that doesn't have a member using that handle).
Regards,
Jon Hanna (neither the psychedelic counter-culture journalist, the Christian-
rock journalist, nor the clas
n the guy who argued that the word "angel" was derived
from the astrological use of the word "angle".
--
Jon Hanna
<http://www.hackcraft.net/>
void confusion, of
course this wouldn't be possible with an existing normalisation API, though if
the number of characters handled specially is small it would be possible to do
that in a first pass.
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
stians mean offence when they refer to Jesus through any of the
countless transcriptions, spellings and pronunciations used in various
languages. I think this is analogous to assuming that anyone dreaming of
packing it all in and buying a villa in Provence similarly means no offence
when expr
> There's no reason to expect that there will be any 0307 whatever in
> Turkish/Azeri texts: it's not a diacritic those languages use, AFAIK.
There's no reason to expect that there won't be, particularly if they quote a
piece in a language which does
don't assume you are where you appear to be.
I like to summarise security advice thusly: "if you trust my advice on security
you're starting with completely the wrong attitude" :)
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
re passing UTF-8
around, but is this formalised yet?
But yes, {U+0131}{U+0307} can look awfully similar to {U+0069}, I think {U+0069}
{U+0307} would as well (and of course there are other opportunities for visual
confusion unrelated to the U+0069 and U+0131).
--
Jon Hanna
o call
themselves whatever they want, more troublesome would be if they wish to change
their ISO 3166 codes. CR is taken and CP exceptionally reserved, so hopefully
they'll remain static.
Today's threads are putting me in a mood to re-read Cryptonomicon...
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
Come to think of it, "Manchuquo" comes before "Nipon" and "Nihon" in just about
any way you can think of Latinising it.
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
s, even though
> it's rendered with a dot?
Since i is soft-dotted presumably you'd take off a dot, and then put on a dot.
Clear as mud!
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
't just double sigels that have the second
mirrored, but all double letters. FWIW not only are the sources I learnt this
from not reliable on the history of the Futhark, being concerned only with the
modern occult use, but they also claimed it was a purely aesthetic matter (and
having experimented I agree it's prettier).
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
have originally been written (I've heard of entire lines being mirrored, such
as on the Franks Casket, but not individual characters) or if it was a post-war
innovation to deliberately avoid writing SS.
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for h
> strikes me as a matter of squeamishness more than respect for those who
"Squeamishness" isn't quite the right word, and is belittling; I can understand
why some people would want the symbol off their computer.
Quoting "Mark E. Shoulson" <[EMAIL PROTECTED]>:
> However, now that you mention it, it is true that the stylized S used in
> the abbreviation for the SS was actually required in all fonts by the
> Nazi government, so by that reasoning it, at least, has some standing
> for being encoded (though I c
> But why on earth are we talking about mapping grapheme clusters to the PUA ?!
It's valid, just don't expect, and hence don't plan for, anyone else following
suit.
grapheme cluster boundaries.
>
> This implies that end users should not require counts of code units or
> code points.
I don't think anyone argued against this being what *end* users require.
Certainly for small values of "end" anyway.
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie>
some way defective" is actually a good way to put it methinks, they aren't
illegal, and in some cases you can do things with them that are both reasonable
and useful, but in other situations they may be problematic.
--
Jon Hanna | Toys and books
<http://www
<=> e + ´ when
compared to many higher-level string handling activities (regular expressions,
bidirectional over-riding, and the subtler points of case operations).
Even so, I think it's making those two levels meet that is the biggest
stumbling block for beginners.
--
Jon Hanna
> character
> references as in:
> const wchar_t c = L'\U000309';
\u must be followed by four hexadecimal digits, \U by eight.
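
Python's string escapes happen to follow the same four-digit and eight-digit rule as stated above for C, so the point can be sketched there (this is an analogy in a different language, not the C code under discussion):

```python
import unicodedata

four_digit = "\u0309"       # \u takes exactly four hexadecimal digits
eight_digit = "\U00000309"  # \U takes exactly eight hexadecimal digits

same_character = four_digit == eight_digit
name = unicodedata.name(four_digit)

# A six-digit \U escape, like the one quoted above, is rejected outright.
try:
    eval(r'"\U000309"')
    short_form_accepted = True
except SyntaxError:
    short_form_accepted = False

print(same_character, name, short_form_accepted)
```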
The biggest advantage of L'\u0309' over direct use of the combining character
is you can read the thing (source is intended for human read
"default
grapheme clusters" in Unicode. Functions which count either of these are
perfectly conformant with Unicode, as long as they perform their task correctly.
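
To illustrate how counts at different levels diverge, a rough sketch in Python follows. The last count is only an approximation that treats every character with a non-zero combining class as attached to the preceding base; it is not the default grapheme cluster algorithm, and the variable names are my own:

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT, then U+1D11E MUSICAL SYMBOL G CLEF
text = "e\u0301\U0001D11E"

code_points = len(text)                           # Python str counts code points
utf16_units = len(text.encode("utf-16-le")) // 2  # two octets per code unit
utf8_octets = len(text.encode("utf-8"))

# Approximation only: count characters that are not combining marks.
approx_sequences = sum(1 for ch in text if unicodedata.combining(ch) == 0)

print(code_points, utf16_units, utf8_octets, approx_sequences)
```

Each of the four numbers is a correct answer to a different question, which is why a function counting any one of them can be conformant.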
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie/>
've already gone past the stage where you can't taste it (I
understand heavily refrigerated beer is an American invention, and given the
way American beer tastes this makes sense), soon it'll be served to you on a
stick.
I can't even remember if this thread was ever on topi
> > You might as well say that C code is not plain text because it too is
> > subject to special canons of interpretation.
>
> C, C++ and Java source files are not plain text as well (they have their own
C, C++ and Java source files are plain text.
> "text/*" MIME type, which is NOT "text/plain"
, of these are
examples of conformant behaviour.
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for hospitals:
| <http://santa.boards.ie/>
lesheet. So it is not a problem that
> there is a defective combining sequence, nor that the accent is not
> combined with the e as it would be in NFC. Is that correct?
You can; whether you should is another thing, and whether it would render
correctly yet another.
--
Jon Hann
ined in China; he professed a belief that Guinness was
why the Irish had thicker bones than the Chinese in his experience. There are
considerably more doctors who would say that if you were going to drink a beer
it should be stout, without going so far as to actually recommend it in and of
itself.
can be performed through other means.
<http://www.w3.org/TR/charmod/benoit.svg> is an SVG example. This seems a
superior method for at least some of the use-cases cited anyway (I've missed
some of this thread though).
--
Jon Hanna | Toys and books
<
> SIL's involvement in Bible translation is not always widely advertised
> for various reasons: it is not the only work that SIL is involved in,
> not all SIL projects involve Bible translation, and in some countries in
> which SIL works the national agencies (government ministry,
> university,...)
more straight-forward implementation that stored
wchar_t characters).
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for sick children:
| <http://santa.boards.ie/>
ive property in section 4.2 of the
standard.
--
Jon Hanna | Toys and books
<http://www.hackcraft.net/> | for sick children:
| <http://santa.boards.ie/>
> Shouldn't it permit "assa" and "aßa" to co-exist? It isn't like ß is
> canonically equivalent to ss (if I read the file aright, it isn't even
> compatibility equivalent).
It is a case-insensitive system. If it is a case-insensitive system then one
should be able to safely treat Uppercase(x)
Quoting Philippe Verdy <[EMAIL PROTECTED]>:
> [EMAIL PROTECTED] wrote:
> > Further, a Unicode-aware algorithm would expect a choseong character to
> > be followed by a jungseong and a jongseong to follow a jungseong, and
> > could essentially perform the same benefits to compression that
> > nor
ion system will not be applicable to all uses. Of
course that answers the question "should we normalise?" with the
question "should we have a compression scheme that isn't universally
applicable?"
--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
> The whole point of such a tool would be to send binary data on a transport
> that only allowed Unicode text. In practice, you'd also have to remap C0 and C1
> characters; but even then 0x00-0x1F -> U+0250-026F and 0x80-0x9F to
> U+0270-U+028F wouldn't be too complex. Unless you've added a Uni
> wouldn't be too complex. Unless you've added a Uni
> In all I would rather ban all defective sequences, by enforcing the W3C
> character model.
rect: by enforcing the use of full normalisation as defined in the W3C
character model.
Quoting Philippe Verdy <[EMAIL PROTECTED]>:
> Peter Kirk [mailto:[EMAIL PROTECTED] writes:
> > Why is this a problem? Quotes and ">" with combining marks are
> > presumably not legal HTML or XML;
>
> You're wrong: it is legal in both HTML and XML. What is not specified
> correctly is the behavio
> In the case of GIF versus JPG, which are usually regarded as "lossless"
> versus "lossy", please note that there /is/ no "original", in the sense
> of a stream of bytes. Why not? Because an image is not a stream of
> bytes. Period. What is being compressed here is a rectangular array of
> pixe