Kent Karlsson kentk at md dot chalmers dot se wrote:
Consider English. If I write , that may well be a spelling error.
Or even Ŋŋŋŋ!, as Michael Everson wrote in WG2 N2306.
-Doug Ewell
Fullerton, California
On Wed, 05 Feb 2003 02:00:30 -0800 (PST), [EMAIL PROTECTED] wrote:
If these alternate forms were needed to be displayed in a single
multi-lingual plain-text file, wouldn't we need some method of
tagging the runs of Latin text for their specific languages?
Is this not what the variation
- Original Message -
From: SRIDHARAN Aravind [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, February 05, 2003 8:27 AM
Subject: How to convert special characters into unicode?
How to get unicode values for special characters in Java?
I have a set of Czech special characters.
* [EMAIL PROTECTED]
|
| Please forgive me and others who are on similar set-ups if this is
| all just too much of a pain!
It is hard for people to avoid giving others two copies of replies to
on-list messages. In my case I've solved this, since my email client
(Gnus) detects duplicate messages
SRIDHARAN Aravind ASridharan at covansys dot com wrote:
I have Czech special characters in an excel file.
I copy them into Notepad.
I save them.
Now I use native2ascii convertor that is available with JDK.
After I run this utility, I am getting some other unicode values or
sometimes only
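
One possible cause, offered only as an assumption because the message above is cut off: the bytes Notepad saved were read back under a different encoding from the one used to write them. The sketch below uses Python rather than Java to show the effect. `to_java_escapes` is a hypothetical helper that mimics the \uXXXX escape style, and the code pages cp1250 and cp1252 are illustrative guesses, not details taken from the original post.

```python
def to_java_escapes(text: str) -> str:
    """Hypothetical helper: render non-ASCII characters as \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")          # one UTF-16 code unit
        else:
            cp -= 0x10000                       # supplementary: surrogate pair
            out.append(f"\\u{0xD800 + (cp >> 10):04x}"
                       f"\\u{0xDC00 + (cp & 0x3FF):04x}")
    return "".join(out)

czech = "č ř ž"
raw = czech.encode("cp1250")        # assumption: file saved in a Central European code page

right = raw.decode("cp1250")        # decoded with the encoding it was written in
wrong = raw.decode("cp1252")        # decoded with a Western European code page instead

print(to_java_escapes(right))       # escapes for the intended Czech letters
print(to_java_escapes(wrong))       # different escapes: the "other unicode values" symptom
```

If the mismatch is indeed the cause, telling the converter which encoding the file was saved in should be enough to get the expected values.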
On 02/04/2003 02:52:25 PM jameskass wrote:
If these alternate forms were needed to be displayed in a single
multi-lingual plain-text file, wouldn't we need some method of
tagging the runs of Latin text for their specific languages?
The plain-text file would be legible without that -- I don't
Hello, all.
I'm dealing with an API that claims it doesn't support unicode characters with
embedded nulls.
I'm trying to figure out how much of a liability this is.
What is my best plan of attack for discovering precisely which code points have
embedded nulls
given a particular encoding?
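
One brute-force way to answer this question is simply to encode every code point and look for a zero byte. The following is a minimal sketch in Python; the function name `code_points_with_zero_bytes` is made up for illustration, and the explicit little-endian codec names are chosen only so that no byte order mark is added to the output.

```python
def code_points_with_zero_bytes(encoding, start=0x0000, stop=0x10FFFF):
    """Yield code points in [start, stop] whose encoded form contains a 0x00 byte."""
    for cp in range(start, stop + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue                      # surrogates are not encodable characters
        if 0 in chr(cp).encode(encoding): # membership test on the encoded bytes
            yield cp

# Count the affected code points in the Basic Multilingual Plane only,
# which keeps the run short; pass stop=0x10FFFF to scan everything.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    hits = sum(1 for _ in code_points_with_zero_bytes(enc, stop=0xFFFF))
    print(enc, hits)
```

The same loop can collect the code points into a list instead of counting them if the exact set is needed.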
On 02/05/2003 04:05:44 AM Andrew C. West wrote:
If these alternate forms were needed to be displayed in a single
multi-lingual plain-text file, wouldn't we need some method of
tagging the runs of Latin text for their specific languages?
Is this not what the variation selectors are available
.
Andrew C. West wrote,
Is this not what the variation selectors are available for ?
And now that we are soon to have 256 of them, perhaps Unicode ought not to be shy
about using them for characters other than mathematical symbols.
Yes, there seem to be additional variation selectors coming in
.
Peter Constable wrote,
The plain-text file would be legible without that -- I don't think this is
an argument in favour of plane 14 tag characters. Preserving
culturally-preferred appearance would certainly require markup of some
form, whether lang IDs or for font-face and perhaps
Are you sure the API doesn't support Unicode _characters_ with embedded
NULs? Or does it fail to support Unicode _strings_ with embedded NULs?
If it really is the former, no character in UTF-8 (except, of course,
U+0000) will include a NUL byte. In UTF-16, it will be any character of the
form
Erik Ostermueller wrote:
I'm dealing with an API that claims it doesn't support
unicode characters with embedded nulls.
I'm trying to figure out how much of a liability this is.
If by embedded nulls they mean bytes of value zero, that library can
*only* work with UTF-8. The other two UTF's
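
To make the point about zero-valued bytes concrete, here is a small Python sketch that prints the bytes of the same character in the three encoding forms. The helper name `encoded_forms` is hypothetical, and big-endian byte order is picked arbitrarily for display.

```python
def encoded_forms(ch):
    """Return a mapping from encoding name to the space-separated hex bytes of ch."""
    return {enc: ch.encode(enc).hex(" ")
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}

for ch in ("A", "\u00e9"):
    for enc, hex_bytes in encoded_forms(ch).items():
        print(f"U+{ord(ch):04X}  {enc:<9}  {hex_bytes}")
```

Even plain ASCII letters carry zero bytes in the 16-bit and 32-bit forms, which is why an API that treats a zero byte as a terminator cannot accept them.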
At 06:24 PM 2/5/03 +, [EMAIL PROTECTED] wrote:
The advantage of using P14 tags (...equals lang IDs mark-up) is
that runs of text could be tagged *in a standard fashion* and
preserved in plain-text.
The minute you have scoped tagging, you are no longer using
plain text.
The P14 tags are no
On 02/05/2003 12:24:39 PM jameskass wrote:
The advantage of using P14 tags (...equals lang IDs mark-up) is
that runs of text could be tagged *in a standard fashion* and
preserved in plain-text.
Sure, but why do we want to place so much demand on plain text when the
vast majority of content we
Erik followed up:
What I'm hearing from you all is that a null
in UTF-8 is for termination and termination only.
Is this correct?
Not quite. A null byte (0x00) in UTF-8 is only a
representation of the NULL character (U+0000). It can
be present in UTF-8 for whatever purposes one might
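
A short Python sketch of the distinction drawn above: a zero byte in UTF-8 always stands for U+0000 and may sit in the middle of a string, but an API that treats it as a terminator will stop reading there. The function `as_c_string` is a hypothetical stand-in for such an API, not a real library call.

```python
def as_c_string(data: bytes) -> bytes:
    """Return what a NUL-terminated API would see: the bytes before the first 0x00."""
    return data.split(b"\x00", 1)[0]

text = "abc\u0000d\u00e9f"          # U+0000 embedded in the middle of the string
data = text.encode("utf-8")

print(data.hex(" "))                # the single 00 byte is the encoded U+0000
print(as_c_string(data))            # everything after the NUL is lost

# Bytes of a multi-byte UTF-8 sequence all have the high bit set,
# so none of them can be mistaken for a terminator.
print(all(b >= 0x80 for b in "\u00e9".encode("utf-8")))
```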
At 16:47 -0500 2003-02-05, Jim Allan wrote:
There are often conflicting orthographic usages within a language.
Language tagging alone does not indicate whether German text is to
be rendered in Roman or Fraktur, whether Gaelic text is to be
rendered in Roman or Uncial, and if Uncial, a modern
.
Asmus Freytag wrote,
Variation selectors also can be ignored based on their code
point values, but unlike p14 tags, they don't become invalid
when text is cut and pasted from the middle of a string.
Excellent point.
Unicode 4.0 will be quite specific: P14 tags are reserved for
use with
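
The remark that variation selectors can be ignored by code point value can be sketched in Python as follows. The ranges used are U+FE00..U+FE0F and U+E0100..U+E01EF, which together make up the 256 selectors mentioned in this thread; the function names are made up for illustration.

```python
def is_variation_selector(ch):
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def strip_variation_selectors(text):
    """Drop every variation selector, leaving the base characters untouched."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

sample = "x \u2268\ufe00 y"            # a base character followed by VS1
fragment = sample[2:]                  # text cut from the middle of the string

print(strip_variation_selectors(sample) == "x \u2268 y")
print(fragment.startswith("\u2268\ufe00"))   # the selector still follows its base
```

Because each selector applies only to the character immediately before it, a fragment that keeps the pair together needs no surrounding context, unlike a scoped tag.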
.
Peter Constable wrote,
Sure, but why do we want to place so much demand on plain text when the
vast majority of content we interchange is in some form of marked-up or
rich text? Let's let plain text be that -- plain -- and look to the markup
conventions that we've invested so much in and
I don't know about others, but my filters place messages in different folders,
EXCEPT when my name is on the cc or to list.
In that case, the message is left in my inbox for more immediate review and
possible response.
The Unicode lists are also slow to send mail so there can be a significant
You mean like this?
The following is two times the zodiac [ U+2648 ... U+2653 ]
Mortbats Zodiac:
1234567890-=
[ Needs Mortbats font to display, http://www.dingbatpages.com]
Unicode Zodiac:
♈♉♊♋♌♍♎♏♐♑♒♓
[ Needs e.g. Arial Unicode MS to display ]
The upper of these two zodiacs will give wrong
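
The difference between the two zodiacs can be checked mechanically. This Python sketch builds the range U+2648..U+2653 given above and compares it with the font-dependent string; the variable names are made up for illustration.

```python
import unicodedata

# The twelve zodiac signs as encoded characters, U+2648 through U+2653.
ZODIAC = "".join(chr(cp) for cp in range(0x2648, 0x2653 + 1))

# The font-dependent version is ordinary ASCII; only the font makes it look like a zodiac.
FONT_HACK = "1234567890-="

for ch in ZODIAC:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(FONT_HACK.isascii())   # the characters carry no zodiac identity of their own
```

Searching, sorting, or displaying in another font treats the second string as digits and punctuation, while the first keeps its meaning wherever a font covers those code points.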