I missed this yesterday.
Plug Gulp wrote:
> General support for all characters, words and sentences could be
> achieved by just three new formatting characters, e.g. SCR, SUP and
> SUB, similar to the way other formatting characters such as ZWS, ZWJ,
> ZWNJ etc are defined. The new formatting
2015-12-16 19:16 GMT+01:00 Doug Ewell:
> The ones you suggest are stateful; they affect the rendering of
> arbitrary amounts of subsequent data, in a way reminiscent of ECMA-48
> ("ANSI") attribute switching, or ISO 2022 character-set switching.
> Unicode tries hard to avoid
Plug Gulp wrote:
> It will help if Unicode standard itself intrinsically supports
> generalised subscript/superscript text.
This falls outside the scope of "plain text" as defined by Unicode, in
much the same way as bold and italic styles and colors and font faces
and sizes.
There are several
Does the standard support the use of diacritics in plain text format, when used
with all and any complex scripts?
Regards
Sinnathurai
>
> On 15 December 2015 at 17:46 Doug Ewell wrote:
>
>
> Plug Gulp wrote:
>
> > It will help if Unicode standard itself
srivas sinnathurai wrote:
> Does the standard support the use of diacritics in plain text format,
> when used with all and any complex scripts?
It probably depends on what you mean by "support" and "diacritics." I
can type a Tamil letter followed by a combining acute accent or
diaeresis, and in
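A minimal Python sketch of the encoding side of this point, assuming U+0B95 TAMIL LETTER KA as the example letter (the message does not name one). It only shows that the sequence is representable and survives normalization; it says nothing about how a given font or renderer will display it.

```python
import unicodedata

base = "\u0B95"    # TAMIL LETTER KA (example letter chosen for illustration)
acute = "\u0301"   # COMBINING ACUTE ACCENT
text = base + acute

# List each code point with its name and canonical combining class.
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")

# No precomposed form exists, so NFC leaves the two code points as they are.
nfc = unicodedata.normalize("NFC", text)
print(len(nfc), nfc == text)
```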
On Wed, Dec 9, 2015 at 5:18 AM, Martin J. Dürst wrote:
>
> I suggest using HTML:
>
> बक ्ष
>
This will work only if the end-users are always going to use a web
browser to view the text content.
It will help if Unicode standard itself intrinsically supports
generalised
On Tue, 15 Dec 2015 18:00:16 +0000 (GMT)
srivas sinnathurai wrote:
> Does the standard support the use of diacritics in plain text format,
> when used with all and any complex scripts?
Relatively few scalar value sequences are prohibited - just possibly
sequences
On Tue, Dec 15, 2015 at 11:55:02AM +0000, Plug Gulp wrote:
> Please note that the teacher had to use a Circumflex Accent (Caret) to
> indicate superscript, which is an unwritten convention, in the absence
> of proper superscript support within Unicode.
If the teacher is explaining actual math to
On Wed, 9 Dec 2015 03:24:39 +0000
Plug Gulp wrote:
> I am trying to understand if there is a way to use Devanagari
> characters (and grapheme clusters) as subscript and/or superscript in
> unicode text.
Why do you want to do this? Are you asking about writing Devanagari
On Wed, 9 Dec 2015 03:24:39 +0000
Plug Gulp wrote:
> Hi,
>
> I am trying to understand if there is a way to use Devanagari
> characters (and grapheme clusters) as subscript and/or superscript in
> unicode text.
The view is that such would not be 'plain text', and therefore
Hello Plug,
I suggest using HTML:
बक ्ष
Regards, Martin.
On 2015/12/09 12:24, Plug Gulp wrote:
Hi,
I am trying to understand if there is a way to use Devanagari
characters (and grapheme clusters) as subscript and/or superscript in
unicode text. It will help if someone could please direct
The character U+0904 (DEVANAGARI LETTER SHORT A) is not a part of ISCII 91.
Neither was it encoded in any of the earlier versions of ISCII. Hence
according to the ISCII standard this character simply cannot be formed.
Aparna A. Kulkarni
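For reference, the Unicode side of this can be inspected with Python's standard unicodedata module. This is only a lookup sketch: Python ships no ISCII codec, so the ISCII claim above is not something the snippet can check.

```python
import unicodedata

short_a = "\u0904"

# Name and general category as recorded in the Unicode Character Database.
print(f"U+{ord(short_a):04X}", unicodedata.name(short_a))
print("category:", unicodedata.category(short_a))

# An empty string here means the character has no decomposition mapping,
# i.e. it is not canonically equivalent to any sequence of other characters.
print("decomposition:", repr(unicodedata.decomposition(short_a)))
```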
-Original Message-
From: [EMAIL PROTECTED]
From: Aparna A. Kulkarni [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; 'Unicode List' [EMAIL PROTECTED]
Sent: Thursday, February 19, 2004 8:23 AM
Subject: RE: Devanagari Letter Short A
The character U+0904 (DEVANAGARI LETTER SHORT A) is not a part of ISCII 91.
Neither was it encoded in any
Philippe Verdy va escriure:
U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an
independent vowel. It can be viewed as a conjunct of the
independent vowel U+0905 DEVANAGARI LETTER A and the dependent
vowel sign U+0946 DEVANAGARI VOWEL SIGN SHORT E (noted for
transcribing
Ernest Cline wrote:
I've been trying to make sense of the Indian scripts, but am
having one small difficulty. I can't seem to find the ISCII 1991
equivalent for U+0904 (DEVANAGARI LETTER SHORT A).
I do not believe you'll find it there.
U+0904 had been added to Unicode for version 4.0. In
My understanding of the Indian scripts coded in Unicode, is that the mapping
from ISCII to Unicode is not straightforward one-to-one, because ISCII uses a
contextual encoding for characters (allowing shifts between several scripts) and
some rich-text features.
The ISCII character model is not
I wrote:
I would have to disagree with these Indian experts in this instance.
The Devanagari glottal stop does not have a dot, and indeed, in the
languages which use it, this character will certainly coexist with
the question mark. They have different shapes, and different
functions.
At
I would have to disagree with these Indian experts in this instance.
The Devanagari glottal stop does not have a dot, and indeed, in the
languages which use it, this character will certainly coexist with
the question mark. They have different shapes, and different
functions.
--
Michael Everson
: Saturday, April 05, 2003 01:45
Subject: Re: Devanagari Glottal Stop
I would have to disagree with these Indian experts in this instance.
The Devanagari glottal stop does not have a dot, and indeed, in the
languages which use it, this character will certainly coexist with
the question mark
Vipul Garg wrote:
I have downloaded your font chart for Devanagari, which is in the range
from 0900 to 097F. I have also installed the Arial Unicode font supplied
by Microsoft Office XP suite. I found that not all characters are
available for Devanagari. For example letters such as Aadha KA,
Vipul Garg was asking why half characters were not included in the Unicode
code charts and in his copy of the Arial Unicode font.
More recent versions of Arial Unicode do contain half characters etc.
for Devanagari.
As to the code charts, to answer this, you need to explore the Unicode
web site a bit
Vipul Garg wrote:
I have downloaded your font chart for Devanagari, which is in
the range from 0900 to 097F. I have also installed the Arial
Unicode font supplied by Microsoft Office XP suite. I found
that not all characters are available for Devanagari. For
example letters such as Aadha
[EMAIL PROTECTED] scripsit:
Au contraire! You might find the attached gif of interest. (This is version
1.0 of the font. Some people might have earlier versions.)
Ah, excellent. It has not always been so.
If you're not getting Indic shaping with Arial Unicode MS, it's very likely
the fault
for my ready reference.
Best Regards,
Vipul Garg
Mind Axis (I) Solutions Pvt. Ltd.
Phone: +91 (22) 55994860 / 61
-Original Message-
From: John Cowan [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 03, 2002 5:33 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Devanagari
On Fri, 8 Mar 2002, Marco Cimarosti wrote:
Peter Constable wrote:
On 03/07/2002 02:16:10 PM James E. Agenbroad wrote:
A similar but not the same situation is found in the fourth example in
figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the
reph (an
Peter Constable wrote:
On 03/07/2002 02:16:10 PM James E. Agenbroad wrote:
A similar but not the same situation is found in the fourth example in
figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the
reph (an abridged form of the consonant 'ra') above it. Unicode
At 15:36 -0600 07/03/2002, [EMAIL PROTECTED] wrote:
I may be wrong, but I believe that example has ra, halant, ra,
independent i . The first ra is the one that transforms into the reph.
You're wrong. RI in this case is a way of writing the vocalic r.
Compare Kr.s.n.a and Krishna.
--
Michael
At 15:16 -0500 07/03/2002, James E. Agenbroad wrote:
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
On 03/06/2002 08:25:18 AM Michael Everson wrote:
[snip]
In
Cham, independent vowels can take dependent vowel signs. In
Devanagari, I guess that doesn't occur, but the
Using Apple's WorldText, I can confirm that short I did not reorder
correctly when preceded by 0294. But the 0294 glyph was in another
font.
I wonder could we see some samples of this in actual Limbu text?
--
Michael Everson *** Everson Typography *** http://www.evertype.com
At 11:26 +0100 2002-03-08, Marco Cimarosti wrote:
You are wrong, in fact, sorry. Although figure 9-3 does not show code point
values, both the glyphs and the abbreviated letter names make it clear that
the sequence is:
U+0930 (DEVANAGARI LETTER RA)
U+094D (DEVANAGARI SIGN VIRAMA)
On 03/08/2002 06:54:54 AM Michael Everson wrote:
Using Apple's WorldText, I can confirm that short I did not reorder
correctly when preceded by 0294. But the 0294 glyph was in another
font.
I wonder could we see some samples of this in actual Limbu text?
It's on its way.
- Peter
On 03/08/2002 05:09:46 AM Michael Everson wrote:
At 15:36 -0600 07/03/2002, [EMAIL PROTECTED] wrote:
I may be wrong, but I believe that example has ra, halant, ra,
independent i . The first ra is the one that transforms into the reph.
You're wrong. RI in this case is a way of writing the
Jim Agenbroad responded (off list):
Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI
vowel: RA(d) + RI(n) --> RI(n) + RA(sup) (parens in lieu of subscript)
I didn't realise that RI meant the vocalic R. I mistook it to mean
something else. I find it a weakness of that
[EMAIL PROTECTED] scripsit:
I didn't realise that RI meant the vocalic R.
It reflects the modern Hindi pronunciation of Skt /r=/.
--
John Cowan [EMAIL PROTECTED] http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan
han mathon ne chae, a han
: Wednesday, 6 March 2002 12:14
To: Yaap Raaf
Cc: [EMAIL PROTECTED]
Subject: Re: Devanagari enthousiasm!
On 06-03-2002 04:29:20 PM Yaap Raaf wrote:
At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote:
I am on a Mac and can't open it,
Well, this is going to be a problem for non-Windows clients, I
At 10:29 -0600 2002-03-08, [EMAIL PROTECTED] wrote:
Jim Agenbroad responded (off list):
Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI
vowel: RA(d) + RI(n) --> RI(n) + RA(sup) (parens in lieu of subscript)
I didn't realise that RI meant the vocalic R. I mistook it
On Fri, 8 Mar 2002, Michael Everson wrote:
At 15:16 -0500 07/03/2002, James E. Agenbroad wrote:
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
On 03/06/2002 08:25:18 AM Michael Everson wrote:
[snip]
In
Cham, independent vowels can take dependent vowel signs. In
On Fri, 8 Mar 2002 [EMAIL PROTECTED] wrote:
Jim Agenbroad responded (off list):
Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI
vowel: RA(d) + RI(n) --> RI(n) + RA(sup) (parens in lieu of subscript)
I didn't realise that RI meant the vocalic R. I mistook it
implementations might not recognise a sequence like consonant, vowel, nukta
as valid. For instance, I understand that if Uniscribe encountered such a
sequence, it would assume you've left out a consonant immediately before
the nukta, and it would display a dotted circle to
On 03/06/2002 03:12:20 PM Michael Everson wrote:
But a font is not an ISO/IEC 10646 subset! By definition, it contains glyph
codes, not character codes. They are in two different worlds.
But in public procurement a subset may be specified, in which case
ASCII will be implied. I don't know who
That behaviour, IMHO, is incorrect. There is no, and was never
any kind of grapheme or even combining sequence break
at that point, and there should never be a dotted circle
displayed through that sequence of characters (a show-
individual-characters mode should of course be excepted).
I agree.
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
On 03/06/2002 08:25:18 AM Michael Everson wrote:
[snip]
In
Cham, independent vowels can take dependent vowel signs. In
Devanagari, I guess that doesn't occur, but the Brahmic model
shouldn't be understood to preclude this
On 03/07/2002 02:16:10 PM James E. Agenbroad wrote:
A similar but not the same situation is found in the fourth example in
figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the
reph (an abridged form of the consonant 'ra') above it. Unicode wants
this encoded as consonant
Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
- Forwarded by Peter Constable/IntlAdmin/WCT on 03/07/2002 10:10 PM
-
Jeff LB Webster [EMAIL PROTECTED]
03/07/2002 08:59 PM
To: [EMAIL PROTECTED]
cc:
Subject:Re
On 06-03-2002 09:59:48 Yaap Raaf wrote:
Win98: You need something called OpenType Devanagari fonts
to VIEW the Hindi unicode text.
You can get a good font for free from BBC Hindi site.
Except that the license that accompanies the font says:
COPYRIGHT AND ALL OTHER RIGHT, TITLE AND INTEREST
On Wednesday, March 6, 2002, at 03:24 AM, Herman Ranes wrote:
There is a related problem in connection with Norwegian typography: Most
fonts include the 'fi' and 'ffi' ligatures, but I have never heard of a
commercial font which includes the 'fj' ligature.
Apple's Hoefler font contains
At 00:12 -0600 2002-06-03, [EMAIL PROTECTED] wrote:
(1) The first problem is the need for a glottal character for Limbu (ie,
Limbu language written in Devanagari script, as opposed to Limbu script,
which already has a symbol for glottal). The Limbu language committee has
decided that this
* Herman Ranes [EMAIL PROTECTED] [2002-03-06 11:24]:
There is a related problem in connection with Norwegian typography:
Most fonts include the 'fi' and 'ffi' ligatures, but I have never
heard of a commercial font which includes the 'fj' ligature.
From the Adobe OpenType user guide:
At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote:
I interpret this to mean one may not legitimately use this font for any
purpose other than viewing the BBC website.
If http://www.bbc.co.uk/hindi/images/download_text.gif
is any indication, the font doesn't look too promising.
Have you seen
At 17:29 +0100 2002-06-03, Yaap Raaf wrote:
There was another message announcing Raghu font.
Subject: Free Unicode Hindi fonts
From:Dakshin Shantakumar [EMAIL PROTECTED]
Newsgroups: alt.language.hindi soc.culture.indian
Date:2 Mar 2002 13:51:45 -0800
Downloadable here
On 03/06/2002 04:24:54 AM Herman Ranes wrote:
There is a related problem in connection with Norwegian typography:
Most fonts include the 'fi' and 'ffi' ligatures, but I have never
heard of a commercial font which includes the 'fj' ligature.
That's quite a different problem. All it would take to
At 02:24 3/6/2002, Herman Ranes wrote:
There is a related problem in connection with Norwegian typography: Most
fonts include the 'fi' and 'ffi' ligatures, but I have never heard of a
commercial font which includes the 'fj' ligature.
Using such a font, the word 'fire' (four) would be ligated
At 08:29 3/6/2002, Yaap Raaf wrote:
There was another message announcing Raghu font.
Subject: Free Unicode Hindi fonts
From:Dakshin Shantakumar [EMAIL PROTECTED]
Newsgroups: alt.language.hindi soc.culture.indian
Date:2 Mar 2002 13:51:45 -0800
Downloadable here
At 11:03 -0800 2002-06-03, John Hudson wrote:
It has about 600 glyphs. But no Latin letters, which, IIRC,
disqualifies it as a real Unicode font?
No, a Unicode font does not need to contain Latin letters.
A valid ISO/IEC 10646 subset must contain ASCII.
--
Michael Everson *** Everson
At 11:03 -0800 2002-06-03, John Hudson wrote:
No, a Unicode font does not need to contain Latin letters.
And Michael Everson responded:
A valid ISO/IEC 10646 subset must contain ASCII.
But a font is not an ISO/IEC 10646 subset! By definition, it contains glyph
codes, not character codes.
On 06-03-2002 04:29:20 PM Yaap Raaf wrote:
At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote:
I am on a Mac and can't open it,
Well, this is going to be a problem for non-Windows clients, I admit.
it's a
244K .exe Why an .exe?
I don't know if this is what the BBC was trying to do, but
Michael Everson scripsit:
A valid ISO/IEC 10646 subset must contain ASCII.
But a 10646 subset is a coded character set, not a font.
--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all. There
are
At 12:07 -0800 2002-06-03, Rick McGowan wrote:
At 11:03 -0800 2002-06-03, John Hudson wrote:
No, a Unicode font does not need to contain Latin letters.
And Michael Everson responded:
A valid ISO/IEC 10646 subset must contain ASCII.
But a font is not an ISO/IEC 10646 subset! By definition,
Michael Everson said:
No, a Unicode font does not need to contain Latin letters.
A valid ISO/IEC 10646 subset must contain ASCII.
Besides others pointing out the obvious disconnect
between 10646 subsets and what can be in a valid
Unicode font (which contains glyphs, not characters),
this
On 03/06/2002 08:25:18 AM Michael Everson wrote:
That almost answers my first question. Does Devanagari glottal have
an inherent vowel? If it does, encode a new character.
That seems like a very good metric to consider, and I hadn't thought of it
myself. I'd expect that this can be used
On 02/23/2002 08:58:28 PM yaapraaf wrote:
Now I have one more question, related with the next:
# Subject: Re: How to create Unicode input methods for MacOS? (long)
# Our 'uchr' resources are created using an assembler. It's the
# only tool we are aware of that can fill in the offsets the
On 02/22/2002 05:26:53 PM Yaap Raaf wrote:
I've also been looking at Tavultesoft's Keyman, but there are no
readymade keyboards available for the purpose. I don't know
how complicated it is to develop one.
For simple behaviours, it can be quite easy; e.g. if you just need to
assign Devanagari
At 15:08 +0100 2002.02.23, [EMAIL PROTECTED] wrote:
On 02/22/2002 05:26:53 PM Yaap Raaf wrote:
I've also been looking at Tavultesoft's Keyman, but there are no
readymade keyboards available for the purpose. I don't know
how complicated it is to develop one.
For simple behaviours, it can be
David Starner wrote:
On Mon, Jan 21, 2002 at 02:20:17PM +0100, Marco Cimarosti wrote:
What this means in practice for website developers is:
1) SCSU text can only be edited with a text editor which properly decodes
the *whole* file on load and re-encodes it on save. On the other
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Aman Chawla
Sent: 21 January 2002 10:57
To: James Kass; Unicode
Subject: Re: Devanagari
- Original Message -
From: "James Kass" [EMAIL PROTECTED]
To: "Aman Chawla" [EMAIL PROTECTED]; "Unicode"
At 23:19 -0600 2002-01-20, David Starner wrote:
There is no simple encoding scheme that will encode Indic text in
Unicode in one byte per character.
Raw 32-bit encoding treats all characters equally, doesn't it? :-)
--
Michael Everson *** Everson Typography *** http://www.evertype.com
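The size trade-off behind this exchange can be checked directly. The sketch below is only an illustration: it encodes one sample Devanagari word (chosen here, not taken from the thread) in the three Unicode encoding forms and counts bytes, using the BOM-free "-le" codecs so that only character data is measured.

```python
# "Devanagari" written in Devanagari: 8 code points, all in U+0900..U+097F.
word = "\u0926\u0947\u0935\u0928\u093E\u0917\u0930\u0940"

# Byte length of the same text in each encoding form.
sizes = {enc: len(word.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print("code points:", len(word))          # 8
for enc, n in sizes.items():
    # utf-8: 3 bytes each, utf-16: 2 bytes each, utf-32: 4 bytes each
    print(f"{enc:10s} {n:3d} bytes, {n / len(word):.0f} per code point")
```

UTF-32 does treat every character equally, but at four bytes each; none of the three forms reaches one byte per Devanagari character.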
At 00:39 -0500 2002-01-21, Aman Chawla wrote:
The issue was originally brought up to gather opinion from members of this
list as to whether UTF-8 or ISCII should be used for creating Devanagari web
pages. The point is not to criticise Unicode but to gather opinions of
informed persons (list
* [EMAIL PROTECTED]
|
| This is why I really wish that SCSU were considered a truly
| standard encoding scheme. Even among the Unicode cognoscenti it
| is usually accompanied by disclaimers about private agreement only
| and not suitable for use on the Internet, where the former claim
| is
Aman,
What is it you want? To complain about the architecture of Unicode
and UTF-8? For good or ill, it isn't going to change. Neither was it
a conspiracy to suppress the non-English-speaking peoples of the
world.
--
Michael Everson *** Everson Typography *** http://www.evertype.com
Aman Chawla wrote,
With regards to South Asia, where the most widely used modems are approx. 14
kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
unheard of, efficiency in data transmission is of paramount importance...
how can we convince the south asian user to
On Sun, 20 Jan 2002 23:57:29 -0500
Aman Chawla [EMAIL PROTECTED] wrote:
With regards to South Asia, where the most widely used modems are approx. 14
kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
unheard of, efficiency in data transmission is of paramount
Doug Ewell wrote:
Devanagari text encoded in SCSU occupies exactly 1 byte per
character, plus an additional byte near the start of the
file to set the current window (0x14 = SC4).
The problem is what happens if that very byte gets corrupted for any
reason...
If an octet is erroneously
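A rough Python sketch of the two points above, under stated assumptions: it implements only a tiny subset of SCSU (the SC0 to SC7 window-select tags with their default window offsets, plus one-byte data), and the function names are made up for this illustration. Real SCSU also has quoting tags, window definitions and a Unicode mode, all omitted here.

```python
# Default start offsets of SCSU dynamic windows 0..7; window 4 is Devanagari.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SC0 = 0x10  # SC0..SC7 are tag bytes 0x10..0x17, so SC4 == 0x14

def encode_devanagari(text: str) -> bytes:
    """Encode ASCII plus U+0900..U+097F using a single SC4 tag up front."""
    out = bytearray([SC0 + 4])
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:
            out.append(0x80 + (cp - 0x0900))   # one byte per Devanagari character
        elif 0x20 <= cp <= 0x7F or ch in "\t\n\r":
            out.append(cp)                      # ASCII passes through unchanged
        else:
            raise ValueError(f"outside this sketch's subset: U+{cp:04X}")
    return bytes(out)

def decode_subset(data: bytes) -> str:
    """Decode the same subset; window 0 is active until a tag says otherwise."""
    base = DEFAULT_WINDOWS[0]
    chars = []
    for b in data:
        if SC0 <= b <= SC0 + 7:
            base = DEFAULT_WINDOWS[b - SC0]     # switch the active window
        elif b >= 0x80:
            chars.append(chr(base + (b - 0x80)))
        elif b >= 0x20 or b in (0x09, 0x0A, 0x0D):
            chars.append(chr(b))
        else:
            raise ValueError(f"tag 0x{b:02X} not handled in this sketch")
    return "".join(chars)

text = "\u0928\u092E\u0938\u094D\u0924\u0947"   # namaste, 6 code points
encoded = encode_devanagari(text)
print(len(text), len(encoded), hex(encoded[0]))

# Corrupt only the window tag: SC4 (0x14) becomes SC2 (0x12, Cyrillic window).
corrupted = bytes([0x12]) + encoded[1:]
print(decode_subset(corrupted))
```

With the tag intact the text round-trips; with that one byte changed, every following character decodes into the wrong block, which is the fragility being described.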
In a message dated 2002-01-21 1:33:23 Pacific Standard Time,
[EMAIL PROTECTED] writes:
Do you know of any published web pages that use SCSU? I think that's
probably the place to start. I never add support for encodings I can't
find in actual use on the web. (Hint hint. :)
This becomes a
In a message dated 2002-01-21 5:20:55 Pacific Standard Time,
[EMAIL PROTECTED] writes:
Doug Ewell wrote:
Devanagari text encoded in SCSU occupies exactly 1 byte per
character, plus an additional byte near the start of the
file to set the current window (0x14 = SC4).
The problem is what
; Unicode
Subject: Re: Devanagari
- Original Message -
From: James Kass [EMAIL PROTECTED]
To: Aman Chawla [EMAIL PROTECTED]; Unicode
[EMAIL PROTECTED]
Sent: Monday, January 21, 2002 12:46 AM
Subject: Re: Devanagari
25% may not be 300%, but it isn't insignificant. As you note
.
- Chris
--
Christopher J Fynn
Thimphu, Bhutan
[EMAIL PROTECTED]
[EMAIL PROTECTED]
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Aman Chawla
Sent: 21 January 2002 10:57
To: James Kass; Unicode
Subject: Re: Devanagari
- Original Message
Aman Chawla wrote,
I would be grateful if I could get opinions on the following:
1. Which encoding/character set is most suitable for using Hindi/Marathi
(both of which use Devanagari) on the internet as well as in databases, and
why? In your response, please refer to:
At 11:22 -0500 2002-01-20, Aman Chawla wrote:
I am unable to find the Devanagari Rupee sign encoded in Unicode? Is
it encoded? If not, why?
U+20A8.
--
Michael Everson *** Everson Typography *** http://www.evertype.com
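A quick lookup sketch with Python's standard unicodedata module, for anyone wondering why the character is hard to find: it sits among the currency symbols rather than in the Devanagari block.

```python
import unicodedata

rupee = "\u20A8"

print(f"U+{ord(rupee):04X}", unicodedata.name(rupee))
print("category:", unicodedata.category(rupee))                  # Sc = currency symbol
print("in Devanagari block:", 0x0900 <= ord(rupee) <= 0x097F)
# The compatibility mapping folds it to the two Latin letters "Rs".
print("NFKC:", unicodedata.normalize("NFKC", rupee))
```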
At 12:48 AM 1/20/02 -0800, James Kass wrote:
The arguments about relative size are true, but in this day and age are
considered unimportant. Graphics files are extremely large in comparison
with text files of any script and so are sound files. Devanagari UTF-8 is
three bytes. The four byte
The fact that UTF-8 economizes on the storage for ASCII characters, is a
benefit for *all* HTML users, as the HTML syntax is entirely in ASCII and
claims a significant fraction of the data.
A UTF-8 encoded HTML file will therefore have (percentage-wise) less overhead
for Devanagari as
In a message dated 2002-01-20 16:49:17 Pacific Standard Time,
[EMAIL PROTECTED] writes:
The point was that a UTF-8 encoded HTML file for an English web page
carrying say 10 gifs would have a file size one-third that for a Devanagari
web page with the same no. of gifs...
Therefore
On Sun, Jan 20, 2002 at 07:39:57PM -0500, Aman Chawla wrote:
: The point was that a UTF-8 encoded HTML file for an English web page
: carrying say 10 gifs would have a file size one-third that for a Devanagari
: web page with the same no. of gifs - even if you take into account the
: fluctuation
On Sun, Jan 20, 2002 at 07:39:57PM -0500, Aman Chawla wrote:
The point was that a UTF-8 encoded HTML file for an English web page
carrying say 10 gifs would have a file size one-third that for a Devanagari
web page with the same no. of gifs
The point is, that the text for a short webpage is
Doug Ewell wrote,
I think before worrying about the performance and storage effect on Web pages
due to UTF-8, it might help to do some profiling and see what the actual
impact is.
The What is Unicode? pages offer a quick study.
14808 bytes (English)
15218 bytes (Hindi)
10808 bytes
At 10:44 PM 1/20/2002 -0500, you wrote:
Taking the extra links into account the sizes are:
English: 10.4 Kb
Devanagari: 15.0 Kb
Thus the Dev. page is 1.44 times the Eng. page. For sites providing archives
of documents/manuscripts (in plain text) in Devanagari, this factor could be
as high as
- Original Message -
From: James Kass [EMAIL PROTECTED]
To: Aman Chawla [EMAIL PROTECTED]; Unicode
[EMAIL PROTECTED]
Sent: Monday, January 21, 2002 12:46 AM
Subject: Re: Devanagari
25% may not be 300%, but it isn't insignificant. As you note, if the
mark-up were removed from both
On Sun, Jan 20, 2002 at 10:44:00PM -0500, Aman Chawla wrote:
For sites providing archives
of documents/manuscripts (in plain text) in Devanagari, this factor could be
as high as approx. 3 using UTF-8 and around 1 using ISCII.
Uncompressed, yes. It shouldn't be nearly as bad compressed - gzip,
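The effect of compression is easy to try with Python's standard gzip module. This is only a sketch: the sample is a single word repeated, which compresses far better than real prose would, so the ratio it prints is not representative of actual documents.

```python
import gzip

# A repetitive stand-in for a plain-text Devanagari document (not real prose).
word = "\u0926\u0947\u0935\u0928\u093E\u0917\u0930\u0940"
sample = (word + " ") * 500

raw = sample.encode("utf-8")
packed = gzip.compress(raw)

print("UTF-8 bytes:  ", len(raw))
print("gzipped bytes:", len(packed))
print("still decodes:", gzip.decompress(packed).decode("utf-8") == sample)
```

Measuring on real text is the only way to know how much of the three-bytes-per-character cost is actually left after compression.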
- Original Message -
From: "David Starner" [EMAIL PROTECTED]
To: "Aman Chawla" [EMAIL PROTECTED]
Cc: "James Kass" [EMAIL PROTECTED]; "Unicode" [EMAIL PROTECTED]
Sent: Monday, January 21, 2002 12:19 AM
Subject: Re: Devanagari
What's your point
On Sun, 20 Jan 2002, Aman Chawla wrote:
Taking the extra links into account the sizes are:
English: 10.4 Kb
Devanagari: 15.0 Kb
Thus the Dev. page is 1.44 times the Eng. page. For sites providing archives
of documents/manuscripts (in plain text) in Devanagari, this factor could be
as high
In a message dated 2002-01-20 20:49:00 Pacific Standard Time,
[EMAIL PROTECTED] writes:
Usually, when someone offers
a large body of plain text in any script, files are compressed
in one way or another in order to speed up downloads.
This is why I really wish that SCSU were considered a
In a message dated 2002-01-20 21:49:02 Pacific Standard Time,
[EMAIL PROTECTED] writes:
The issue was originally brought up to gather opinion from members of this
list as to whether UTF-8 or ISCII should be used for creating Devanagari web
pages. The point is not to criticise Unicode but to
On Mon, Jan 21, 2002 at 12:57:39AM -0500, [EMAIL PROTECTED] wrote:
This is why I really wish that SCSU were considered a truly standard
encoding scheme. Even among the Unicode cognoscenti it is usually
accompanied by disclaimers about private agreement only and not suitable
for use on the
From: Rick McGowan [mailto:[EMAIL PROTECTED]]
Mike Ayers wrote:
The last I knew,
computer-savvy Taiwan and Hong Kong were continuing to invent new
characters. In the end, the onus is on the computer to
support the user.
Yes, the computer should support the user, but... The
Mark Davis wrote:
The Unicode Standard does define the rendering of such combinations, which
is in the absence of any other information to stack outwards.
A dumb implementation would simply move
the accent outwards if there was one in the same position. This will not
necessarily produce an
From: D.V. Henkel-Wallace [mailto:[EMAIL PROTECTED]]
At 06:30 2000-11-14 -0800, Marco Cimarosti wrote:
But my point was: not even Mr. Ethnologue himself knows exactly *which*
combinations are meaningful, in all orthographic systems. And, clearly, no
one can figure out which combinations
Mike Ayers wrote:
The last I knew,
computer-savvy Taiwan and Hong Kong were continuing to invent new
characters. In the end, the onus is on the computer to support the user.
Yes, the computer should support the user, but... The invention of new characters to
serve multitudes is OK, and
On Tue, 14 Nov 2000, Rick McGowan wrote:
Mike Ayers wrote:
The last I knew,
computer-savvy Taiwan and Hong Kong were continuing to invent new
characters. In the end, the onus is on the computer to support the user.
Yes, the computer should support the user, but... The invention of new
Antoine Leca wrote:
My understanding is that there are a number of similar cases, which are not
officially prohibited (AFAIK), but do not carry any sense.
For example, how about digits followed by accents (as combining marks)?
Or the kana voicing/voiceless combining marks, when they
Monday, November 13, 2000 10:11
Subject: Re: Devanagari question
Marco Cimarosti wrote:
Antoine Leca wrote:
My understanding is that there are a number of similar cases, which are not
officially prohibited (AFAIK), but do not carry any sense.
I think that the original idea beh
On Wed, 8 Nov 2000, Apurva Joshi wrote:
The RA[sup] is seen applied to the independent vowel Vocalic R (U+090B) in
printed samples in Sanskrit.
There are at least the following words that contain the above:
NaiRiTa (the name of a demon)
= 0928 090B Ra[sup] 0924
NaiRiTi (the goddess