Re: Unix Codes for Diacritics

Krishna Birth Mon, 20 Sep 2010 15:54:39 -0700

Hi

Would you be able to help with this Xmodmap project?
http://www.unicode.org/mail-arch/unicode-ml/y2010-m09/0042.html


Best,


Meeकu

On Sat, Sep 18, 2010 at 10:42 AM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:

> On Sat, 18 Sep 2010 00:06:07 +0100
> Krishna Birth <krishnabi...@gmail.com> wrote:
>
> > Could someone please correctly tell the codes to use on Unix operating
> > systems to produce the below diacritics:
> >
> > A
> > Ā = http://www.fileformat.info/info/unicode/char/0100/index.htm
> ...
>
> > I need to find this for a project/coder's question.
>
> If you are asking how to type these precomposed letters at a keyboard,
> we need to know which Unix operating system you have in mind, and the
> X-terminal model may be relevant.  For example, if the X-terminal is
> a Windows PC running Exceed, this may reduce to a Windows question.
>
> My answer is directed to what one would write in a program.  It is
> possible that more detail is required as to the coder's problem.
>
> The codepoint (i.e. number encoding the character) for these letters
> is part of the name of the links you gave, e.g. the code for Ā is 0100
> in hex.
>
> If you are simply trying to produce the single, precomposed character
> in a program, the information is given in the table headed 'Encodings'
> in the pages you referenced.  It may be worth also giving the
> information for the plain letter 'A' at
> http://www.fileformat.info/info/unicode/char/0041/index.htm so that the
> coder may understand the information better.  UTF-8 is the encoding
> which for most purposes can work on Unix in exactly the same fashion as
> 8-bit codes (ASCII, ISO-8859, ISCII, TSCII), though multibyte EUC
> encodings are a better analogy.  (If the coder doesn't understand EUC,
> it's not worth explaining.)
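
To make the link between the codepoint and its UTF-8 bytes concrete, here is a rough sketch in POSIX shell arithmetic. It only covers codepoints from U+0080 to U+07FF, which take two bytes in UTF-8, and the function name utf8_two_bytes is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the two UTF-8 bytes (in hex) for a codepoint
# in the range U+0080..U+07FF. Other ranges need one, three, or four bytes
# and are not handled here.
utf8_two_bytes() {
    cp=$(( $1 ))
    lead=$(( 0xC0 | (cp >> 6) ))      # 110xxxxx: the top five bits
    trail=$(( 0x80 | (cp & 0x3F) ))   # 10xxxxxx: the low six bits
    printf '%02x %02x\n' "$lead" "$trail"
}

utf8_two_bytes 0x0100   # prints "c4 80" for LATIN CAPITAL LETTER A WITH MACRON
```

The two bytes printed here are what the 'Encodings' table on the referenced page lists as the UTF-8 form.
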
>
> For example, when I run a terminal window using the locale en_GB.utf8,
> I can have the letter printed to the terminal by a bash script using
> the command
> % printf "\xc4\x80" # Use UTF-8 form explicitly
> The printf of bash version 4.1.5(1) does not understand escape codes
> using '\u'.
>
> On the other hand, /usr/bin/printf on the Linux system I'm using does,
> and I could achieve the same effect using
> % /usr/bin/printf "\u0100" # What happens in non-UTF-8 locales?
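
If neither '\x' nor '\u' can be relied on, octal escapes are one fallback, since POSIX specifies them for the printf format string. The following is only a sketch; the function name a_macron is invented for illustration, and the bytes will only display as the letter in a UTF-8 locale.

```shell
#!/bin/sh
# Hypothetical helper: emit U+0100 as raw UTF-8 bytes.
# 0xC4 0x80 written in octal is \304 \200.
a_macron() {
    printf '\304\200'
}

# Dump the bytes in hex to confirm what was written.
a_macron | od -An -tx1
```
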
>
> If you want the codes for the diacritics themselves, so that the
> letters you listed may be entered as plain Roman letter plus diacritic
> mark, the information you need
> is in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt , with an
> explanation in http://www.unicode.org/reports/tr44/#UnicodeData.txt .
> As an example, consider the line for U+0100:
>
> 0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;
>
> The data items are separated by semicolons.  The first field is the
> codepoint, the number for the character, expressed in hexadecimal
> notation.  The second field gives the character name.  The interesting
> field for you may be the sixth field, which, unless it starts with
> '<', gives another way of expressing the same character - in this case
> as the sequence <U+0041 LATIN CAPITAL LETTER A, U+0304
> COMBINING MACRON>.
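
As a small illustration of reading those fields, the sketch below splits the U+0100 record with cut. The variable name record and the helper field are made up for this example, and the record is pasted in as a single line.

```shell
#!/bin/sh
# The U+0100 record from UnicodeData.txt, on one line.
record='0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;'

# Hypothetical helper: print field number $2 of the
# semicolon-separated record $1.
field() {
    printf '%s\n' "$1" | cut -d ';' -f "$2"
}

field "$record" 1   # codepoint in hex
field "$record" 2   # character name
field "$record" 6   # decomposition: base letter, then combining mark
```

With a local copy of the file (assumed here to be saved as UnicodeData.txt), something like grep '^0100;' UnicodeData.txt | cut -d ';' -f 6 should give the same sixth field.
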
>
> If you want to write the diacritics themselves without attaching them
> to a letter, there are three methods.  Firstly, you can write them on
> a hardspace, e.g. <U+00A0 NO-BREAK SPACE, U+0304 COMBINING MACRON>.
> This will not always work.  Secondly, you can use the spacing modifier
> letter, which is the safe way of writing it.  For this you need to
> look at the Spacing Modifier Letters code chart.  For the macron, you
> will use <U+02C9 MODIFIER LETTER MACRON>.  The third method is to use
> the ISO-8859 characters, in this case <U+00AF MACRON>.  The drawback
> with the third method is that this is a symbol, not a letter, and you
> may encounter bad line-breaking or the macron may be combined with a
> preceding letter.
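
For comparison, the three standalone forms of the macron can be written out as UTF-8 bytes. This is a sketch using octal escapes; the three function names are made up for this example, and how each form renders depends on the terminal and font.

```shell
#!/bin/sh
# UTF-8 bytes in octal; the function names are invented for illustration.
macron_on_nbsp()  { printf '\302\240\314\204'; }  # U+00A0 then U+0304
macron_modifier() { printf '\313\211'; }          # U+02C9
macron_symbol()   { printf '\302\257'; }          # U+00AF

# Print each form's bytes in hex next to its name.
for f in macron_on_nbsp macron_modifier macron_symbol; do
    printf '%s:' "$f"
    "$f" | od -An -tx1
done
```
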
>
> Richard.
>
>
>
