On Sat, 18 Sep 2010 00:06:07 +0100 Krishna Birth <krishnabi...@gmail.com> wrote:
> Could someone please correctly tell the codes to use on Unix operating
> systems to produce the below diacritics:
>
> A
> Ā = http://www.fileformat.info/info/unicode/char/0100/index.htm
...
> I need to find this for a project/coder's question?

If you are asking how to type these precomposed letters at a keyboard,
we need to know which Unix operating system you have in mind, and the
X-terminal model may be relevant. For example, if the X-terminal is a
Windows PC running Exceed, this may reduce to a Windows question.

My answer is directed to what one would write in a program. It is
possible that more detail is required as to the coder's problem.

The codepoint (i.e. number encoding the character) for these letters is
part of the name of the links you gave, e.g. the code for Ā is 0100 in
hex.

If you are simply trying to produce the single, precomposed character in
a program, the information is given in the table headed 'Encodings' in
the pages you referenced. It may be worth also giving the information
for the plain letter 'A' at
http://www.fileformat.info/info/unicode/char/0041/index.htm so that the
coder may understand the information better.

UTF-8 is the encoding which for most purposes can work on Unix in
exactly the same fashion as 8-bit codes (ASCII, ISO-8859, ISCII, TSCII),
though multibyte EUC encodings are a better analogy. (If the coder
doesn't understand EUC, it's not worth explaining.) For example, when I
run a terminal window using the locale en_GB.utf8, I can have the letter
printed to the terminal by a bash script using the command

    % printf "\xc4\x80"            # Use UTF-8 form explicitly

The printf of bash version 4.1.5(1) does not understand escape codes
using '\u'. On the other hand, /usr/bin/printf on the Linux system I'm
using does, and I could achieve the same effect using

    % /usr/bin/printf "\u0100"     # What happens in non-UTF-8 locales?
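
For the coder, here is a minimal sketch of the same point in a program.
Python is my assumption, since the question does not say what language
the project uses, and the helper names utf8_hex and can_encode are made
up for illustration. It shows the UTF-8 bytes for 'A' and for U+0100,
and that U+0100 simply has no representation in ISO-8859-1.

```python
# Hypothetical helper names; only str.encode and the codec names are real.
A_MACRON = "\u0100"  # U+0100 LATIN CAPITAL LETTER A WITH MACRON


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


def can_encode(text, encoding):
    """Report whether text is representable in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(utf8_hex("A"))                        # 41
print(utf8_hex(A_MACRON))                   # c4 80
print(can_encode(A_MACRON, "iso-8859-1"))   # False
```

The bytes c4 80 are the ones written out by hand in the first printf
command above.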

If you want the codes for the diacritics themselves, so that the letters
you listed may be entered as plain Roman letter plus diacritic mark, the
information you need is in
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt , with an
explanation in http://www.unicode.org/reports/tr44/#UnicodeData.txt .
As an example, consider the line for U+0100:

    0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;

The data items are separated by semicolons. The first field is the
codepoint, the number for the character, expressed in hexadecimal
notation. The second field gives the character name. The interesting
field for you may be the sixth field, which, unless it starts with '<',
gives another way of expressing the same character - in this case as the
sequence <U+0041 LATIN CAPITAL LETTER A, U+0304 COMBINING MACRON>.

If you want to write the diacritics themselves without attaching them to
a letter, there are two or three methods.

Firstly, you can write them on a hardspace, e.g. <U+00A0 NO-BREAK SPACE,
U+0304>. This will not always work; using the spacing modifier letter is
the safe way of writing it. For this you need to look at the code chart
for the Spacing Modifier Letters block (U+02B0 to U+02FF). For the
macron, you will use <U+02C9 MODIFIER LETTER MACRON>.

The third method is to use the ISO-8859 characters, in this case
<U+00AF MACRON>. The drawback with the third method is that this is a
symbol, not a letter, and you may encounter bad line-breaking or the
macron may be combined with a preceding letter.

Richard.
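
P.S. A rough sketch of reading the sixth field of the UnicodeData.txt
line quoted above, again assuming Python. The function name
canonical_decomposition is hypothetical. The unicodedata module is in
the standard library, but which Unicode version it carries depends on
the Python release. The last loop prints the general category of the two
standalone macrons, which is where the letter versus symbol distinction
shows up (Lm against Sk).

```python
import unicodedata

# The UnicodeData.txt line for U+0100, copied from the message above.
LINE = ("0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;"
        "LATIN CAPITAL LETTER A MACRON;;;0101;")


def canonical_decomposition(line):
    """Return the sixth field as a string, or None if it is empty
    or is a compatibility mapping (one starting with '<')."""
    mapping = line.split(";")[5]
    if not mapping or mapping.startswith("<"):
        return None
    return "".join(chr(int(cp, 16)) for cp in mapping.split())


decomposed = canonical_decomposition(LINE)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Recomposing the pair gives back the precomposed letter.
print(unicodedata.normalize("NFC", decomposed) == "\u0100")

# The standalone macrons: a modifier letter and a symbol.
for cp in (0x02C9, 0x00AF):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} {unicodedata.category(ch)}")
```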