Zdeněk,

Thank you very much. Very useful, and you confirm my suspicion that
the data in the CLDR is not always reliable. Furthermore, it's
obviously intended mainly for displaying plain text in
some specific contexts, not for fine typesetting. At first
my idea was to synchronize the ini files with the CLDR more or
less regularly, but now I'm not sure it's a good idea.

I do not understand the meaning of the encoding field.

The goal is to provide information about which encodings
support or have supported the language, even partially
(admittedly, one couldn't say OT1 supports any language
except English and a few others). This field is purely
informative.

I understand hyphenchar (should be the same as in English in all mentioned
languages) but do not understand the other hyphen* fields.

Most of them are intended for luatex (only for the languages
where they make sense, of course).

Javier

------------------------------------------------

The minus sign in both Czech and Slovak should be –

The quotes in both Czech and Slovak are „ and “ (the closing quote has its
own codepoint in Unicode but is rarely present in fonts; it is better to use
the English opening quote, which has the same shape).

In Czech (and maybe also in Slovak) the time separator is a period; in
sports results and timetables, a colon is used.

Slovak: the characters Ä Ď Ô Ť in the index look strange to me; this should
be checked by a native Slovak speaker.

Hindi
=====

See the note on the encoding above.

A few misprints and missing items in the captions:
bib = संदर्भ-ग्रन्थ (or संदर्भ-ग्रंथ)
contents: the version you have is one of the alternatives suggested by
Anshuman Pandey, but most books I have bought in India use अनुक्रम
part = खण्ड (or खंड)
page = पृष्ठ
proof = प्रमाण
glossary = शब्दार्थ सूची

cc, encl, and headto make no sense; I am probably the only man who writes
business e-mails in Hindi...

I have never seen abbreviated months (a native Hindi speaker should help).
The only abbreviations for days of the week I have seen were at the Aligarh
railway station:
Monday = सो॰, Tuesday = मं॰, Wednesday = बु॰, Thursday = बृह॰, Friday = शुक॰
(or शुक्र॰, the plate was not clearly readable), Saturday = शनि॰, Sunday =
रवि॰. I would not be surprised if the ॰ punctuation were omitted.

[characters] ङ and ञ are not used in Hindi; they should be removed from the index.

frenchspacing: I am afraid it makes no sense in Hindi or in other Indic
languages. The proper spacing was implemented in GNU Freefont (at least for
Hindi) and is activated automatically by language switching. The rules are
explained (in Hindi only; links to other languages switch to a different
text) at
https://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%B9%E0%A4%BF%E0%A4%A8%E0%A5%8D%E0%A4%A6%E0%A5%80_%E0%A4%AE%E0%A5%87%E0%A4%82_%E0%A4%B8%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%AF_%E0%A4%97%E0%A4%B2%E0%A4%A4%E0%A4%BF%E0%A4%AF%E0%A4%BE%E0%A4%81

punctuation: danda । and double danda ॥ should be listed as the most
important punctuation
quotes: either English double quotes or English single quotes are used
(depending on the preference of the author and/or publisher)

number: both Devanagari and Arabic digits are used; it is hard to say which
one should be the default

counters: the way list items are numbered does not conform to the LaTeX
system. I have a normative document describing how it should be done; it is
written in Marathi, and I probably also have a Hindi version. Unfortunately,
I have not found time to implement it so far.



Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz

2016-03-23 19:31 GMT+01:00 Javier Bezos <lis...@tex-tipografia.com
<mailto:lis...@tex-tipografia.com>>:

    Hi all,

    I'm working on a new version of babel, with a new way to define
    languages in a descriptive way, more than in a programmatic one (of
    course, the latter won't be excluded because it's still necessary).

    The idea is to create a set of ini files like those you can find at

    https://latex-project.org/svnroot/latex2e-public/trunk/required/babel/locales/

    They are tentative and some of them are incomplete. I'm working on the
    code to read and 'transform' their data, but in the meantime I'd like
    to improve the ini files. The first step in the roadmap is to provide
    real utf-8 strings for captions and dates with current styles so
    that they can be usable even without fontenc.

    Any help or comments would be greatly appreciated.

    [Crossposted to xetex and luatex lists.]

    Javier


    --------------------------------------------------
    Subscriptions, Archive, and List information, etc.:
    http://tug.org/mailman/listinfo/xetex
