Zdeněk,

Thank you very much. Very useful, and you confirm my suspicion that the data in the CLDR is not always reliable. Furthermore, it's obvious it's intended mainly for displaying plain text in some specific contexts and not for fine typesetting. At first my idea was to synchronize the ini files with the CLDR more or less regularly, but now I'm not sure it's a good idea.
I do not understand the meaning of the encoding field.
The goal is to provide information about which encodings support or have supported the language, even partially (of course, one couldn't say OT1 supports any language except English and a few others). This field is essentially informative.
I understand hyphenchar (should be the same as in English in all mentioned languages) but do not understand the other hyphen* fields.
Most of them are intended for LuaTeX (only for the languages where they make sense, of course).

Javier

------------------------------------------------
The minus sign in both Czech and Slovak should be –.

The quotes in both Czech and Slovak are „ and “ (the closing quote has its own codepoint in Unicode but is rarely present in fonts; it is better to use the English opening quote, which has the same shape).

In Czech (and maybe also in Slovak) the time separator is a period; in sports results and timetables a colon is used.

Slovak: the characters Ä Ď Ô Ť in the index look strange to me; this should be checked by a native Slovak speaker.

Hindi
=====

See the note on the encoding above.

A few misprints and missing items in the captions:

bib = संदर्भ-ग्रन्थ (or संदर्भ-ग्रंथ)
contents: the version you have is one of the alternatives suggested by Anshuman Pandey, but most books I have bought in India contain अनुक्रम
part = खण्ड (or खंड)
page = पृष्ठ
proof = प्रमाण
glossary = शब्दार्थ सूची

cc, encl, and headto make no sense; I am probably the only man who writes business e-mails in Hindi...

I have never seen abbreviated months (a native Hindi speaker should help). The only abbreviations for the days of the week I have seen are those at the Aligarh railway station: Monday = सो॰, Tuesday = मं॰, Wednesday = बु॰, Thursday = बृह॰, Friday = शुक॰ (or शुक्र॰, the plate was not clearly readable), Saturday = शनि॰, Sunday = रवि॰. I would not be surprised if the ॰ punctuation were omitted.

The characters ङ and ञ are not used in Hindi; they should be removed from the index.

frenchspacing: I am afraid it makes no sense in Hindi or in other Indic languages. The proper spacing was implemented in GNU FreeFont (at least for Hindi) and is activated automatically by language switching.
The rules are explained (in Hindi only; links to other languages switch to a different text) at https://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%B9%E0%A4%BF%E0%A4%A8%E0%A5%8D%E0%A4%A6%E0%A5%80_%E0%A4%AE%E0%A5%87%E0%A4%82_%E0%A4%B8%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%A8%E0%A5%8D%E0%A4%AF_%E0%A4%97%E0%A4%B2%E0%A4%A4%E0%A4%BF%E0%A4%AF%E0%A4%BE%E0%A4%81

punctuation: danda । and double danda ॥ should be listed as the most important punctuation.

quotes: either English double quotes or English single quotes are used (depending on the preference of the author and/or publisher).

number: both Devanagari and Arabic digits are used; it is hard to say which one should be the default.

counters: the way list items are numbered does not conform to the LaTeX system. I have a normative document on how it should be done; it is written in Marathi, and I probably also have a Hindi version. Unfortunately, I have not found time to implement it so far.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz

2016-03-23 19:31 GMT+01:00 Javier Bezos <lis...@tex-tipografia.com>:

Hi all,

I'm working on a new version of babel, with a new way to define languages in a descriptive rather than a programmatic way (of course, the latter won't be excluded, because it's still necessary). The idea is to create a set of ini files like those you can find at https://latex-project.org/svnroot/latex2e-public/trunk/required/babel/locales/

They are tentative, and some of them are incomplete. I'm working on the code to read and 'transform' their data, but in the meantime I'd like to improve the ini files. The first step in the roadmap is to provide real UTF-8 strings for captions and dates in current styles, so that they are usable even without fontenc.

Any help or comments would be greatly appreciated.

[Crossposted to xetex and luatex lists.]
Javier

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex