Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread Mike Maxwell

On 2/17/2018 11:58 AM, ShreeDevi Kumar wrote:
> Before Unicode, Devanagari fonts used the ASCII range (legacy fonts);
> however, AFAIK there is no standardization in the mapping, though various
> families of fonts had similar mappings.
>
> See http://hindi-fonts.com/tools for converters from different mappings
> to Unicode.
>
> So, the ASCII to Unicode mapping for Devanagari will change based on the
> font used.


Indeed!  In 2003, DARPA held a "surprise language exercise", the goal of 
which was to produce (very basic) MT etc. tools for Hindi, in a month's 
time.  I had been involved in the prep for it to ensure that there would 
be no roadblocks (at the time, I was working at the LDC).  One of the 
things that Bill Poser and I verified was that there was a Unicode 
encoding for Hindi/Devanagari.  There was, but that was the wrong 
question.


The right question was whether any Hindi website used Unicode.  The 
answer to that was that the BBC and Colgate did, but hardly anyone else.
A few Indian government sites used ISCII, which wouldn't have been 
bad, but most places used proprietary encodings that went along with a 
proprietary font.  Worse, these were not simple code-point-to-character 
encodings; it was as if the Latin letter 'l' had been encoded as 'l', 
but then 'd' had been encoded as 'c' + 'l', 'b' as 'l' + a sort of 
backwards 'c', 'p' as a lowered 'l' + the backwards 'c', etc.  It was a 
mess, and for a while it was unclear whether the exercise would fail 
because most of the data we needed was in these weird proprietary 
encodings.  (It eventually succeeded.)
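A minimal Python sketch of why such encodings need more than a one-to-one table: the converter has to recognize multi-code sequences, trying the longest candidate first. The table uses the Latin analogy from the paragraph above with ASCII stand-ins (')' plays the "backwards 'c'"); it is hypothetical and is not any real font's encoding.

```python
def convert_longest_match(text, table):
    """Convert legacy-font text by greedy longest-match lookup.

    `table` maps sequences of one or more legacy codes to Unicode strings.
    Codes with no matching entry are copied through unchanged.
    """
    max_len = max((len(k) for k in table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible sequence first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # Nothing matched at this position: pass the code through.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical table following the analogy: 'd' is drawn as 'c' + 'l',
# 'b' as 'l' + a backwards 'c' (stand-in: ')').
TOY_TABLE = {"l": "l", "c": "c", "cl": "d", "l)": "b"}

print(convert_longest_match("lcl", TOY_TABLE))  # ld
print(convert_longest_match("l)", TOY_TABLE))   # b
```

Greedy longest match is only a heuristic. A real converter also has to handle reordering: for example, the short-i vowel sign (U+093F) is displayed before its consonant but is stored after it in Unicode.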


There are some notes here--

http://languagelog.ldc.upenn.edu/myl/ldc/hindi_fonts_and_conversions.html
--that Mark Liberman of the LDC made at the time concerning some of the 
issues.  Most of it is long out of date (and the links are probably 
broken), and these proprietary encodings have thankfully been replaced 
by Unicode; but if you're dealing with documents from that era, you 
might still run into them.  The LDC *might* still have the encoding 
converters lying around somewhere.

--
   Mike Maxwell
   "My definition of an interesting universe is
   one that has the capacity to study itself."
 --Stephen Eastmond


--
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread ShreeDevi Kumar
Please see

view-source:http://hindi-fonts.com/tools/Preeti-to-Unicode-Converter

There is no direct mapping table, but in the page source array_one holds
the ASCII codes for Preeti, while array_two holds the corresponding Unicode.
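The following Python sketch assumes that such a converter walks array_one and array_two in step, replacing every occurrence of each legacy sequence with its counterpart. This has not been checked against the page's actual JavaScript. The 'a' to ब pair comes from the example earlier in this thread; the other pairs are hypothetical and exist only to show that array order matters.

```python
def convert_with_parallel_arrays(text, array_one, array_two):
    """Replace each legacy sequence in array_one with the string at the same
    index in array_two, one pair at a time, in array order."""
    if len(array_one) != len(array_two):
        raise ValueError("array_one and array_two must have the same length")
    for legacy, unicode_text in zip(array_one, array_two):
        # Each pass sees the output of earlier passes, so a longer sequence
        # must be listed before any shorter sequence it contains.
        text = text.replace(legacy, unicode_text)
    return text


# 'a' -> U+092C (ba), as in the Preeti example in this thread.
print(ascii(convert_with_parallel_arrays("a", ["a"], ["\u092c"])))  # '\u092c'

# Hypothetical pairs: "cl" is one character, "l" alone is another.
print(convert_with_parallel_arrays("cll", ["cl", "l"], ["d", "I"]))  # dI
print(convert_with_parallel_arrays("cll", ["l", "cl"], ["I", "d"]))  # cII
```

The last line shows the wrong order: "l" is replaced first, so the two-code sequence "cl" never gets a chance to match.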

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Feb 17, 2018 at 10:32 PM, ShreeDevi Kumar wrote:
> [...]




Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread ShreeDevi Kumar
> What I think I am looking for is something that would map a document
> typeset using something like the Devanagari Preeti font
> (https://fonts2u.com/preeti.font), which seems to have the Devanagari
> glyphs encoded in the range 0x00-0x7F, to something like the
> Devanagari Unicode font Mukta
> (https://ektype.in/scripts/devanagari/mukta.html) in the range
> 0x0900-0x097F.

Please try http://www.ashesh.com.np/preeti-unicode/

Also see

https://github.com/Shuvayatra/preeti

ShreeDevi

On Sat, Feb 17, 2018 at 10:27 PM, Mike Maxwell wrote:
> [...]




Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread ShreeDevi Kumar
> For example, it seems that the Devanagari glyph "ब" is encoded as
> 0x61 (hex) in ASCII (lower case 'a' for the Latin alphabet),

Before Unicode, Devanagari fonts used the ASCII range (legacy fonts);
however, AFAIK there is no standardization in the mapping, though various
families of fonts had similar mappings.

See http://hindi-fonts.com/tools for converters from different mappings to
Unicode.

So, the ASCII to Unicode mapping for Devanagari will change based on the font
used.
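To make the point concrete: because the legacy-to-Unicode mapping depends on the font, a converter has to be told which font the text was typed for, and it should refuse to guess. The following is a minimal Python sketch. Only the Preeti entry ('a' shows ब, U+092C) comes from this thread. "HypotheticalFontX" and its table are invented for illustration.

```python
# Per-font tables: legacy character -> Unicode string.
LEGACY_TABLES = {
    # From the example in this thread: in Preeti, 0x61 ('a') shows U+092C.
    "Preeti": {"a": "\u092c"},
    # Hypothetical font that puts the same glyph on a different key.
    "HypotheticalFontX": {"b": "\u092c"},
}


def to_unicode(text, font_name):
    """Convert text typed for a legacy font, using that font's own table."""
    try:
        table = LEGACY_TABLES[font_name]
    except KeyError:
        raise ValueError(f"no mapping known for font {font_name!r}") from None
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(table))


# The same input character means different things under different fonts.
print(ascii(to_unicode("a", "Preeti")))             # '\u092c'
print(ascii(to_unicode("a", "HypotheticalFontX")))  # 'a'
```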


ShreeDevi

On Sat, Feb 17, 2018 at 10:04 PM, Philip Taylor wrote:

> Daniel Greenhoe wrote:
>
>> Does anyone know where I can find an ASCII to Unicode mapping for
>> Devanagari?
>>
> Would this be of any help ?
>
> https://clas.uiowa.edu/linguistics/hindi-verb-project/ascii-devanagari-chart
>
> Philip Taylor




Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread Mike Maxwell

On 2/17/2018 11:08 AM, Daniel Greenhoe wrote:

> Does anyone know where I can find an ASCII to Unicode mapping for Devanagari?
>
> For example, it seems that the Devanagari glyph "ब" is encoded as
> 0x61 (hex) in ASCII (lower case 'a' for the Latin alphabet), but is
> 0x092C in the Unicode standard:
>    http://www.unicode.org/charts/PDF/U0900.pdf
>
> So what I am asking for is a map (or table) that maps 0x00-0x7F in
> Devanagari ASCII to 0x0900-0x097F in Unicode.


In addition to the ASCII-to-Devanagari transcription system that Philip 
Taylor mentioned, you may be interested in the ISCII encoding for 
Brahmi-derived writing systems, including Devanagari:


https://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange

This is _not_ an ASCII-to-Devanagari encoding; rather, it leaves the 
ASCII range intact and encodes Devanagari (etc.) in the range 128 
(actually, 161) to 255.  It was AFAIK never widely used, but there were 
(and probably still are) fonts for it.  I don't imagine those fonts 
would be terribly high quality by today's standards, e.g. I'd be 
surprised if they handled conjunct characters.
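The following Python sketch illustrates that layout. Because ISCII leaves 0x00-0x7F as plain ASCII, a byte stream can be split into ASCII runs, which can be decoded directly, and upper-half runs, which need an ISCII table. The sketch does not decode the Indic bytes, and the sample bytes in the usage line are arbitrary.

```python
def split_iscii(data):
    """Split ISCII bytes into ("ascii", str) and ("iscii", bytes) runs.

    Bytes 0x00-0x7F are plain ASCII in ISCII; everything from 0x80 up is
    treated as ISCII-specific (the Indic characters sit at 0xA1-0xFF).
    """
    runs = []
    for b in data:  # iterating over bytes yields ints
        kind = "ascii" if b < 0x80 else "iscii"
        if runs and runs[-1][0] == kind:
            runs[-1][1].append(b)
        else:
            runs.append((kind, bytearray([b])))
    return [
        (kind, bytes(buf).decode("ascii") if kind == "ascii" else bytes(buf))
        for kind, buf in runs
    ]


print(split_iscii(b"p. \xb3\xda 12"))
# [('ascii', 'p. '), ('iscii', b'\xb3\xda'), ('ascii', ' 12')]
```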


FWIW, there was a similar encoding called TSCII for Tamil.

iconv can be used to map TSCII to other encodings, but for some reason 
it doesn't seem to have ISCII in its repertoire (it does include VISCII, 
but that's a legacy Vietnamese encoding).

--
   Mike Maxwell




Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread Daniel Greenhoe
> https://clas.uiowa.edu/linguistics/hindi-verb-project/ascii-devanagari-chart

That one looks more like an input tool (like a TECkit mapping)
for Devanagari.

What I think I am looking for is something that would map a document
typeset using something like the Devanagari Preeti font
(https://fonts2u.com/preeti.font), which seems to have the Devanagari
glyphs encoded in the range 0x00-0x7F, to something like the
Devanagari Unicode font Mukta
(https://ektype.in/scripts/devanagari/mukta.html) in the range
0x0900-0x097F.

In short, I would perhaps like a simple map, something like this:
  0x21 --> 0x096F  (९)
  0x22 --> 0x0942
  0x23 --> 0x0969 (३)
  0x24 --> 0x096A (४)
  0x25 --> 0x096B (५)
  0x26 --> 0x096D (७)
  ...
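Here is a minimal Python sketch that reads the six pairs above, written in the same "0xNN --> 0xNNNN" notation, and applies them with str.translate. It maps strictly one code point to one code point. It would therefore not cope with the multi-glyph sequences or vowel-sign reordering that legacy Devanagari fonts often need, so treat it as a starting point rather than a converter.

```python
import re

# The partial map from this message, in the same notation.
MAP_TEXT = """
0x21 --> 0x096F
0x22 --> 0x0942
0x23 --> 0x0969
0x24 --> 0x096A
0x25 --> 0x096B
0x26 --> 0x096D
"""

LINE_RE = re.compile(r"0x([0-9A-Fa-f]{2})\s*-->\s*0x([0-9A-Fa-f]{4})")


def parse_map(map_text):
    """Build a str.translate table {legacy code point: Unicode character}."""
    table = {}
    for m in LINE_RE.finditer(map_text):
        table[int(m.group(1), 16)] = chr(int(m.group(2), 16))
    return table


TABLE = parse_map(MAP_TEXT)


def convert(text):
    # One-to-one substitution only; unmapped characters pass through.
    return text.translate(TABLE)


print(ascii(convert("#$%")))  # '\u0969\u096a\u096b'
```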



On Sat, Feb 17, 2018 at 4:34 PM, Philip Taylor wrote:
> Daniel Greenhoe wrote:
>>
>> Does anyone know where I can find an ASCII to Unicode mapping for
>> Devanagari?
>
> Would this be of any help ?
>
> https://clas.uiowa.edu/linguistics/hindi-verb-project/ascii-devanagari-chart
>
> Philip Taylor


Re: [XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread Philip Taylor

Daniel Greenhoe wrote:

> Does anyone know where I can find an ASCII to Unicode mapping for Devanagari?

Would this be of any help?

https://clas.uiowa.edu/linguistics/hindi-verb-project/ascii-devanagari-chart

Philip Taylor




[XeTeX] Devanagari ASCII to Unicode mapping

2018-02-17 Thread Daniel Greenhoe
Does anyone know where I can find an ASCII to Unicode mapping for Devanagari?

For example, it seems that the Devanagari glyph "ब" is encoded as
0x61 (hex) in ASCII (lower case 'a' for the Latin alphabet), but is
0x092C in the Unicode standard:
  http://www.unicode.org/charts/PDF/U0900.pdf

So what I am asking for is a map (or table) that maps 0x00-0x7F in
Devanagari ASCII to 0x0900-0x097F in Unicode.

Does anyone know where I might find such a mapping?

Many, many thanks in advance,
Dan


Re: [XeTeX] A problem with a Devanagari font

2018-02-17 Thread Daniel Greenhoe
For whatever it's worth (maybe not much), I tried the test cases in a
slightly different way: I used the packages fontspec and xunicode, but
*not* the package *polyglossia*.

The result was that both the "main font" and the "specified font" output
look (to me) the same, and also look (to me) the same as "how it should
look like" in the original posting.

I have attached an example tex and pdf file. Take a look if you think
there is any chance it might be helpful.

Dan

On Sat, Feb 17, 2018 at 5:25 AM, RD Holkar wrote:
> Hi,
>
> @Zdenek Wagner: After reading your email, I tried this on different machines
> and systems, except Windows. Strangely, I get the same result. Two of my
> colleagues faced the same issue, one with documentclass article and the
> other with beamer (I am using memoir).
> I have written to the font developer, and will check our systems as well.
>
> Thank you!
>
> With best regards,
> -Rohit.
>
>
> On Tue, Feb 13, 2018 at 3:18 PM, Zdenek Wagner wrote:
>>
>> Strange, on my computer it works. I do not have Shobhika, so I get errors
>> about missing fonts and the characters to be printed in Shobhika disappear,
>> but the text in Jaini works: I get the right conjuncts both times.
>>
>> Zdeněk Wagner
>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>> http://icebearsoft.euweb.cz
>>
>> 2018-02-13 9:07 GMT+01:00 RD Holkar:
>>>
>>> Dear all,
>>>
>>> here is another issue I am facing: I am using a Devanagari font called
>>> Jaini, downloadable from https://ektype.in/jaini-1106.html
>>>
>>> When I set this font as the main font of my document, the conjunct of
>>> श and ल looks different from how it should look.
>>> Whereas, when I define Jaini as one of the fonts and use it, the letters
>>> look fine.
>>> See the attached example.
>>>
>>> Why is this happening? (Is it a fault in the font?)
>>>
>>> Thank you in advance.
>>>
>>> With best regards,
>>> -Rohit.


Jaini2.pdf
Description: Adobe PDF document


Jaini2.tex
Description: TeX document

