On 2015-09-13 20:06, Rob Hawkins wrote:
Greetings all,
Can pdftohtml produce output for Burmese, Khmer, Indonesian, Thai and
Vietnamese? I didn't see a language pack for any except Thai, and
that one doesn't produce properly formatted characters for my source
files. They're missing the vowel
Dear Rob,
Poppler extracts text from a PDF as a series of glyphs.
Therefore, for scripts whose characters Unicode encodes
in visual order, the first step of text extraction is
possible.
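
To illustrate the visual-order point, here is a minimal Python sketch,
not Poppler's actual code. It assumes a hypothetical extractor that sees
one glyph per code point, in left-to-right page order; real PDFs map
glyphs to Unicode through font tables, so this is a simplification. The
names naive_extract, THAI_VISUAL and KHMER_VISUAL are made up for the
example. Thai stores a leading vowel before its consonant (visual order),
while Khmer stores the consonant first even though the vowel sign is
drawn to its left (logical order).

```python
import unicodedata

# Glyphs in left-to-right visual order, roughly as they sit on the page.
# Assumption: one glyph maps to exactly one code point.
THAI_VISUAL = ["\u0E40", "\u0E01"]   # SARA E drawn first, then KO KAI
KHMER_VISUAL = ["\u17C1", "\u1780"]  # VOWEL SIGN E drawn left of KA

# The order Unicode expects in stored text.
THAI_STORED = "\u0E40\u0E01"   # visual order: vowel, then consonant
KHMER_STORED = "\u1780\u17C1"  # logical order: consonant, then vowel sign


def naive_extract(glyphs):
    """Concatenate code points in the order the glyphs appear on the page."""
    return "".join(glyphs)


def describe(text):
    """Return the Unicode character names, to make the order readable."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    for label, visual, stored in [
        ("Thai", THAI_VISUAL, THAI_STORED),
        ("Khmer", KHMER_VISUAL, KHMER_STORED),
    ]:
        extracted = naive_extract(visual)
        print(label, "matches stored order:", extracted == stored)
        print("  extracted:", describe(extracted))
        print("  expected: ", describe(stored))
```

In this sketch the Thai word survives glyph-order extraction unchanged,
whereas the Khmer word comes out with the vowel sign ahead of its
consonant, so it would need reordering before it is valid text.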
However, some Asian scripts, especially Brahmic-based scripts,
have very complicated layout
Thank you all for these great replies. I find the stuff about the Unicode
encoding order really interesting. And I too wish we could find more
information about the as-yet unmapped Asian scripts.
I was mistaken about the output of PDF.js. I thought I had viewed the HTML
source and seen good
On 15/09/15 01:23, Jonathan Kew wrote:
> On 14/9/15 16:40, Rob Hawkins wrote:
>> Thank you all for these great replies. I find the stuff about the
>> Unicode encoding order really interesting. And I too wish we could find
>> more information about the as-yet unmapped Asian scripts.
>>
>> I was
Greetings all,
Can pdftohtml produce output for Burmese, Khmer, Indonesian, Thai and
Vietnamese? I didn't see a language pack for any except Thai, and that one
doesn't produce properly formatted characters for my source files. They're
missing the vowel marks. The other languages fail