The last time I had to extract right-to-left text from a PDF, the main issue was that the text sits in the data stream in the order it is placed on the page, not in reading order. The characters of a right-to-left word would therefore appear as "tac" rather than "cat", which is how they would be stored in XML, for example.
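If you decide that a particular kind of run has come out in placement order, such as the digit runs in the question quoted below (1984 extracted as 4891), one possible workaround is to reverse just those runs after extraction. The following is only a minimal sketch under that assumption: the class `DigitRunFixer` and the method `reverseDigitRuns` are hypothetical names of my own, not part of PDFBox or ICU4J, and it uses nothing but the Java standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox or ICU4J.
// ASSUMPTION: every run of two or more digits in the extracted text
// came out reversed and needs to be flipped back.
final class DigitRunFixer {

    // \p{Nd} matches any Unicode decimal digit, so ASCII digits and
    // Arabic-Indic digits are both covered. Single digits are skipped
    // because reversing them changes nothing.
    private static final Pattern DIGIT_RUN = Pattern.compile("\\p{Nd}{2,}");

    static String reverseDigitRuns(String extracted) {
        Matcher m = DIGIT_RUN.matcher(extracted);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the digit run unchanged.
            out.append(extracted, last, m.start());
            // Append the digit run with its characters reversed.
            out.append(new StringBuilder(m.group()).reverse());
            last = m.end();
        }
        // Copy whatever follows the last digit run.
        out.append(extracted, last, extracted.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseDigitRuns("year 4891")); // prints "year 1984"
    }
}
```

This is deliberately crude. It treats separators as run boundaries, so a reversed decimal or date would not be restored correctly, and it will damage any number that was already extracted in the right order. Check it against your own samples before relying on it.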
If Arabic numbers are rendered right-to-left, then what you're seeing in the PDF reflects that. That is, the data stream reflects the order in which the characters are placed on the page, not necessarily their source order (the order they would have in XML or in a word-processing document).

So you may have no choice but to assume all numbers are right-to-left, or to look for other clues that indicate the reading order, because the reading order can of course change within a run of text, for example where English words are rendered left-to-right inside right-to-left text.

The work I did was converting Arabic ledgers to HTML, so I didn't have to reflect the reading order correctly; I was only creating a visual representation. Still, it came as a bit of a surprise that the order of characters in the PDF reflected the order as presented, not the reading order, at least in the samples I had. I guess it would be possible to construct PDFs in which the characters occur in the PDF data in reading order and the drawing commands still produce the correct order as presented.

Cheers,

E.

On 5/31/13 10:36 AM, "soleymani mohsen" <[email protected]> wrote:

> hello
> I'am usnig your API, it's very well but i have a question ?
> i use pdfbox( and use icu4j-51 and also call setSortByPosition(true)
> method ) for text extraction from right to left languages ( hebrew /
> persian / arabic ) pdf
>
> all things are ok but numbers get right to left for example : 1984 is
> parsed 4891 or
> 12345 go into 54321
>
> please help me what should i do?
> thank you.

--
Eliot Kimber
Senior Solutions Architect, RSI Content Solutions
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.rsicms.com
www.rsuitecms.com
Book: DITA For Practitioners, from XML Press,
http://xmlpress.net/publications/dita/practitioners-1/

