Sirs, I had already thought about this graphical approach to reconstructing the words. I dropped it because I'm a bit sceptical about the reliability of such a method: I can't help thinking that it will never be 100% reliable. I do understand why CAD software would produce such output, though (thank you for the word "boustrophedonic", which was new to me and explains the result well).
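To make the idea concrete, here is roughly what I understand by the graphical approach. It is only a minimal sketch under assumptions: `GlyphGrouper`, `Glyph`, `yTolerance` and `gapFactor` are hypothetical names of mine and not part of the library, and I assume that each character comes with a left edge, a baseline and a width, with Y growing downwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for illustration only; not part of any PDF library. */
final class GlyphGrouper {

    /** One character with its position, as it could be collected from the one-character text blocks. */
    static final class Glyph {
        final String text;
        final double x;      // left edge of the character
        final double y;      // baseline (assumed to grow downwards)
        final double width;  // horizontal extent of the character

        Glyph(String text, double x, double y, double width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Rebuilds text lines from characters given in arbitrary order.
     * yTolerance: maximum baseline difference for two characters to share a line.
     * gapFactor: a horizontal gap wider than gapFactor * (previous character width) becomes a space.
     */
    static List<String> reconstructLines(List<Glyph> glyphs, double yTolerance, double gapFactor) {
        // Work on a copy so the caller's list is left untouched.
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        // Step 1: cluster characters whose baseline is within yTolerance
        // of the first character of the current line.
        List<List<Glyph>> lines = new ArrayList<>();
        List<Glyph> current = null;
        double lineY = 0;
        for (Glyph g : sorted) {
            if (current == null || Math.abs(g.y - lineY) > yTolerance) {
                current = new ArrayList<>();
                lines.add(current);
                lineY = g.y;
            }
            current.add(g);
        }

        // Step 2: order each line by X and insert a space where the gap is large.
        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : line) {
                if (prev != null && g.x - (prev.x + prev.width) > gapFactor * prev.width) {
                    sb.append(' ');
                }
                sb.append(g.text);
                prev = g;
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Both thresholds are heuristics that would have to be tuned per document, and this is exactly where my doubt comes from: a tolerance that is too small splits a line, and one that is too large merges two lines or two words.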
Supposing that the characters appear in a totally arbitrary order, detecting that they are on the same line is more or less a piece of cake (unless I need to introduce a tolerance, which makes things more difficult), but grouping the characters into words according to their X position is not an easy task at all. That is not the real issue, though: my concern is rather that this method may not be 100% reliable. What do you think?

As for the technical part (overriding the processText), it's OK, thanks for the advice.

Best regards

Julien

2014-03-06 18:39 GMT+01:00 HQS <[email protected]>:

> Hello all,
>
> 1.
> Have you ever seen PDFs having this kind of (pseudo) structure :
>
> BT
> <character>
> Tj
> ET
>
> ?
>
> Which means, the strings are split into characters and there is one block
> of text per character ?
> It seems to be ill-formed doesn't it ?
>
> 2. Reminder of my first mail, what is the library compliancy regarding PDF
> standards ? 1.3 to 1.7 ?
>
>
> Thanks and regards
>
> Julien

