Hi Julien,

composing words from individual characters may not be a 100% reliable method. The fact that you know the pattern you are looking for will certainly help. Will it always be 100% accurate? Maybe not.

What you could do is try the ExtractText command line tool (http://pdfbox.apache.org/commandline/#extractText) or PDFTextStripper to extract the text from your PDF, look at the results, and check whether the words you are looking for come out as whole words.
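
If you do end up reconstructing the words yourself, the grouping step could look roughly like the sketch below. It is plain Java and does not use PDFBox: the Glyph class, the rebuildLines method and both thresholds are hypothetical names made up for illustration, and you would have to fill the glyph list from whatever positions your text-processing override gives you. It only shows the idea (cluster by Y with a tolerance, sort by X, insert a space on a wide gap) and will not cope with rotated text or multiple columns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class GlyphGrouper {

    /** One positioned character. Hypothetical stand-in, not a PDFBox type. */
    static final class Glyph {
        final String text;
        final double x;      // left edge
        final double y;      // baseline
        final double width;  // advance width

        Glyph(String text, double x, double y, double width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Rebuilds text lines from glyphs given in arbitrary order.
     * yTolerance: maximum baseline distance from the first glyph of a line (assumed value, tune per document).
     * gapThreshold: horizontal gap above which a space is inserted (assumed value, tune per font size).
     */
    static List<String> rebuildLines(List<Glyph> glyphs, double yTolerance, double gapThreshold) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        // Assumes PDF user space, where y grows upward: descending y reads top to bottom.
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y).reversed());

        // Cluster glyphs into lines; a line is anchored to the baseline of its first glyph.
        List<List<Glyph>> lines = new ArrayList<>();
        double lineY = 0;
        for (Glyph g : sorted) {
            if (lines.isEmpty() || Math.abs(g.y - lineY) > yTolerance) {
                lines.add(new ArrayList<>());
                lineY = g.y;
            }
            lines.get(lines.size() - 1).add(g);
        }

        // Within each line, order by x and split into words on wide gaps.
        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : line) {
                if (prev != null && g.x - (prev.x + prev.width) > gapThreshold) {
                    sb.append(' ');
                }
                sb.append(g.text);
                prev = g;
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

You would call GlyphGrouper.rebuildLines(glyphs, 0.5, 2.0) with thresholds chosen by experiment and then run your pattern match on the returned strings. Whether the thresholds hold for every drawing is exactly the reliability question, so compare against the ExtractText output on a few real files.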

BR
Maruan Sahyoun

On 07.03.2014 at 12:16, Confidential Confidential <[email protected]> wrote:

> Sirs,
>
> I had already thought about this graphical approach to reconstruct the
> words. I dropped it because I'm a bit sceptical about the reliability of
> such a method. I can't help thinking that it will not be a 100% sure
> method. I understand why CAD software would produce such an output,
> though (thank you for this new word that I didn't know, "boustrophedonic",
> but it explains well the result obtained).
>
> Supposing that the characters appear in a totally arbitrary order,
> detecting that they're on the same line is more or less a piece of cake
> (except if I need to introduce a tolerance, which makes things more
> difficult), but grouping the characters according to their X position is
> not at all an easy task.
>
> But this is not an issue; my problem is more the fact that this method may
> not be 100% reliable. What do you think?
>
> As for the technical part (overloading the processText), it's ok, thanks
> for the advice.
>
> Best regards
>
> Julien
>
>
>
> 2014-03-06 18:39 GMT+01:00 HQS <[email protected]>:
>
>> Hello all,
>>
>> 1.
>> Have you ever seen PDFs having this kind of (pseudo) structure:
>>
>> BT
>> <character>
>> Tj
>> ET
>>
>> ?
>>
>> Which means the strings are split into characters and there is one block
>> of text per character?
>> It seems to be ill-formed, doesn't it?
>>
>> 2. Reminder of my first mail: what is the library's compliance regarding PDF
>> standards? 1.3 to 1.7?
>>
>>
>> Thanks and regards
>>
>> Julien
>>
>>

