Hi! I need to extract text boxes (text blocks) from PDF files. I can also convert the files to PostScript. Does anyone know a method, trick, or routine for getting the text boxes out of a PS or PDF file? (I am looking for Python, COM, or command-line solutions.)
I need to redraw them in my application so that the user can reorder them; afterwards I concatenate all the text and process it. For each box I need this information: x, y, w, h, text.

Example:

page1
  textbox1{x:100,y:100;w:600;h:27;text:"TextBox1 /xfc /xfa"}
  textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
page2
  textbox1{x:100,y:100;w:600;h:27;text:"TextBox1"}
  textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
...

Any solution? Thanks!

dd

PS1: I tried every pdf2text and pdf2html application. All of them failed my test. Only one, pdftohtml, gives usable information, because it produces divs with absolute position and size together with the text. However, that program does not handle ISO-8859-2 characters, so I lose them.

PS2: The program must run under Windows XP, so the solution can be OS-specific.

-- 
http://mail.python.org/mailman/listinfo/python-list
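
To make the target data concrete, here is a minimal sketch of how the records from the example in this post could be held in Python once some extractor has produced them. It does not extract anything from a PDF. The names `TextBox`, `reading_order`, and `concat_in_order` are hypothetical and chosen only for illustration, and the top-to-bottom default ordering is an assumption rather than a requirement stated in the post.

```python
from collections import namedtuple

# One record per text box: page number, position, size, and text.
# Field names follow the post (x, y, w, h, text); "page" is added for illustration.
TextBox = namedtuple("TextBox", ["page", "x", "y", "w", "h", "text"])

# Sample values copied from the example in the post.
boxes = [
    TextBox(1, 100, 100, 600, 27, "TextBox1 /xfc /xfa"),
    TextBox(1, 100, 180, 600, 27, "TextBox2"),
    TextBox(2, 100, 100, 600, 27, "TextBox1"),
    TextBox(2, 100, 180, 600, 27, "TextBox2"),
]


def reading_order(items):
    """Return box indices sorted by page, then top to bottom, then left to right.

    Assumes y grows downward, as with absolutely positioned HTML divs.
    """
    return sorted(range(len(items)),
                  key=lambda i: (items[i].page, items[i].y, items[i].x))


def concat_in_order(items, order):
    """Join the box texts following a list of indices, e.g. one chosen by the user."""
    return "\n".join(items[i].text for i in order)


default_order = reading_order(boxes)   # starting point before the user reorders
user_order = [1, 0, 2, 3]              # hypothetical order picked in the GUI
combined = concat_in_order(boxes, user_order)
```

Keeping the user's choice as a list of indices leaves the extracted boxes untouched, so the application can redraw them at their original positions and still concatenate in any order.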