By good fortune, a form just came in that shows the problem:
https://dl.dropboxusercontent.com/u/25802656/Tracleer%20Patient%20Enrollment%20and%20Consent%20Form%20Revised.pdf
There is a field that Acrobat quite happily calls 'Tracleer 62.5' and treats as an entirely normal text field, but PDFBox is confused by it. That is the kind of thing I am talking about.

It is very easy to fix manually in Acrobat, of course, but I am trying to build automation tools, and the fields with the dots are usually very important ones that give my tools a great deal of information for reasoning about the form.

Thank you for looking at this.

On Sat, Sep 24, 2016 at 3:07 PM, Evan Williams <[email protected]> wrote:

> Hi Maruan,
>
> The answer to your question is yes, but my problem is that I tend to fix
> the PDFs every time I find this issue, so I am not certain that I have any
> sitting around that show the problem. But it is easy enough to create. I
> will just edit a PDF with Acrobat and put a dot in a field name. I will do
> that later this afternoon.
>
> Thank you.
>
> On Sat, Sep 24, 2016 at 2:21 PM, Maruan Sahyoun <[email protected]> wrote:
>
>> Hi,
>>
>> > On 24.09.2016 at 17:13, Evan Williams <[email protected]> wrote:
>> >
>> > I have a problem, but I think it's non-terminal.
>> >
>> > I have been using PDFBox to work with forms for about a year and a half,
>> > and I have a handle on many things, but I have a persistent and pernicious
>> > issue with forms where fields have periods ('.') in their names.
>>
>> Would it be possible to upload a sample to a public location so we can
>> take a look?
>>
>> BR
>>
>> Maruan
>>
>> > These forms are from external sources and are typically old-school
>> > AcroForms. Because of the nature of the forms (medical), they often
>> > contain decimal values like '0.5 mg' or 'W55.21'. These forms do not
>> > seem to have ever been meant to be read programmatically. They are for
>> > human consumption.
>> >
>> > As far as I can tell, '.' is a magic character used by fully qualified
>> > names to delineate the elements of the path. So when I iterate over the
>> > fields, I get a bunch of name fragments as 'PDNonTerminalField's mixed
>> > in with regular fields.
>> >
>> > My current way of dealing with this is to waste the time of a skilled
>> > graphic designer, or my own time, manually going in and fixing it. This
>> > is mostly just an annoyance. But annoyances add up. And I am trying to
>> > automate as much as I possibly can in dealing with these forms.
>> >
>> > *Is there any obvious way to identify this corrupt situation and
>> > correct it?*
>> >
>> > I wonder if I am just doing something wrong (I am iterating over the
>> > fields in the time-honored way that the form example included with
>> > PDFBox uses).
>> >
>> > Adobe Acrobat seems perfectly happy to deal with fields containing
>> > periods (including, unfortunately, allowing people to create them). So
>> > there must be some way to deal with this.
>> >
>> > Your advice would be of great service to me.
>> >
>> > Thank you.

--
*Evan Williams*
Sr. Software Engineer
[email protected]

*www.ZappRx.com <http://www.zapprx.com/>*
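For illustration only, here is a minimal sketch of how a dotted name such as 'Tracleer 62.5' could be recovered from a field hierarchy. It is not PDFBox code: `Node`, `addKid`, and `terminalNames` are hypothetical stand-ins, and 'Patient Name' is a made-up field. It assumes the PDF stores the field as a parent with partial name 'Tracleer 62' and a single kid with partial name '5', and it relies on the PDF convention that a fully qualified field name is the partial names from root to leaf joined with periods. Widget annotations attached to terminal fields are ignored in this model.

```java
import java.util.ArrayList;
import java.util.List;

final class DottedFieldNames {

    // Hypothetical stand-in for a form field; this is NOT a PDFBox class.
    static final class Node {
        final String partialName;              // the field's own name fragment
        final List<Node> kids = new ArrayList<>();

        Node(String partialName) {
            this.partialName = partialName;
        }

        Node addKid(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    // Collect the fully qualified name of every terminal (leaf) field.
    static List<String> terminalNames(List<Node> roots) {
        List<String> out = new ArrayList<>();
        for (Node root : roots) {
            collect(root, "", out);
        }
        return out;
    }

    // Depth-first walk that joins partial names with '.' on the way down.
    private static void collect(Node node, String prefix, List<String> out) {
        String name = prefix.isEmpty()
                ? node.partialName
                : prefix + "." + node.partialName;
        if (node.kids.isEmpty()) {
            out.add(name);                     // leaf: report the full name
        } else {
            for (Node kid : node.kids) {
                collect(kid, name, out);       // non-terminal: descend only
            }
        }
    }

    public static void main(String[] args) {
        List<Node> roots = new ArrayList<>();
        // Assumed layout for the problem field: parent "Tracleer 62", kid "5".
        roots.add(new Node("Tracleer 62").addKid(new Node("5")));
        // Made-up ordinary field with no dot in its name.
        roots.add(new Node("Patient Name"));
        System.out.println(terminalNames(roots)); // prints [Tracleer 62.5, Patient Name]
    }
}
```

If the assumption about the stored hierarchy holds, the same idea may carry over to PDFBox 2.0 by walking `PDAcroForm.getFieldTree()`, skipping `PDNonTerminalField` instances, and reading `getFullyQualifiedName()` on the remaining fields. I have not verified this against the attached form.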

