Hi.
I'm working on a server-side Java application that produces PDF forms
pre-filled by the application. The documents are delivered to the end user
through a browser interface, after which the user continues to edit the forms.
Usually the forms are then printed by the user or just saved electronically.
The application does not need to process the user input any further, although
this may become a requirement in the future.
The problem is displaying non-ASCII characters in editable fields. When the
data the application writes into a form field contains non-ASCII characters,
they are not displayed correctly once the document is opened in a PDF viewer.
However, when the field is selected, the content is displayed correctly. If the
data is changed, it continues to display correctly after another field is
selected, but if it is left unchanged, the non-ASCII characters return to their
garbled state when the user moves out of the field.
I'm using PDFBox 1.8.4, but I had the same problem with the previous version
(1.8.3). I have not tried earlier versions.
Can anyone tell me whether non-ASCII characters are supposed to work properly in
an AcroForm field? What requirements does this place on the PDF template? Do I
need to encode the data before setting it as the value of the PDField? If so,
which encoding should I use?
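On the encoding question: according to the PDF specification, a field's /V
value is a text string, which is encoded either in PDFDocEncoding or as UTF-16BE
preceded by the byte-order mark FE FF. PDFBox's COSString appears to make this
choice itself when the document is saved (worth confirming in the 1.8.x
sources), so pre-encoding the value before setValue() should not be needed. The
sketch below only illustrates the rule; PdfTextString is a hypothetical class,
not a PDFBox API, and it deliberately treats only a conservative subset of
characters as single-byte, namely the range where PDFDocEncoding and ISO-8859-1
agree.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the PDF "text string" rule used for a field's /V:
 * single-byte PDFDocEncoding when possible, otherwise UTF-16BE with BOM FE FF.
 */
final class PdfTextString {

    // Conservative subset where PDFDocEncoding matches ISO-8859-1 byte for byte.
    // 0x7F-0xA0 and 0xAD are skipped on purpose: PDFDocEncoding assigns
    // different characters (or none) to those codes.
    static boolean inSafePdfDocSubset(char c) {
        return c == '\t' || c == '\n' || c == '\r'
                || (c >= 0x20 && c <= 0x7E)
                || (c >= 0xA1 && c <= 0xFF && c != 0xAD);
    }

    static byte[] encode(String s) {
        boolean singleByte = true;
        for (int i = 0; i < s.length(); i++) {
            if (!inSafePdfDocSubset(s.charAt(i))) {
                singleByte = false;
                break;
            }
        }
        if (singleByte) {
            // All characters are in the shared range, so the ISO-8859-1
            // bytes are also valid PDFDocEncoding bytes.
            return s.getBytes(StandardCharsets.ISO_8859_1);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFE); // UTF-16BE byte-order mark
        out.write(0xFF);
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE); // no BOM added by Java
        out.write(utf16, 0, utf16.length);
        return out.toByteArray();
    }
}
```

Under this rule "ÄÅÖöäå" fits into single bytes (C4 C5 D6 F6 E4 E5), so the /V
value itself is probably not where the characters get lost.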
Below is a simplified end-to-end code sample of what I'm doing. I've tried
various alternatives for encoding the field value, and I've tried to control
the font via the field's DA (default appearance) entry, but without success. In
most cases the value shown while the field was not selected turned out
invisible, whereas selecting the field displayed the data correctly.
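For reference, a field's DA entry is a small content-stream fragment: a font
resource name, a size (0 means auto-size), the Tf operator and a color
operator, for example /Helv 0 Tf 0 g. The font name must refer to a font in the
AcroForm's default resources (/DR); whether this template actually defines Helv
there is an assumption. A minimal sketch of building and sanity-checking such a
string (DefaultAppearance is a hypothetical helper, not part of PDFBox):

```java
import java.math.BigDecimal;

/**
 * Hypothetical helper that builds a default appearance (/DA) string such as
 * "/Helv 0 Tf 0 g": font resource name, font size (0 = auto-size), the Tf
 * operator, then a gray fill level (0 = black, 1 = white) with the g operator.
 */
final class DefaultAppearance {

    // Writes a number without exponent or trailing zeros ("10", "0.5", "0").
    static String num(double value) {
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    static String of(String fontResource, double size, double gray) {
        if (fontResource.isEmpty() || fontResource.contains(" ")) {
            throw new IllegalArgumentException("bad font resource name: " + fontResource);
        }
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        if (gray < 0 || gray > 1) {
            throw new IllegalArgumentException("gray must be in [0, 1]");
        }
        return "/" + fontResource + " " + num(size) + " Tf " + num(gray) + " g";
    }
}
```

In PDFBox 1.8.x the resulting string could presumably be stored before calling
setValue() with something like
field.getDictionary().setString(COSName.getPDFName("DA"), da); that call has not
been tested against this template.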
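//MyPdfCreator (PDFBox 1.8.x):
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// inside the method that builds the PDF:
String TEMPLATE_NAME = "Form_13349A.pdf";
// Load the form template from the classpath
InputStream is = this.getClass().getClassLoader().getResourceAsStream(TEMPLATE_NAME);
PDDocument pdfTemplate = PDDocument.load(is);
PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
// Pre-fill a text field with a value containing non-ASCII characters
PDField field = acroForm.getField("Field1");
String valueWithNonAsciiChars = "ÄÅÖöäå";
field.setValue(valueWithNonAsciiChars);
// Serialize the filled form into memory and release resources
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
pdfTemplate.save(byteArrayOutputStream);
pdfTemplate.close();
is.close();
byte[] pdf = byteArrayOutputStream.toByteArray();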
//MyHttpRequestHandler (portlet ResourceResponse):
resourceResponse.setContentType("application/pdf");
resourceResponse.addProperty(HttpHeaders.CONTENT_DISPOSITION,
        "attachment; filename=Form_13349A.pdf");
resourceResponse.setContentLength(pdf.length);
// Write the generated bytes straight to the response; copying them into
// another ByteArrayOutputStream first is not necessary
OutputStream out = resourceResponse.getPortletOutputStream();
out.write(pdf);
out.flush();
out.close();
Every hint I've found on the Internet suggests that it's a font-related problem.
But frankly, it seems as if PDFBox is messing up the text field properties while
setting the value. I found a couple of descriptions matching my problem, but no
solution. Issue PDFBOX-283 seems to describe the same problem, and there is
even a patch attached, but apparently the fix has unwanted side effects;
otherwise, why was it not included in the latest version? I have not tested the
patch yet, but I probably will shortly.
https://issues.apache.org/jira/browse/PDFBOX-283
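One explanation consistent with the symptoms above (not verified against
PDFBOX-283 or the PDFBox sources): a viewer draws an unfocused field from the
widget's appearance stream (/AP), and uses the /V value directly only while the
field is being edited; after an edit it regenerates the appearance itself, which
would explain why changed values keep displaying correctly. For a simple
single-byte font such as Helv (Helvetica with WinAnsiEncoding), the text in that
appearance stream has to be written as one byte per character in that encoding;
UTF-8 or UTF-16 bytes written there would likely produce exactly this kind of
garbling. A minimal sketch of a correctly encoded Tj operand, assuming only
characters in the Latin-1 range (AppearanceText is a hypothetical class):

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: writes a field value as the operand of a Tj operator in
 * an appearance stream that uses a simple single-byte font such as Helv.
 * Assumption: all characters are in U+0020..U+007E or U+00A0..U+00FF, where
 * WinAnsiEncoding and ISO-8859-1 assign the same codes. Anything outside
 * ISO-8859-1 is replaced with '?' by getBytes().
 */
final class AppearanceText {

    static String tjOperand(String value) {
        byte[] codes = value.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder("(");
        for (byte b : codes) {
            int c = b & 0xFF;
            if (c == '(' || c == ')' || c == '\\') {
                // These three characters must be escaped in a literal string.
                sb.append('\\').append((char) c);
            } else if (c < 0x20 || c > 0x7E) {
                // Non-ASCII codes become three-digit octal escapes,
                // e.g. 0xC4 (A with diaeresis in WinAnsi) becomes \304.
                sb.append(String.format("\\%03o", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.append(") Tj").toString();
    }
}
```

Whether PDFBox 1.8.4 writes the appearance this way could be checked by
inspecting the field's normal appearance stream (/AP /N) in the saved file.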
As a temporary fix, I was able to get a working result by editing the template
PDF and setting the field's Custom Format Script (that's what Adobe XI calls
it) like so:
var txtField = event.target;
txtField.textFont = font.Helv;
txtField.textColor = color.black;
HOWEVER, this only works in Adobe Reader, not in the built-in PDF viewers of
Chrome or Firefox. Also, it is not a very nice fix, since it requires the PDF
template designer to remember to copy the script into the Custom Format Script
entry of every field in every PDF template. Most importantly, though, the
solution should work in every major PDF viewer.
Help would be very much appreciated!
Pasi Koski