You can download it from here: https://drive.google.com/file/d/0B5Kxacm1mej-MEZubTNYVVJYTFE/view?usp=sharing
Best Regards,

On Tue, Mar 24, 2015 at 9:48 AM, Maruan Sahyoun <[email protected]> wrote:

> > On 24.03.2015 at 09:40, a7med shre3y <[email protected]> wrote:
> >
> > Hi,
> >
> > In fact PDFBox calls the operation of transforming "7R %H $SSURYHG" to
> > "To Be Approved" "encoding". Anyway, whether it's encoding or decoding,
> > I thought it was easier to transform "7R %H $SSURYHG" to "To Be Approved"
> > and not the opposite (or at least I don't know how). I spent quite a long
> > time trying to find out how to get the character codes for the glyphs in
> > the currently used font, then I found that it's not an easy task. By the
> > way, if you know how to do that, I'd very much appreciate it, because I
> > need that for replacing text with other text, and for that the new text
> > must be encoded the same way as the original!
> >
> > Back to the text removal: I am able to find the text and also remove it
> > by calling reset. As I mentioned in my first email, when I print the
> > output content I don't find the text anymore, but I still see it when I
> > open the file. My first assumption was that there must be some other way
> > to remove the text than the way I am using, and that's what you've
> > actually confirmed in your reply, so could you please tell me what is
> > still missing?
> >
>
> Could you upload the PDF with the reset text too?
>
> BR
> Maruan
>
> > Thanks and regards,
> > a7mad
> >
> > On Tue, Mar 24, 2015 at 9:22 AM, Maruan Sahyoun <[email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >>> On 24.03.2015 at 08:14, a7med shre3y <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Here's how I do it:
> >>>
> >>> 1. I use the following method to encode the text:
> >>>
> >>> String encode(String text, PDFont font) throws Exception {
> >>>     StringBuilder builder = new StringBuilder();
> >>>     byte[] stringBytes = text.getBytes();
> >>>     int codeLength = 1;
> >>>     for(int i = 0; i < stringBytes.length; i += codeLength){
> >>>         String c = font.encode(stringBytes, i, codeLength);
> >>>         if(c == null && (i + 1 < stringBytes.length)){
> >>>             codeLength++;
> >>>             c = font.encode(stringBytes, i, codeLength);
> >>>         }
> >>>         builder.append(c);
> >>>     }
> >>>     return builder.toString();
> >>> }
> >>>
> >>> 2. Iterating through the tokens, I find the text, whether it's a
> >>> COSString ("Tj" operator) or a COSArray ("TJ" operator), then check if
> >>> it's the text I'm looking to remove, as follows:
> >>>
> >>> if (op.getOperation().equals("Tj")) {
> >>>     COSString previous = (COSString) tokens.get(j - 1);
> >>>     String string = previous.getString();
> >>>     String encodedString = encode(string, font);
> >>
> >> that string is already encoded. So you'd need to encode "To Be Approved"
> >> and compare if that matches the string you are reading from the PDF.
> >>
> >>>     if(encodedString.contains("To Be Approved")){
> >>>         previous.reset();
> >>>     }
> >>> } else if (op.getOperation().equals("TJ")) {
> >>>     COSArray previous = (COSArray) tokens.get(j - 1);
> >>>     StringBuilder stringBuilder = new StringBuilder();
> >>>     for (int k = 0; k < previous.size(); k++) {
> >>>         Object arrElement = previous.getObject(k);
> >>>         if (arrElement instanceof COSString) {
> >>>             COSString cosString = (COSString) arrElement;
> >>>             stringBuilder.append(cosString.getString());
> >>>         }
> >>>     }
> >>>     String string = stringBuilder.toString();
> >>>     String encodedString = encode(string, font);
> >>>     if(encodedString.contains("To Be Approved")){
> >>>         previous.clear();
> >>>     }
> >>> }
> >>>
> >>> Note:
> >>> In the case of a COSArray, I first iterate through the whole array to
> >>> get the whole string before encoding and comparison, and this works.
> >>>
> >>> Best Regards,
> >>> a7mad
> >>>
> >>> On Mon, Mar 23, 2015 at 10:48 PM, Maruan Sahyoun <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> your text is encoded so within the show text operator Tj the string is
> >>>>
> >>>> 7R %H $SSURYHG
> >>>>
> >>>> You wrote that you encode your string to find it - what do you get?
> >>>>
> >>>> BR
> >>>> Maruan
> >>>>
> >>>>> On 23.03.2015 at 22:01, a7med shre3y <[email protected]> wrote:
> >>>>>
> >>>>> Hi Maruan,
> >>>>>
> >>>>> Here's a link from where you can download the PDF.
> >>>>>
> >>>>> https://drive.google.com/file/d/0B5Kxacm1mej-bm82NzNvUXFPSmMtUjc0ZFVjVVlrODZnRzdn/view?usp=sharing
> >>>>>
> >>>>> Kind Regards,
> >>>>> a7mad
> >>>>>
> >>>>> On Mon, Mar 23, 2015 at 8:57 PM, Maruan Sahyoun <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> you need to upload it to a public location as the mailing list
> >>>>>> doesn't support attachments.
> >>>>>>
> >>>>>> BR
> >>>>>> Maruan
> >>>>>>
> >>>>>>> On 23.03.2015 at 19:18, a7med shre3y <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Dear Maruan,
> >>>>>>>
> >>>>>>> Thank you very much for the information. Please find herewith
> >>>>>>> attached the PDF to reproduce the problem.
> >>>>>>> The text to remove is: "To Be Approved". The text has a multi-byte
> >>>>>>> encoding, so I first encode it in order to find it, then remove it.
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> a7mad
> >>>>>>>
> >>>>>>>> On Mon, Mar 23, 2015 at 4:13 PM, Maruan Sahyoun
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>> Dear a7mad,
> >>>>>>>>
> >>>>>>>> removing text from a PDF is not an easy task as
> >>>>>>>> - text which might visually appear as a single item might consist
> >>>>>>>>   of individual parts within the PDF itself, e.g. each character
> >>>>>>>>   or groups of characters are placed individually in different
> >>>>>>>>   COSStrings
> >>>>>>>> - text might be drawn using graphics commands
> >>>>>>>> - text can appear within different parts of the PDF (e.g. the text
> >>>>>>>>   might be content of a form field AND the annotation representing
> >>>>>>>>   the form field visually)
> >>>>>>>> - you need to look up the encoding information to get from the
> >>>>>>>>   characters in the PDF "string" to the ones you are looking for
> >>>>>>>> …
> >>>>>>>>
> >>>>>>>> If you can post a specific PDF to a public location and describe
> >>>>>>>> in detail which string should have been replaced which hasn't, I
> >>>>>>>> will be able to tell you why that might have happened.
> >>>>>>>>
> >>>>>>>> Maruan
> >>>>>>>>
> >>>>>>>>> On 23.03.2015 at 15:03, a7med shre3y <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> Currently I am facing a strange problem removing text from some
> >>>>>>>>> PDFs.
> >>>>>>>>> My program is able to find the text and "remove it" by calling
> >>>>>>>>> the COSString.reset() method.
> >>>>>>>>> The problem is, when I open the output PDF file, I still see the
> >>>>>>>>> text, but it is not selectable (I mean when I try to highlight it
> >>>>>>>>> with the mouse to copy it, it's not selectable!). When I print
> >>>>>>>>> the content (tokens) of the output file, I DO NOT find the text
> >>>>>>>>> at all!!
> >>>>>>>>>
> >>>>>>>>> I am currently stuck in the PDF specification 1.5 and really
> >>>>>>>>> running out of time.
> >>>>>>>>>
> >>>>>>>>> I'd very much appreciate any help or any idea on what's going on.
> >>>>>>>>>
> >>>>>>>>> Notes:
> >>>>>>>>> 1. I use PDFBox 1.7.1
> >>>>>>>>> 2. This problem does not occur with all PDFs, only some PDFs
> >>>>>>>>>    cause this problem.
> >>>>>>>>>
> >>>>>>>>> Thank you very much.
> >>>>>>>>> a7mad
> >>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>>>> For additional commands, e-mail: [email protected]
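A note on the point made in the thread that the string read from the Tj operator is already encoded, so the search text has to be encoded before comparing: the two strings quoted above, "7R %H $SSURYHG" and "To Be Approved", happen to differ by a constant offset of 29 per letter. The sketch below only illustrates the direction of the comparison under that observation. The class name ShiftedCodes and its methods are hypothetical, the offset is inferred from these two strings alone, and leaving spaces untouched is an assumption based on how both strings appear in the mails. A real implementation would have to take the mapping from the font's encoding information rather than from a fixed offset.

```java
// Hypothetical illustration only: the constant offset is inferred from the
// two strings quoted in this thread, not from the font's encoding tables.
final class ShiftedCodes {
    // 'T' (84) - '7' (55) = 29; the same difference holds for every letter
    // of "To Be Approved" versus "7R %H $SSURYHG".
    static final int OFFSET = 29;

    // Map readable text to the raw form seen in the content stream.
    static String toRaw(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            // Assumption: spaces appear unchanged in both quoted strings.
            sb.append(c == ' ' ? c : (char) (c - OFFSET));
        }
        return sb.toString();
    }

    // Map the raw form back to readable text (inverse of toRaw).
    static String toText(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            sb.append(c == ' ' ? c : (char) (c + OFFSET));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "7R %H $SSURYHG"; // string as read from the Tj operand
        System.out.println(toText(raw));
        // Encode the search text first, then compare against the raw string.
        System.out.println(raw.contains(toRaw("To Be Approved")));
    }
}
```

The comparison in main follows the advice in the thread: the search text is converted to the raw form and matched against what was read from the PDF, instead of running the already encoded string through the encoder a second time.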

