> On 24.03.2015 at 10:43, a7med shre3y <[email protected]> wrote:
> 
> I mean, how do I find them in the PDF while iterating over the tokens? What is
> the operator?
> 
> On Tue, Mar 24, 2015 at 10:40 AM, Maruan Sahyoun <[email protected]>
> wrote:
> 
>> 
>>> On 24.03.2015 at 10:36, a7med shre3y <[email protected]> wrote:
>>> 
>>> What are the drawing commands? I'd then investigate how to identify the
>>> ones that draw the text.
>>> 
>> 
>> 738.7469 167.1278 m

MoveTo (operator m)

>> 733.8743 167.1278 l
>> 

LineTo (operator l)
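To spot such commands while iterating over the tokens, you check the operator names. Here is a minimal sketch, assuming the content stream is available as plain text and contains no string operands (a naive whitespace split would misfire on those). PathOperatorScan and pathOperators are hypothetical names, not PDFBox API. The two sets are the path construction and path painting operators from the PDF specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PathOperatorScan {
    // Path construction operators: moveto, lineto, curves, closepath, rectangle.
    static final Set<String> CONSTRUCTION =
            new HashSet<>(Arrays.asList("m", "l", "c", "v", "y", "h", "re"));
    // Path painting operators: stroke, fill, fill and stroke, end path.
    static final Set<String> PAINTING =
            new HashSet<>(Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"));

    // Returns, in order, every token that is a path operator.
    // Assumption: tokens are separated by whitespace and there are no strings.
    static List<String> pathOperators(String content) {
        List<String> found = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (CONSTRUCTION.contains(token) || PAINTING.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "738.7469 167.1278 m 733.8743 167.1278 l S";
        System.out.println(pathOperators(sample));
    }
}
```

With parsed tokens, the same membership test would be applied to the operator name (the value op.getOperation() returns in the code quoted further down) rather than to a raw text token.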


>> 
>> 
>>> On Tue, Mar 24, 2015 at 10:26 AM, Maruan Sahyoun <[email protected]>
>>> wrote:
>>> 
>>>> 
>>>>> On 24.03.2015 at 10:14, a7med shre3y <[email protected]> wrote:
>>>>> 
>>>>> That's true. Before removing it, I had already tried changing the text
>>>>> rendering mode to the other values listed in Table 5.3 of the PDF 1.5
>>>>> specification, but that didn't work either.
>>>>> So how do I remove the graphics content then?
>>>> 
>>>> The simple answer: remove the drawing commands.
>>>> 
>>>> The longer answer: as you obviously don't want to remove all drawing
>>>> commands, you'd need to find the ones that draw the text. Since you
>>>> want to remove only the vectors that make up a certain character/glyph,
>>>> you first need to find out which ones draw, e.g., the letter 'T'. I
>>>> don't think this is doable in a reasonable amount of time for
>>>> arbitrary text.
>>>> 
>>>> Maruan
>>>> 
>>>> 
>>>>> 
>>>>> Best Regards,
>>>>> 
>>>>> On Tue, Mar 24, 2015 at 10:06 AM, Maruan Sahyoun <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>>> On 24.03.2015 at 09:55, a7med shre3y <[email protected]> wrote:
>>>>>>> 
>>>>>>> You can download it from here:
>>>>>>> 
>>>>>>> https://drive.google.com/file/d/0B5Kxacm1mej-MEZubTNYVVJYTFE/view?usp=sharing
>>>>>>> 
>>>>>> 
>>>>>> Looking more closely, you correctly replaced the text, but that text was
>>>>>> only in there for searching within the PDF, as it used text rendering
>>>>>> mode 3 (invisible). The 'text' you are still seeing is drawn using
>>>>>> vector commands, so it's graphics content.
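To check for this kind of hidden search text, one can track the operand of the Tr operator while walking the content stream. Here is a rough sketch, assuming well-formed plain-text content with literal (parenthesised) strings only. It ignores hex strings, TJ arrays and q/Q state saving. InvisibleTextScan and invisibleStrings are hypothetical names, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

final class InvisibleTextScan {
    // Returns the literal strings shown with Tj while rendering mode 3 is set.
    static List<String> invisibleStrings(String content) {
        List<String> result = new ArrayList<>();
        int mode = 0;               // default text rendering mode is 0 (fill)
        String lastOperand = null;  // token seen just before the current one
        int i = 0;
        while (i < content.length()) {
            char ch = content.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
                continue;
            }
            int start = i;
            if (ch == '(') {
                // Literal string: honour nested parentheses and backslash escapes.
                int depth = 0;
                do {
                    char c = content.charAt(i);
                    if (c == '\\') {
                        i++;        // skip the escaped character
                    } else if (c == '(') {
                        depth++;
                    } else if (c == ')') {
                        depth--;
                    }
                    i++;
                } while (depth > 0 && i < content.length());
            } else {
                while (i < content.length()
                        && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
            }
            String token = content.substring(start, Math.min(i, content.length()));
            if (token.equals("Tr") && lastOperand != null) {
                mode = Integer.parseInt(lastOperand);
            } else if (token.equals("Tj") && mode == 3 && lastOperand != null) {
                // Strip the enclosing parentheses of the string operand.
                result.add(lastOperand.substring(1, lastOperand.length() - 1));
            }
            lastOperand = token;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(invisibleStrings("BT 3 Tr (7R %H $SSURYHG) Tj ET"));
    }
}
```

Any string this reports is text that a viewer does not paint, so removing it changes search and copy behaviour but not what is visible on the page.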
>>>>>> 
>>>>>> BR
>>>>>> Maruan
>>>>>> 
>>>>>> 
>>>>>>> Best Regards,
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Mar 24, 2015 at 9:48 AM, Maruan Sahyoun <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 24.03.2015 at 09:40, a7med shre3y <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> In fact, PDFBox calls the operation of transforming "7R %H $SSURYHG" to
>>>>>>>>> "To Be Approved" "encoding". Anyway, whether it's encoding or decoding,
>>>>>>>>> I thought it was easier to transform "7R %H $SSURYHG" to "To Be
>>>>>>>>> Approved" than the opposite (or at least I don't know how to do the
>>>>>>>>> opposite). I spent quite a long time trying to find the character
>>>>>>>>> codes for the glyphs in the currently used font, and found that it's
>>>>>>>>> not an easy task. By the way, if you know how to do that, I'd very
>>>>>>>>> much appreciate it, because I need it for replacing text with other
>>>>>>>>> text, and for that the new text must be encoded the same way as the
>>>>>>>>> original!
>>>>>>>>> 
>>>>>>>>> Back to the text removal: I am able to find the text and also remove it
>>>>>>>>> by calling reset. As I mentioned in my first email, when I print the
>>>>>>>>> output content I don't find the text anymore, but I still see it when
>>>>>>>>> I open the file. My first assumption was that there must be some other
>>>>>>>>> way to remove the text than the one I am using, and that's what you
>>>>>>>>> actually confirmed in your reply. So could you please tell me what is
>>>>>>>>> still missing?
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Could you upload the PDF with the reset text too?
>>>>>>>> 
>>>>>>>> BR
>>>>>>>> Maruan
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Thanks and regards,
>>>>>>>>> a7mad
>>>>>>>>> 
>>>>>>>>> On Tue, Mar 24, 2015 at 9:22 AM, Maruan Sahyoun <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On 24.03.2015 at 08:14, a7med shre3y <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> Here's how I do it:
>>>>>>>>>>> 
>>>>>>>>>>> 1. I use the following method to encode the text:
>>>>>>>>>>> 
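>>>>>>>>>>> // As discussed in this thread, despite its name this turns the raw
>>>>>>>>>>> // string read from the PDF into readable text.
>>>>>>>>>>> String encode(String text, PDFont font) throws Exception {
>>>>>>>>>>>     StringBuilder builder = new StringBuilder();
>>>>>>>>>>>     byte[] stringBytes = text.getBytes();
>>>>>>>>>>>     int i = 0;
>>>>>>>>>>>     while (i < stringBytes.length) {
>>>>>>>>>>>         // Try a one-byte code first, then fall back to two bytes.
>>>>>>>>>>>         int codeLength = 1;
>>>>>>>>>>>         String c = font.encode(stringBytes, i, codeLength);
>>>>>>>>>>>         if (c == null && i + 1 < stringBytes.length) {
>>>>>>>>>>>             codeLength = 2;
>>>>>>>>>>>             c = font.encode(stringBytes, i, codeLength);
>>>>>>>>>>>         }
>>>>>>>>>>>         // Skip codes the font cannot map instead of appending "null".
>>>>>>>>>>>         if (c != null) {
>>>>>>>>>>>             builder.append(c);
>>>>>>>>>>>         }
>>>>>>>>>>>         i += codeLength;
>>>>>>>>>>>     }
>>>>>>>>>>>     return builder.toString();
>>>>>>>>>>> }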
>>>>>>>>>>> 
>>>>>>>>>>> 2. Iterating through the tokens, I find the text, which is either a
>>>>>>>>>>> COSString ("Tj" operator) or a COSArray ("TJ" operator), then check
>>>>>>>>>>> whether it's the text I want to remove, as follows:
>>>>>>>>>>> 
>>>>>>>>>>> if (op.getOperation().equals("Tj")) {
>>>>>>>>>>>     COSString previous = (COSString) tokens.get(j - 1);
>>>>>>>>>>>     String string = previous.getString();
>>>>>>>>>>>     String encodedString = encode(string, font);
>>>>>>>>>> 
>>>>>>>>>> That string is already encoded, so you'd need to encode "To Be Approved"
>>>>>>>>>> and check whether the result matches the string you read from the PDF.
>>>>>>>>>> 
>>>>>>>>>>>     if (encodedString.contains("To Be Approved")) {
>>>>>>>>>>>         previous.reset();
>>>>>>>>>>>     }
>>>>>>>>>>> } else if (op.getOperation().equals("TJ")) {
>>>>>>>>>>>     COSArray previous = (COSArray) tokens.get(j - 1);
>>>>>>>>>>>     StringBuilder stringBuilder = new StringBuilder();
>>>>>>>>>>>     for (int k = 0; k < previous.size(); k++) {
>>>>>>>>>>>         Object arrElement = previous.getObject(k);
>>>>>>>>>>>         if (arrElement instanceof COSString) {
>>>>>>>>>>>             COSString cosString = (COSString) arrElement;
>>>>>>>>>>>             stringBuilder.append(cosString.getString());
>>>>>>>>>>>         }
>>>>>>>>>>>     }
>>>>>>>>>>>     String string = stringBuilder.toString();
>>>>>>>>>>>     String encodedString = encode(string, font);
>>>>>>>>>>>     if (encodedString.contains("To Be Approved")) {
>>>>>>>>>>>         previous.clear();
>>>>>>>>>>>     }
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> Note:
>>>>>>>>>>> In the COSArray case, I first iterate through the whole array to build
>>>>>>>>>>> the complete string before encoding and comparing, and this works.
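For the replacement case, the comparison has to go the other way round: encode the search text and look for it in the raw string. Here is a small sketch of that direction only. It learns a character table from the one sample pair in this thread ("To Be Approved" and "7R %H $SSURYHG"), which is an assumption made for illustration. A real program would take the mapping from the font's encoding information, and codes may be more than one byte long. EncodedSearch, learn and encode are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class EncodedSearch {
    // Builds a displayed-character -> raw-character table from one aligned
    // sample pair. Hypothetical helper: stands in for the font's own mapping.
    static Map<Character, Character> learn(String displayed, String raw) {
        if (displayed.length() != raw.length()) {
            throw new IllegalArgumentException("samples must align one to one");
        }
        Map<Character, Character> table = new HashMap<>();
        for (int i = 0; i < displayed.length(); i++) {
            table.put(displayed.charAt(i), raw.charAt(i));
        }
        return table;
    }

    // Encodes the search text so it can be compared with the raw PDF string.
    static String encode(String text, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toCharArray()) {
            Character code = table.get(ch);
            if (code == null) {
                throw new IllegalArgumentException("no code known for '" + ch + "'");
            }
            out.append(code.charValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "7R %H $SSURYHG";
        Map<Character, Character> table = learn("To Be Approved", raw);
        String needle = encode("Approved", table);
        System.out.println(raw.contains(needle));
    }
}
```

The same encoded form is what a replacement text would have to use, and it only works for characters the font actually has codes for.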
>>>>>>>>>>> 
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> a7mad
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Mar 23, 2015 at 10:48 PM, Maruan Sahyoun <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> your text is encoded, so within the show text operator Tj the string is
>>>>>>>>>>>> 
>>>>>>>>>>>> 7R %H $SSURYHG
>>>>>>>>>>>> 
>>>>>>>>>>>> You wrote that you encode your string to find it. What do you get?
>>>>>>>>>>>> BR
>>>>>>>>>>>> Maruan
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 23.03.2015 at 22:01, a7med shre3y <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Maruan,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here's a link from where you can download the PDF.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://drive.google.com/file/d/0B5Kxacm1mej-bm82NzNvUXFPSmMtUjc0ZFVjVVlrODZnRzdn/view?usp=sharing
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>> a7mad
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Mar 23, 2015 at 8:57 PM, Maruan Sahyoun <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> you need to upload it to a public location, as the mailing list
>>>>>>>>>>>>>> doesn't support attachments.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BR
>>>>>>>>>>>>>> Maruan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 23.03.2015 at 19:18, a7med shre3y <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Dear Maruan,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you very much for the information. Please find attached the PDF
>>>>>>>>>>>>>>> to reproduce the problem.
>>>>>>>>>>>>>>> The text to remove is: "To Be Approved". The text has a multi-byte
>>>>>>>>>>>>>>> encoding, so I first encode it in order to find it and then remove it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> a7mad
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mon, Mar 23, 2015 at 4:13 PM, Maruan Sahyoun <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Dear a7mad,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> removing text from a PDF is not an easy task, as
>>>>>>>>>>>>>>>> - text which visually appears as a single item might consist of
>>>>>>>>>>>>>>>>   individual parts within the PDF itself, e.g. each character or
>>>>>>>>>>>>>>>>   group of characters is placed individually in a different COSString
>>>>>>>>>>>>>>>> - text might be drawn using graphics commands
>>>>>>>>>>>>>>>> - text can appear in different parts of the PDF (e.g. the text
>>>>>>>>>>>>>>>>   might be the content of a form field AND of the annotation
>>>>>>>>>>>>>>>>   representing the form field visually)
>>>>>>>>>>>>>>>> - you need to look up the encoding information to get from the
>>>>>>>>>>>>>>>>   characters in the PDF "string" to the ones you are looking for
>>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If you can post a specific PDF to a public location and describe in
>>>>>>>>>>>>>>>> detail which string should have been replaced but wasn't, I will be
>>>>>>>>>>>>>>>> able to tell you why that might have happened.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Maruan
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 23.03.2015 at 15:03, a7med shre3y <[email protected]> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Currently I am facing a strange problem removing text from some PDFs.
>>>>>>>>>>>>>>>>> My program is able to find the text and "remove it" by calling the
>>>>>>>>>>>>>>>>> COSString.reset() method.
>>>>>>>>>>>>>>>>> The problem is that when I open the output PDF file, I still see the
>>>>>>>>>>>>>>>>> text, but it is not selectable (when I try to highlight it with the
>>>>>>>>>>>>>>>>> mouse to copy it, nothing is selected). When I print the content
>>>>>>>>>>>>>>>>> (tokens) of the output file, I DO NOT find the text at all!!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am currently stuck in the PDF 1.5 specification and really running
>>>>>>>>>>>>>>>>> out of time.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I'd very much appreciate any help or any idea about what's going on.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Notes:
>>>>>>>>>>>>>>>>> 1. I use PDFBox 1.7.1
>>>>>>>>>>>>>>>>> 2. This problem does not occur with all PDFs; only some PDFs cause
>>>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>>>>>>> a7mad
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>>>> For additional commands, e-mail: [email protected]