Any idea where I might find some examples of the TextPosition class? I've 
searched the docs and found nothing. Looking over the old threads, I've only 
found people having issues with TextPosition. This sounds like exactly what I 
need; I just need to figure out how to use it (i.e. get the x,y values and 
iterate through them).

Thank you.
________________________________________
From: Ian Holsman [[email protected]]
Sent: Friday, May 18, 2012 3:46 AM
To: [email protected]
Cc: [email protected]
Subject: Re: PDFBox and superscript format .NET

You might want to look at the processOperator function and watch for the Tj 
and Ts operators. Ts is the text rise operator, which is used for 
super/subscript and might give you the information you need. If you track the 
TextPosition class it should give you the x,y position of the lettering.
Sadly it's harder than it sounds :(
(I'm a newbie so I might be completely off base)
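
A rough sketch of the position-tracking idea, written in Java since PDFBox 
itself is a Java library. It deliberately does not call PDFBox: Glyph, 
SuperscriptMarker, markSuperscripts and the 0.2 threshold are all made up for 
illustration. The assumption is that you would fill each Glyph from one 
TextPosition (its character, x, y and font size), for example by collecting 
them in a PDFTextStripper subclass, and that y grows downward so a raised 
glyph has a smaller y. Treat it as a starting heuristic, not a tested recipe.

```java
import java.util.Arrays;
import java.util.List;

final class SuperscriptMarker {

    /** Hypothetical stand-in for the values read from one TextPosition. */
    static final class Glyph {
        final String text;
        final float x, y, fontSize;

        Glyph(String text, float x, float y, float fontSize) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
        }
    }

    /**
     * Wraps raised glyphs in brackets. Assumes the list holds one line of
     * text, already in reading order, with y growing downward.
     */
    static String markSuperscripts(List<Glyph> line) {
        if (line.isEmpty()) {
            return "";
        }
        // Take the first glyph with the largest font size as the baseline.
        Glyph base = line.get(0);
        for (Glyph g : line) {
            if (g.fontSize > base.fontSize) {
                base = g;
            }
        }
        // Assumed threshold: raised by more than 20% of the base font size.
        float minRise = 0.2f * base.fontSize;

        StringBuilder out = new StringBuilder();
        boolean inSuper = false;
        for (Glyph g : line) {
            boolean raised = (base.y - g.y) > minRise;
            if (raised && !inSuper) {
                out.append('[');
            }
            if (!raised && inSuper) {
                out.append(']');
            }
            out.append(g.text);
            inSuper = raised;
        }
        if (inSuper) {
            out.append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates for "10" followed by a raised, smaller "4".
        List<Glyph> line = Arrays.asList(
                new Glyph("1", 300f, 100f, 12f),
                new Glyph("0", 306f, 100f, 12f),
                new Glyph("4", 312f, 95f, 8f));
        System.out.println(markSuperscripts(line)); // prints 10[4]
    }
}
```

The check only looks at how far a glyph sits above the baseline, so it should 
also catch superscripts that are raised without a change in font size. 
Subscripts and lines with mixed font sizes would need extra handling.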

Sent from my iPhone

On 18/05/2012, at 3:37 PM, "Hawkins, Thomas A. - Student" <[email protected]> 
wrote:

> As an addendum: I didn't realize when I sent this out that the numbers are a 
> combination of regular and superscript digits. Since email won't support 
> superscript, I've written them with a caret instead. The numbers should be:
> 8^5       (INSTEAD OF 85)
> 9^6       (INSTEAD OF 96)
> 4^7       (INSTEAD OF 47)
> 10^4     (INSTEAD OF 104)
> ________________________________________
> From: Hawkins, Thomas A. - Student [[email protected]]
> Sent: Friday, May 18, 2012 1:21 AM
> To: [email protected]
> Subject: PDFBox and superscript format .NET
>
> I am using the .NET version of PDFBox and I have a PDF that contains data 
> such as this:
>
> Name                  Location
> Jim Daviees              85
> Herschel Walker          96
> Vince Gogh               47
> Andrew Lincoln        104
>
> I need both the name value and the location value. When I use the following 
> code:
>
>    Dim p As PDDocument = PDDocument.load(fi.FullName)
>    Dim r As PDFTextStripper = New PDFTextStripper
>
>    Dim stringVal As String = r.getText(p)
>    Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)
>
> I get the following in the .txt file (and also in HTML when I've converted 
> it to that):
> Jim Daviees
> Herschel Walker
> Vince Gogh
> Andrew Lincoln
> 85
> 96
> 47
> 104
>
> I'm okay with the layout, as I've got a workaround for that. My problem is 
> that it destroys any mention of the superscript exponents. Is there a way 
> that I can locate these superscript parts and encapsulate them in brackets or 
> something, so that the returned value is more like this:
> Jim Daviees
> Herschel Walker
> Vince Gogh
> Andrew Lincoln
> 8[5]
> 9[6]
> 4[7]
> 10[4]
>
> So, nutshell time. Can I use PDFBox (.NET version) to locate the instances of 
> superscript in a PDF file (like locating <sup></sup> in HTML) and swap them 
> out for an easily recognized symbol to be output to my destination file? I 
> picked brackets because I have no brackets in my source file whatsoever and 
> they would be very easy for me to code around. Thanks in advance.
