I don't know of any examples.

Look at overriding endPage() in a PDFTextStripper subclass and doing something like:

for (List<TextPosition> aCharactersByArticle : charactersByArticle) {
    for (TextPosition t : aCharactersByArticle) {
        // inspect each character here, e.g. t.getCharacter(), t.getX(), t.getY()
    }
}
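
To make the idea concrete, here is a rough, self-contained sketch of the bracketing step on its own. It does not call PDFBox: Glyph is a hypothetical stand-in for the values you would read from each TextPosition (text, y, font size), and the rule "higher on the page and smaller than the first character of the line" is only my assumption about what counts as superscript. It also assumes y grows downward from the top of the page; flip the comparison if your coordinates run the other way.

```java
import java.util.Arrays;
import java.util.List;

class SuperscriptMarker {

    /** Hypothetical stand-in for the values read from one TextPosition. */
    static final class Glyph {
        final String text;
        final float y;        // baseline; assumed to grow downward from the page top
        final float fontSize;

        Glyph(String text, float y, float fontSize) {
            this.text = text;
            this.y = y;
            this.fontSize = fontSize;
        }
    }

    /** Wraps each run of raised, smaller glyphs in square brackets. */
    static String mark(List<Glyph> line) {
        if (line.isEmpty()) {
            return "";
        }
        // Assumption: the first glyph on the line is ordinary (non-superscript) text.
        Glyph base = line.get(0);
        StringBuilder out = new StringBuilder();
        boolean inSuper = false;
        for (Glyph g : line) {
            boolean raised = g.y < base.y && g.fontSize < base.fontSize;
            if (raised && !inSuper) {
                out.append('[');   // a superscript run starts
            }
            if (!raised && inSuper) {
                out.append(']');   // a superscript run ends
            }
            out.append(g.text);
            inSuper = raised;
        }
        if (inSuper) {
            out.append(']');       // close a run that reaches the end of the line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = Arrays.asList(
                new Glyph("1", 100f, 12f),
                new Glyph("0", 100f, 12f),
                new Glyph("4", 96f, 8f));
        System.out.println(mark(line)); // prints 10[4]
    }
}
```

In a real stripper you would build the Glyph list inside the inner loop above, one line of text at a time. If the PDF raises the digits without shrinking them, compare on y alone.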

On May 19, 2012, at 3:54 AM, Hawkins, Thomas A. - Student wrote:

> Any idea as to where I might go for some examples of the textposition class - 
> I've searched the docs and found nothing. Looking over the old threads, I've 
> only found people with issues in regards to textposition. This sounds perfect 
> as to what I need, I just need to figure out how to use it (ie get the x,y 
> and iterate through them)
> 
> Thank you.
> ________________________________________
> From: Ian Holsman [[email protected]]
> Sent: Friday, May 18, 2012 3:46 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: PDFBox and superscript format .NET
> 
> You might want to look at the processOperator function and watch for the
> Tj and Ts operators. Ts is the text-rise (super/subscript) operator, which
> might give you the information you need. If you track the TextPosition
> class it should give you the x,y position of the lettering.
> Sadly it's harder than it sounds :(
> (I'm a newbie so I might be completely off base)
> 
> Sent from my iPhone
> 
> On 18/05/2012, at 3:37 PM, "Hawkins, Thomas A. - Student" 
> <[email protected]> wrote:
> 
>> As an addendum, I didn't realize when I sent this out - the numbers are a 
>> combination of regular and superscript, since email won't support it, 
>> mathematical operators it is. The numbers should be
>> 8^5       (INSTEAD OF 85)
>> 9^6       (INSTEAD OF 96)
>> 4^7       (INSTEAD OF 47)
>> 10^4     (INSTEAD OF 104)
>> ________________________________________
>> From: Hawkins, Thomas A. - Student [[email protected]]
>> Sent: Friday, May 18, 2012 1:21 AM
>> To: [email protected]
>> Subject: PDFBox and superscript format .NET
>> 
>> I am using the .NET version of PDFBox and I have a pdf that contains data 
>> such as this:
>> 
>> Name                  Location
>> Jim Daviees              85
>> Herschel Walker          96
>> Vince Gogh               47
>> Andrew Lincoln        104
>> 
>> I need both the name value and the location value. When I use the following 
>> code:
>> 
>>   Dim p As PDDocument = PDDocument.load(fi.FullName)
>>   Dim r As PDFTextStripper = New PDFTextStripper
>> 
>>   Dim stringVal As String = r.getText(p)
>>   Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)
>> 
>> I get the following in the .txt file (also in html when I've converted it to 
>> that)
>> Jim Daviees
>> Herschel Walker
>> Vince Gogh
>> Andrew Lincoln
>> 85
>> 96
>> 47
>> 104
>> 
>> I'm okay with the layout, as I've got a work around for that, my problem is 
>> that it destroys any mention of the superscript exponents. Is there a way 
>> that I can locate these superscript parts and encapsulate them in brackets 
>> or something so as the returned value is more like this:
>> Jim Daviees
>> Herschel Walker
>> Vince Gogh
>> Andrew Lincoln
>> 8[5]
>> 9[6]
>> 4[7]
>> 10[4]
>> 
>> So, nutshell time. Can I use pdfbox (.NET Version) to locate the instances 
>> of superscript in a pdf file (like locating <sup></sup> in html) and change 
>> it out for an easily recognized symbol to be output to my destination file. 
>> I picked brackets because I have no brackets in my source file whatsoever 
>> and they would be very easy for me to code around. Thanks in advance.
