Words/characters order is not preserved during text extraction

Trasca Virgil Fri, 11 Feb 2011 05:34:35 -0800

Hi,

 
Has anybody seen this issue before? As the attached screenshot shows, the
original text in the document is <0>652.5</0>, while the extracted text is
652.5<0> </0>. I am using PDFBox 1.4.0.

I get this behavior with both the ExtractText application and the
PDFTextStripper class.

What could be the cause of this? Is there a solution or workaround?
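
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: a text stripper that emits strings in the order they are stored in
the page's content stream will reproduce that order even when it differs from
the visual layout. The sketch below is plain Java with no PDFBox dependency.
The Fragment class, its coordinates, and the sample() data are hypothetical and
only illustrate how sorting by position could restore the reading order of the
first table row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionSortSketch {

    /** Hypothetical text fragment: a string plus the page position where it is drawn. */
    static final class Fragment {
        final String text;
        final double x; // distance from the left edge of the page (made-up units)
        final double y; // distance from the top edge of the page (made-up units)

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Joins fragments in the order given, mimicking extraction in stream order. */
    static String join(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(f.text);
        }
        return sb.toString();
    }

    /** Returns a copy sorted top-to-bottom, then left-to-right. */
    static List<Fragment> sortByPosition(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return sorted;
    }

    /** Assumed data: the variance value is stored first but drawn last on the row. */
    static List<Fragment> sample() {
        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment("652.5", 500, 100));
        fragments.add(new Fragment("0704", 50, 100));
        fragments.add(new Fragment("William G. Voit", 100, 100));
        fragments.add(new Fragment("709.3", 300, 100));
        fragments.add(new Fragment("56.8", 400, 100));
        return fragments;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = sample();
        System.out.println("stream order:   " + join(fragments));
        System.out.println("position order: " + join(sortByPosition(fragments)));
    }
}
```

Running main prints the fragments first in the made-up stream order, with 652.5
leading the row, and then in position order, with 652.5 at the end. In PDFBox
itself, calling setSortByPosition(true) on the PDFTextStripper before getText
is the corresponding switch to try. Whether it resolves this particular
document is untested here.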

Thanks,
Virgil

TK# First Name Last Name TY FYTD Hours LY FYTD Hours Variance
FYTD Hours Comparison-Litigation
Active Timekeepers Only Sorted by Variance
652.5<0> </0>
<1>134567</1>
0704 William G. Voit 709.3 56.8 652.5
0714 Jeffrey L. Sklar 573.7 0.0 573.7
0701 Todd D. Erb 608.2 54.3 553.9
0720 Jason M. Porter 539.6 0.0 539.6
0708 Marshall Ray 595.2 68.7 526.5
0707 Jamie L. Zimmerman 596.2 82.8 513.4
0718 Georgia L. Hamann 498.5 0.0 498.5
0610 Daniel F. Polsenberg 1,301.3 949.1 352.2
0574 Lisa W. Lackland 484.1 261.3 222.8
0602 Joice B. Bass 491.2 274.8 216.4
0703 Alexandra G. Gormley 260.6 46.6 214.0
0618 David C. McElhinney 649.9 443.5 206.4
0527 Ross L. Crown 510.5 304.9 205.6
0327 Ann-Martha Andrews 598.1 393.1 205.0
7602 Donna Simpson 435.1 247.6 187.5
0591 Milton A. Wagner 549.2 371.7 177.5
0012 Joseph E. McGarry 376.8 211.9 164.9
7531 Noel E. Reddy 611.7 462.4 149.3
0651 Emily G. Clark 278.4 145.6 132.8
0396 Frances J. Haynes 603.2 471.0 132.2
0690 Sarah E.J. Selzer 584.3 460.5 123.8
0459 Candida M. Ruesga 456.6 333.2 123.4
0362 Robert G. Schaffer 601.5 479.9 121.6
0125 Dale A. Danneman 281.0 172.2 108.8
0727 Jennifer K. Hostetler 95.2 0.0 95.2
Thursday, January 20, 2011 Page 1 of 5
