The problematic PDF is this one: http://www.parliament.bg/pub/StenD/iv260712.pdf
The first time I opened it in Adobe Reader, the entries in the first column showed as garbled glyphs like ȺɅȿɄɋȺɇȾɔɊɊɍɆȿɇɈȼɇȿɇɄɈ. Then I installed the Times New Roman font family on my Fedora machine and restarted Adobe Reader. This fixed the display, and I was able to see correct names like "АЛЕКСАНДЪР РУМЕНОВ НЕНКОВ". These are people's names in Cyrillic.

I'm using PDFBox along with tabula-extractor (https://github.com/jazzido/tabula-extractor) to extract the table data, but even with Times New Roman installed on my machine, the extracted names are still garbled:

    ȺɅȿɄɋȺɇȾɔɊȾɂɆɂɌɊɈȼɉȺɍɇɈȼ,740,ɄȻ,-,+,+,0,0,+,+,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-
    ȺɅȿɄɋȺɇȾɔɊɏɊɂɋɌɈȼɆȿɌɈȾɂȿȼ,917,Ⱦɉɋ,0,0,0,=,=,+,+,-,+,+,-,-,-,-,+,+,+,+,+,+,-,+,+,-,-,-
    ȺɅȿɄɋɂȼȺɋɂɅȿȼȺɅȿɄɋɂȿȼ,919,ɄȻ,-,+,+,-,-,+,+,0,0,+,-,-,-,-,+,+,-,+,0,+,-,+,+,-,-,-
    ȺɅɂɈɋɆȺɇɂȻɊȺɂɆɂɆȺɆɈȼ,336,Ⱦɉɋ,0,+,+,-,-,+,+,-,+,+,-,-,-,-,+,0,0,0,0,0,0,0,0,0,0,-
    ȺɇȾɈɇɉȿɌɊɈȼȺɇȾɈɇɈȼ,856,ȽȿɊȻ,+,=,-,+,+,=,-,+,=,0,0,0,0,0,0,-,+,-,-,-,0,-,-,+,+,0
    ȺɇɌɈɇɄɈɇɋɌȺɇɌɂɇɈȼɄɍɌȿȼ,343,ɄȻ,0,0,0,-,-,+,+,-,+,+,0,-,-,-,+,+,-,+,+,+,-,+,0,0,-,-
    ȺɇɌɈɇɂɃɃɈɊȾȺɇɈȼɃɈɊȾȺɇɈȼ,604,ȽȿɊȻ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    ȺɌȺɇȺɋɁȺɎɂɊɈȼɁȺɎɂɊɈȼ,744,ɄȻ,-,+,+,-,-,+,+,0,+,+,-,-,-,-,+,+,-,+,0,+,-,+,+,-,-,-
    ȺɌȺɇȺɋɂȼȺɇɈȼɌȺɒɄɈȼ,857,ȽȿɊȻ,+,=,=,+,+,-,=,0,0,0,0,0,0,0,0,-,+,-,-,0,0,0,0,0,0,0

Does this have something to do with the glyphlist_ext described at http://pdfbox.apache.org/cookbook/textextraction.html#external-glyph-list ?

I tried

    PDFont font = PDTrueTypeFont.loadTTF(document, "Times New Roman.ttf");

but it didn't change anything. Am I doing something wrong? How can I fix this?

Best Regards,
Anton
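
P.S. Comparing the garbled output with the correct names, each garbled letter in my sample seems to sit exactly 0x1D6 code points below its Cyrillic counterpart (for example Ⱥ is U+023A and А is U+0410, Ʌ is U+0245 and Л is U+041B). Below is a minimal sketch of a post-processing workaround based on that observation. The class and method names are my own, not part of PDFBox or tabula-extractor, and I am only assuming the offset holds for every letter in the document: I have seen it for the uppercase letters above, and the lowercase part of the range is a guess. It does not fix the extraction itself.

```java
public class CyrillicRemap {
    // Offset observed in the sample output: U+0410 (Cyrillic A) minus U+023A.
    static final int OFFSET = 0x1D6;
    // Garbled range assumed to correspond to U+0410..U+044F.
    // Only the uppercase part was actually observed; lowercase is an assumption.
    static final int GARBLED_FIRST = 0x023A;
    static final int GARBLED_LAST = 0x0279;

    // Shifts every character in the garbled range up by OFFSET and
    // leaves everything else (digits, commas, +, -, =) untouched.
    static String remap(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= GARBLED_FIRST && c <= GARBLED_LAST) {
                out.append((char) (c + OFFSET));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First five garbled letters of the sample, then two CSV fields,
        // written as escapes so the source file encoding does not matter.
        String garbled = "\u023A\u0245\u023F\u0244\u024B,740,\u0244\u023B";
        System.out.println(remap(garbled));
    }
}
```

I would still prefer a proper fix on the PDFBox side, since this only papers over whatever mapping is missing from the font in the PDF.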

