Thanks for sharing your experience with this, Peter! I will have to use the “heavy” comparison for the font name, then. I thought there might be another indication of this in my attached PDF file.

Best regards,
Hesham

--------------------------------------------------------------------------------------------------
Included Message:

I have processed over 100,000 PDFs (mainly scientific publications) and I am reasonably certain there is no universal property that is "Bold" that can be algorithmically detected. "Bold" is an instruction for the authoring software to create something that stands out visually. This can be done by:

* making the glyph linewidth thicker or otherwise adding pixels
* making the glyph "blacker" relative to the "normal text". Often normal text has a grey colour and bold is simply blacker
* overprinting the glyph (works on certain printers)

In terms of font names I have found "Foo.B", "FooBold", "FooBlack", "FooHeavy", "Foo.20B", "Foo+20" and any conceivable variant. Some of these systems set a bold weight that PDFBox can detect. Many do not.

In short it's a mess.

On Mon, Mar 18, 2019 at 9:23 PM Gilad Denneboom <[email protected]> wrote:

> I don't see why there *must* be such an option. Bold fonts are not a
> subset of existing fonts, despite what it might look like when you use
> Word (which creates fake bold fonts on its own).
> They exist on their own, with their own names. True, they are usually
> a variant of another existing font, but there's no mandatory naming
> scheme that says that if font X exists then the bold variant will be
> called X-Bold, or something like that, or that such a variant has to
> exist in the first place.
>
> On Mon, Mar 18, 2019 at 12:12 PM Hesham Gneady <[email protected]> wrote:
>
> > I have 100s of PDF files used!
> >
> > There must be some property used in my attached PDF file that causes
> > the bold font, not just the font type used! .. I see properties like
> > ForceBold() but it’s set to false too .. I mean; something like that?
> >
> > Best regards,
> > Hesham
> >
> > --------------------------------------------------------------------------------------------------
> > Included Message:
> >
> > Instead of a partial match for the name you could compile a list of
> > all the names of the bold variants of your fonts, and then compare
> > the font name to that list.
> >
> > On Mon, Mar 18, 2019 at 11:13 AM Hesham Gneady <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I am trying to extract the bold text for some PDF files, but some
> > > fail, like this one:
> > >
> > > https://www.dropbox.com/s/gh2zwdh3sl3isck/Bold%20Font%20Sample.pdf?dl=0
> > >
> > > I am overriding the processTextPosition(...) method to do this, and
> > > I have tried all these options, but none has worked for me:
> > >
> > > 1. if( text.getFont().getFontDescriptor().getFontName().toLowerCase().contains( "bold" ) ) {...} // returns false.
> > > 2. if( text.getFont().getName().toLowerCase().contains( "bold" ) ) {...} // returns false.
> > > 3. System.out.println( text.getFont().getFontDescriptor().getFontWeight() ); // returns 0.0.
> > > 4. System.out.println( getGraphicsState().getLineWidth() ); // returns 1.0.
> > > 5. System.out.println( getGraphicsState().getTextState().getRenderingMode() ); // returns FILL
> > >
> > > Note: The font name for the bold text in the PDF file is
> > > "frutigernextlt-heavycn". It has the word "heavy".
> > > I could detect it this way, but I think this is not the right
> > > procedure, as I have other PDF files with font names that have the
> > > "heavy" word while they're not bold.
> > >
> > > Best regards,
> > > Hesham

--
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
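
The suggestions in this thread (compare the font name against an explicit list of known bold variants, then fall back on the font weight and the ForceBold flag) could be combined into one small check. What follows is only a sketch in plain Java and is not part of PDFBox: the class `BoldHeuristic` and its methods are hypothetical names, and the inputs are assumed to come from the font descriptor calls quoted in the thread, such as `getFontName()` and `getFontWeight()`. The threshold of 700 is the FontWeight value the PDF specification associates with bold; many files leave the weight at 0, so the explicit name list does most of the work. As noted in the thread, no such check is universal.

```java
import java.util.Locale;
import java.util.Set;

/** Heuristic bold check. Hypothetical helper, not a PDFBox class. */
final class BoldHeuristic {

    /** Strips a subset tag such as "ABCDEF+" and lower-cases the name. */
    static String normalize(String fontName) {
        if (fontName == null) {
            return "";
        }
        String name = fontName;
        // Subset fonts are prefixed with six uppercase letters and a plus sign.
        if (name.matches("[A-Z]{6}\\+.*")) {
            name = name.substring(7);
        }
        return name.toLowerCase(Locale.ROOT);
    }

    /**
     * Returns true if the font looks bold.
     * knownBoldNames must hold lower-case names you have verified by hand.
     */
    static boolean isLikelyBold(String fontName, float fontWeight,
                                boolean forceBold, Set<String> knownBoldNames) {
        // 1. Exact match against the hand-compiled list of bold variants.
        if (knownBoldNames.contains(normalize(fontName))) {
            return true;
        }
        // 2. Font weight, when the file sets one (0 means "not set").
        if (fontWeight >= 700f) {
            return true;
        }
        // 3. ForceBold flag, when the file sets it.
        return forceBold;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("frutigernextlt-heavycn");
        // Listed name: treated as bold even though weight is 0 and ForceBold is false.
        System.out.println(isLikelyBold("frutigernextlt-heavycn", 0f, false, known));
        // Unlisted "heavy" name with no weight or flag: not treated as bold.
        System.out.println(isLikelyBold("SomeFont-HeavyCn", 0f, false, known));
    }
}
```

Matching whole names rather than substrings such as "heavy" is deliberate: it avoids flagging the other files whose font names contain "heavy" without being bold, at the cost of maintaining the list by hand.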

