AW: Extracting vector graphics from PDF

Andrey Kuznetsov Mon, 07 May 2012 15:35:38 -0700

Peter,


The parser is like a platypus: it does not do much. It just parses a CFF font
and creates some CFF-specific objects.

As I already said, this is only half of the work: I still have to implement a
Type1 font writer to make it work.

 

Regarding the hack, I think that PdfBox already has it.

You can get the encoding and font metrics from it.

 

What I really don't understand is: what exactly is not working?

If a Font works on a "normal" Graphics, it should also work on your "hacked"
Graphics.

So what is your problem?

 

Andrey

 

 

 

 

From: [email protected]
[mailto:[email protected]] On behalf of Peter Murray-Rust
Sent: Monday, 7 May 2012 15:24
To: Andrey Kuznetsov
Cc: [email protected]
Subject: Re: Extracting vector graphics from PDF

 

 

On Mon, May 7, 2012 at 1:31 PM, Andrey Kuznetsov <[email protected]> wrote:

Peter,

 

The COS output is horribly formatted, so I read only the first line ;-)


Sorry - that is what COSDictionary.toString() gave.
 

It uses a FontFile3 stream.

A FontFile3 stream contains the font in either Compact Font Format (CFF) or
OpenType Format (OTF),

which are not supported by Java.
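To make the CFF/OTF distinction concrete, here is a minimal sketch that guesses the kind of font program from the first bytes of an already-decoded stream. It is only a heuristic: the class name `FontProgramSniffer` and the `Kind` enum are hypothetical, it does not use PdfBox, and it assumes the caller has already obtained the raw bytes. It relies on the OpenType tag "OTTO" for CFF-flavoured OpenType, on the version 0x00010000 for TrueType outlines, and on a bare CFF header starting with major version 1, minor version 0.

```java
public class FontProgramSniffer {
    /** Rough classification of a font program by its leading bytes. */
    public enum Kind { OPENTYPE_CFF, TRUETYPE, BARE_CFF, UNKNOWN }

    /** Heuristic only: inspects the first four bytes and nothing else. */
    public static Kind sniff(byte[] data) {
        if (data == null || data.length < 4) {
            return Kind.UNKNOWN;
        }
        // OpenType wrapper around CFF outlines starts with the tag "OTTO".
        if (data[0] == 'O' && data[1] == 'T' && data[2] == 'T' && data[3] == 'O') {
            return Kind.OPENTYPE_CFF;
        }
        // sfnt version 0x00010000 marks TrueType outlines.
        if (data[0] == 0 && data[1] == 1 && data[2] == 0 && data[3] == 0) {
            return Kind.TRUETYPE;
        }
        // Bare CFF header: major = 1, minor = 0, then hdrSize, then offSize in 1..4.
        if (data[0] == 1 && data[1] == 0 && data[3] >= 1 && data[3] <= 4) {
            return Kind.BARE_CFF;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {'O', 'T', 'T', 'O'}));
        System.out.println(sniff(new byte[] {1, 0, 4, 1}));
    }
}
```

A check like this only tells you which parser you would need; it says nothing about whether the font data that follows is valid.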

The font name is "FKAJPF+AdvOT3b30f6db.B", which means that it is a subset of
the font named "AdvOT3b30f6db.B".
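As an illustration of the naming convention, here is a small sketch that splits a subset tag (six uppercase letters followed by "+") from the base font name. The class and method names are hypothetical and not part of PdfBox; the code works on the name string only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubsetFontName {
    // Six uppercase letters, then '+', then the base font name.
    private static final Pattern SUBSET = Pattern.compile("([A-Z]{6})\\+(.+)");

    /** True if the name carries a subset tag such as "FKAJPF+". */
    public static boolean isSubset(String fontName) {
        return SUBSET.matcher(fontName).matches();
    }

    /** Returns the name without the subset tag, or the input unchanged if there is no tag. */
    public static String baseName(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }

    public static void main(String[] args) {
        String name = "FKAJPF+AdvOT3b30f6db.B";
        System.out.println(isSubset(name) + " " + baseName(name)); // prints "true AdvOT3b30f6db.B"
    }
}
```

The base name is what one would use when looking for a surrogate font by name.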


I am ignorant about fonts so please correct any errors.
 

I don't know exactly how PdfBox handles CFF/OpenType fonts; probably it just
searches for a surrogate font (by name) or some kind of default font (since I
have never seen such a horrible font name among system fonts).


I have no idea where the font came from. It's probably created by the
publisher or bought from a supplier. 

 

I don't know if this is really useful for you. 


It's very useful! First it explains why I had problems and gives me
confidence in the process.
 

I also have no idea why the font name/style are not set.

It may nevertheless be a valid font.

 

BTW, the only way to make Java understand CFF/OTF fonts is to convert them to
Type1 fonts.

I doubt that there is any free Java program which could do it.


Thanks for the information. 

 

(I managed to write a parser for CFF fonts, but I still have to dig into the
Type1 font format; however, my to-do list is really long and the Type1 format
is not in first place ;-))


What does the parser do?
 

 

Best Regards


I shall probably create a hack of some kind. I can find a sans-serif and a
serif font which are "fairly close" and substitute them.  How would I get a
system COSDictionary I could substitute?

I am mainly interested in:
* the identity of the characters
* the font metrics of the characters.

In this way I can guess the words and the spaces between them.
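A minimal sketch of that idea, assuming the characters on one text line are already available in reading order with a left edge and an advance width in the same units. The `Glyph` class, the `joinLine` method and the `gapFactor` threshold are all hypothetical; a real extractor would take these values from the PDF text state and the (possibly substituted) font metrics.

```java
import java.util.ArrayList;
import java.util.List;

public class WordGuesser {
    /** One positioned character: its text, left edge and advance width. */
    public static final class Glyph {
        final String text;
        final double x;
        final double width;

        public Glyph(String text, double x, double width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins glyphs of one line, inserting a space wherever the horizontal gap
     * between neighbours exceeds gapFactor * fontSize (an assumed heuristic).
     */
    public static String joinLine(List<Glyph> glyphs, double fontSize, double gapFactor) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                double gap = g.x - (prev.x + prev.width);
                if (gap > gapFactor * fontSize) {
                    sb.append(' ');
                }
            }
            sb.append(g.text);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph("P", 0, 6));
        line.add(new Glyph("D", 6, 7));
        line.add(new Glyph("F", 13, 6));
        line.add(new Glyph("b", 22, 5)); // 3 units after the end of "F"
        line.add(new Glyph("o", 27, 5));
        line.add(new Glyph("x", 32, 5));
        System.out.println(joinLine(line, 10.0, 0.2)); // prints "PDF box"
    }
}
```

The threshold matters here: if the substituted font's widths differ much from the original's, the computed gaps shift and the factor may need tuning per document.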

 

Andrey

 

 



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
