Re: [iText-questions] extracting text from pdfs with japanese data

2009-01-16 Thread Kevin Day
ag, 15 January 2009 18:24 > To: IText Questions > Subject: Re: [iText-questions] extracting text from pdfs with japanese data > > OK - that means that either my code is bad, or the strategy of using UniJIS-UTF16-H when no ToUnicode map is provided is flawed. > > At this stage, one

Re: [iText-questions] extracting text from pdfs with japanese data

2009-01-15 Thread Hoppe, Michael
009 18:24 > To: IText Questions > Subject: Re: [iText-questions] extracting text from pdfs with japanese data > > OK - that means that either my code is bad, or the strategy of using UniJIS-UTF16-H when no ToUnicode map is provided is flawed. > > At this stage, one of

Re: [iText-questions] extracting text from pdfs with japanese data

2009-01-15 Thread Kevin Day
+0100 Subject: Re: [iText-questions] extracting text from pdfs with japanese data Hi Kevin, Also sorry for the delay. I was on vacation until today. The txt file you attached to your last mail does not show any Japanese characters but only gibberish (I am using a Unicode editor, so it should show

Re: [iText-questions] extracting text from pdfs with japanese data

2009-01-12 Thread Hoppe, Michael
Germany www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/> From: Kevin Day [mailto:ke...@trumpetinc.com] Sent: Mon 05.01.2009 17:46 To: IText Questions Subject: Re: [iText-questions] extracting text from pdfs with japanese data Sorry for the delay. Bec

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-18 Thread Hoppe, Michael
1 76344 Eggenstein-Leopoldshafen, Germany www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/> From: Kevin Day [mailto:ke...@trumpetinc.com] Sent: Friday, 19 December 2008 01:43 To: IText Questions Subject: Re: [iText-questions] extracting text from pdfs with japanese data Michael-

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-18 Thread Kevin Day
questions about iText here" > > > Cc: > Date: Wed, 17 Dec 2008 17:12:58 +0100 > Subject: Re: [iText-questions] extracting text from > pdfs with japanese data > > Hi all, > > Attached see the PDFs I had the problems with (I send > th

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-18 Thread Kevin Day
stions about iText here Cc:  Date: Thu, 18 Dec 2008 11:22:47 +0000 Subject: Re: [iText-questions] extracting text from pdfs with japanese data    >  > So, I've learned a lot - and the answer to the OP is that the > tic_dogu2.pdf file doesn't have computer readable text in it at a

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-18 Thread Paulo Soares
> -Original Message- > From: Kevin Day [mailto:ke...@trumpetinc.com] > Sent: Wednesday, December 17, 2008 11:09 PM > To: IText Questions > Subject: Re: [iText-questions] extracting text from pdfs with > japanese data > > Ahhh mea-culpa... I do think I re

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Kevin Day
ur questions about iText here Cc:  Date: Wed, 17 Dec 2008 15:09:16 -0500 Subject: Re: [iText-questions] extracting text from pdfs with japanese data   If there is no ToUnicode table for an Identity-H encoded font, then you can't get the text.  cmaps aren't relevant in that case :).
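The point in the snippet above can be illustrated with a small, language-neutral sketch in Python. It is not iText code and does not use any iText API: the function names and the sample map entries are invented for illustration. It assumes, as stated in the thread, that an Identity-H string is a sequence of 2-byte codes that are used directly as CIDs, and that a ToUnicode table is what turns those CIDs into characters.

```python
from typing import Dict, List, Optional


def identity_h_cids(raw: bytes) -> List[int]:
    """Split a string shown with an Identity-H font into 2-byte CIDs."""
    if len(raw) % 2 != 0:
        raise ValueError("Identity-H strings use two bytes per character code")
    # Each big-endian 16-bit code is taken as the CID unchanged.
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def extract_text(raw: bytes, to_unicode: Optional[Dict[int, str]]) -> Optional[str]:
    """Return the decoded text, or None when no ToUnicode map is available."""
    if to_unicode is None:
        # Following the statement above: without a ToUnicode table the codes
        # alone do not say which characters they stand for.
        return None
    # CIDs missing from the map become U+FFFD (replacement character).
    return "".join(to_unicode.get(cid, "\ufffd") for cid in identity_h_cids(raw))


# Hypothetical ToUnicode entries, invented for this example only.
sample_map = {0x0001: "\u65e5", 0x0002: "\u672c"}
print(extract_text(b"\x00\x01\x00\x02", sample_map))  # decoded through the map
print(extract_text(b"\x00\x01\x00\x02", None))        # no map, so no text
```

Returning None rather than guessing keeps the "no ToUnicode table" case visible to the caller, which can then decide whether external information (such as a separate cmap file) is available.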

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Kevin Day
l Message ---    From: Paulo Soares To: Post all your questions about iText here Cc:  Date: Wed, 17 Dec 2008 19:54:02 +0000 Subject: Re: [iText-questions] extracting text from pdfs with japanese data    > -Original Message-> From: Kevin Day [mailto:ke...@trumpetin

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Leonard Rosenthol
(depends on step 7 above) What do you all think? - K --- Original Message --- From: "Hoppe, Michael" To: "Post all your questions about iText here" > Cc: Date: Wed, 17 Dec 2008 17:12:58 +0100 Subject: Re: [iText-questions] extracting text from pdfs

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Paulo Soares
> -Original Message- > From: Kevin Day [mailto:ke...@trumpetinc.com] > Sent: Wednesday, December 17, 2008 7:40 PM > To: IText Questions > Subject: Re: [iText-questions] extracting text from pdfs with > japanese data > > So far so good - but how do we figure out

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Kevin Day
: Post all your questions about iText here Cc:  Date: Wed, 17 Dec 2008 19:20:18 + Subject: Re: [iText-questions] extracting text from pdfs with japanese data    Some quick pointers: - Identity-H means that the codepoints in the content match the CID characters. To know what is what a look at

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Paulo Soares
: Kevin Day [mailto:ke...@trumpetinc.com] > Sent: Wednesday, December 17, 2008 6:35 PM > To: IText Questions > Subject: Re: [iText-questions] extracting text from pdfs with > japanese data > > OK - we know that content1.pdf is choking b/c of the embedded > images. To fix that,

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Kevin Day
  - K     --- Original Message ---    From: "Hoppe, Michael" To: "Post all your questions about iText here" Cc:  Date: Wed, 17 Dec 2008 17:12:58 +0100 Subject: Re: [iText-questions] extracting text from pdfs with japanes

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Paulo Soares
Fax +49 7247 808-133 > michael.ho...@fiz-karlsruhe.de > > > FIZ Karlsruhe > Hermann-von-Helmholtz-Platz 1 > 76344 Eggenstein-Leopoldshafen, Germany > > www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/> > > From: Kevin Day [mailto:ke...@trumpetinc.com] > Gese

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-17 Thread Kevin Day
008 09:55:36 - Subject: Re: [iText-questions] extracting text from pdfs with japanese data    There's code in PdfEncodings to parse the cmaps and convert to/from Unicode. The font contains the cmap name. Paulo - Original Message - From: "1T3XT info" To: "Post all your ques

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-16 Thread Paulo Soares
ect: Re: [iText-questions] extracting text from pdfs with japanese data Hoppe, Michael wrote: > The CMap-files are included in the iTextAsianCmaps.jar. So couldn’t they > be read from that jar in case there is no font information in the pdf? I'm just thinking out loud here, I didn't div

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-16 Thread 1T3XT info
Hoppe, Michael wrote: > The CMap-files are included in the iTextAsianCmaps.jar. So couldn’t they > be read from that jar in case there is no font information in the pdf? I'm just thinking out loud here, I didn't dive into the problem yet, but: do you think it's possible for iText to find which C

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-16 Thread Hoppe, Michael
our questions about iText here Subject: Re: [iText-questions] extracting text from pdfs with japanese data No font or cmap - you need external info (either the font itself, a separate cmap file or both). Leonard On Dec 15, 2008, at 11:50 AM, Kevin Day wrote: I ran

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-15 Thread Leonard Rosenthol
-- From: "Hoppe, Michael" To: Cc: Date: Mon, 15 Dec 2008 13:45:47 +0100 Subject: [iText-questions] extracting text from pdfs with japanese data Dear all, My name is Michael Hoppe, I work for the eSciDoc project, which is funded by the German Ministry of Education and Research

Re: [iText-questions] extracting text from pdfs with japanese data

2008-12-15 Thread Kevin Day
et input from other folks so we can figure out how to proceed.   - K     --- Original Message ---    From: "Hoppe, Michael" To:  Cc:  Date: Mon, 15 Dec 2008 13:45:47 +0100 Subject: [iText-questions] extracting text from pdfs with japanese data    Dear all, My name is Michael Ho