ag, 15 January 2009 18:24
> To: IText Questions
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> OK - that means that either my code is bad, or the strategy of using
> UniJIS-UTF16-H when no ToUnicode map is provided is flawed.
>
> At this stage, one of
+0100
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
Hi Kevin,
Also sorry for the delay. I was on vacation until today.
The txt file you attached to your last mail does not show any Japanese
characters, only gibberish (I am using a Unicode editor, so it should show
Germany
www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>
From: Kevin Day [mailto:ke...@trumpetinc.com]
Sent: Mon 05.01.2009 17:46
To: IText Questions
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
Sorry for the delay. Bec
1
76344 Eggenstein-Leopoldshafen, Germany
www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>
From: Kevin Day [mailto:ke...@trumpetinc.com]
Sent: Friday, 19 December 2008 01:43
To: IText Questions
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
Michael-
questions about iText here"
> Cc:
> Date: Wed, 17 Dec 2008 17:12:58 +0100
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> Hi all,
>
> Attached see the Pdfs i had the problems with (I send th
stions about iText here
Cc:
Date: Thu, 18 Dec 2008 11:22:47 +0000
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
> So, I've learned a lot - and the answer to the OP is that the
> tic_dogu2.pdf file doesn't have computer readable text in it at a
> -Original Message-
> From: Kevin Day [mailto:ke...@trumpetinc.com]
> Sent: Wednesday, December 17, 2008 11:09 PM
> To: IText Questions
> Subject: Re: [iText-questions] extracting text from pdfs with
> japanese data
>
> Ahhh mea-culpa... I do think I re
ur questions about iText here
Cc:
Date: Wed, 17 Dec 2008 15:09:16 -0500
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
If there is no ToUnicode table for an Identity-H encoded font, then you can't get the text. cmaps aren't relevant in that case :).
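To illustrate the point above: with Identity-H, each two-byte code in the content is the CID itself, so the bytes alone say nothing about which Unicode character is meant. The following is a minimal sketch in plain Java that does not call iText at all; the class name, method names, and the sample CID-to-Unicode entries are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class IdentityHSketch {

    /** Splits an Identity-H string into CIDs: each big-endian byte pair is one CID. */
    static int[] toCids(byte[] encoded) {
        if (encoded.length % 2 != 0) {
            throw new IllegalArgumentException("Identity-H strings use two bytes per code");
        }
        int[] cids = new int[encoded.length / 2];
        for (int i = 0; i < cids.length; i++) {
            cids[i] = ((encoded[2 * i] & 0xFF) << 8) | (encoded[2 * i + 1] & 0xFF);
        }
        return cids;
    }

    /** Returns the text only if a ToUnicode-style map exists and covers every CID. */
    static Optional<String> decode(byte[] encoded, Map<Integer, String> toUnicode) {
        if (toUnicode == null) {
            return Optional.empty(); // no ToUnicode table: the CIDs cannot be turned into text
        }
        StringBuilder text = new StringBuilder();
        for (int cid : toCids(encoded)) {
            String mapped = toUnicode.get(cid);
            if (mapped == null) {
                return Optional.empty(); // unmapped CID: give up rather than guess
            }
            text.append(mapped);
        }
        return Optional.of(text.toString());
    }

    public static void main(String[] args) {
        // Made-up sample data: two CIDs and an invented mapping for them.
        byte[] content = {0x03, (byte) 0xA4, 0x03, (byte) 0xA6};
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(0x03A4, "\u65E5");
        toUnicode.put(0x03A6, "\u672C");

        System.out.println(decode(content, toUnicode).orElse("<no text>"));
        System.out.println(decode(content, null).orElse("<no text>"));
    }
}
```

In the second call the same bytes yield no text, because nothing connects the CIDs to Unicode characters.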
l Message ---
From: Paulo Soares
To: Post all your questions about iText here
Cc:
Date: Wed, 17 Dec 2008 19:54:02 +0000
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
> -Original Message-
> From: Kevin Day [mailto:ke...@trumpetin
(depends on step 7 above)
What do you all think?
- K
--- Original Message ---
From: "Hoppe, Michael"
To: "Post all your questions about iText here"
Cc:
Date: Wed, 17 Dec 2008 17:12:58 +0100
Subject: Re: [iText-questions] extracting text from pdfs
> -Original Message-
> From: Kevin Day [mailto:ke...@trumpetinc.com]
> Sent: Wednesday, December 17, 2008 7:40 PM
> To: IText Questions
> Subject: Re: [iText-questions] extracting text from pdfs with
> japanese data
>
> So far so good - but how do we figure out
: Post all your questions about iText here
Cc:
Date: Wed, 17 Dec 2008 19:20:18 +
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
Some quick pointers:
- Identity-H means that the codepoints in the content match the CID characters. To know what is what a look at
: Kevin Day [mailto:ke...@trumpetinc.com]
> Sent: Wednesday, December 17, 2008 6:35 PM
> To: IText Questions
> Subject: Re: [iText-questions] extracting text from pdfs with
> japanese data
>
> OK - we know that content1.pdf is choking b/c of the embedded
> images. To fix that,
- K
--- Original Message ---
From: "Hoppe, Michael"
To: "Post all your questions about iText here"
Cc:
Date: Wed, 17 Dec 2008 17:12:58 +0100
Subject: Re: [iText-questions] extracting text from pdfs with japanes
Fax +49 7247 808-133
> michael.ho...@fiz-karlsruhe.de
>
>
> FIZ Karlsruhe
> Hermann-von-Helmholtz-Platz 1
> 76344 Eggenstein-Leopoldshafen, Germany
>
> www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>
>
> From: Kevin Day [mailto:ke...@trumpetinc.com]
> Gese
008 09:55:36 -
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
There's code in PdfEncodings to parse and convert to/from Unicode the cmaps. The font contains the cmap name.
Paulo
- Original Message -
From: "1T3XT info"
To: "Post all your ques
ect: Re: [iText-questions] extracting text from pdfs with japanese data
Hoppe, Michael wrote:
> The CMap-files are included in the iTextAsianCmaps.jar. So couldn’t they
> be read from that jar in case there is no font information in the pdf?
I'm just thinking out loud here, I didn't dive into the problem yet,
but: do you think it's possible for iText to find which C
our questions about iText here
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
No font or cmap - you need external info (either the font itself, a separate
cmap file or both).
Leonard
On Dec 15, 2008, at 11:50 AM, Kevin Day wrote:
I ran
--
From: "Hoppe, Michael"
To:
Cc:
Date: Mon, 15 Dec 2008 13:45:47 +0100
Subject: [iText-questions] extracting text from pdfs with japanese
data
Dear all,
My name is Michael Hoppe, I work for the eSciDoc project, which is
funded by the German Ministry of Education and Research
et input from other folks so we can figure out how to proceed.
- K
--- Original Message ---
From: "Hoppe, Michael"
To:
Cc:
Date: Mon, 15 Dec 2008 13:45:47 +0100
Subject: [iText-questions] extracting text from pdfs with japanese data
Dear all,
My name is Michael Ho