Dear Kevin Day,
Thanks for your reply.
You can find the stack trace in the image given below.
http://itext-general.2136553.n4.nabble.com/file/n4075502/Stacktrace.png
--
View this message in context:
http://itext-general.2136553.n4.nabble.com/Compare-PDF-Files-containg-chinese-text-tp4075465p40
Can you please post the stack trace?
--
View this message in context:
http://itext-general.2136553.n4.nabble.com/Compare-PDF-Files-containg-chinese-text-tp4075465p4075490.html
Sent from the iText - General mailing list archive at Nabble.com.
---
Sorry, I forgot to include the code in my post.
Code:
iTextSharp.text.pdf.PdfReader PDFFileReader =
    new iTextSharp.text.pdf.PdfReader("c:/Chinese_old.pdf");
String a = PdfTextExtractor.GetTextFromPage(PDFFileReader, 1); // The error is thrown at this statement.
--
View this message in context:
http://ite
Dear @All,
I am very new to iTextSharp, so I need your help with the following
task.
I have written a piece of code for comparing the contents of 2 PDF files
using the iTextSharp library for C#.
The code works fine for PDF files containing English text but throws an
error "Object reference no
Hi All,
I have written an agent in Lotus Notes in Java for creating and saving a PDF
file. When I run it, I get no error, but the file is not saved either.
Can someone tell me what I have done wrong? I am also
appending the code which I have used to save the file
I am using Eclipse 3.7.1 with "m2e - Maven Integration for Eclipse"
(1.0.100.20110804-1717)
I wrote this in pom.xml and get an error (Missing artifact
com.itextpdf:itextpdf:jar:5.1.3):
itextpdf.com
Maven Repository for iText
John,
I know it's possible to extract text with other libraries, but I need a
specific way to extract text which is easiest to achieve with iText. I also
know that mentions of other libraries are taken to heart and not well
received on this list (emotions do not come across the wire, so
WMJ wrote:
>
> Currently the parser is event-based. I would much rather have a DOM-like
> thing.
>
h... Well that was certainly part of the original design consideration.
But when you are processing a stack based operator stream, and you have the
potential for huge streams, an event base
Dániel Kékesi wrote:
>
> Not to hijack this thread, but what I'd like to see is support for
> more encoding types. For example, the attached document produces no output
> using any extraction strategy (I tried with 5.1.2).
>
Not a hi-jack at all - I think this is a much more meani
John - actually, I disagree. The parsing module is certainly younger than
the rest of the library, but we consider it to be of strategic importance.
If people open issues in the issue tracker, and provide example PDFs (or
even better, unit tests) that show various problems, then they can be worke
WMJ,
WMJ wrote:
> Currently the parser is event-based. I would much rather have a DOM-like
> thing.
IMO it is a good choice that the current API works in an event-based manner.
On the one hand this requires the least resources --- if it always first
transformed the page content into objects as
It's easy to turn a stream model into a DOM model – just stream it into your
DOM builder.
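As a rough illustration of that idea, here is a minimal Java sketch of a builder that listens to parser-style events and grows a small tree from them. Everything in it is an assumption made for the example: the `TextEvents` interface, `TreeNode`, `TreeBuilder` and `DomBuilderSketch` are hypothetical names, not iText classes, and the `replay` method only stands in for a real parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical event callbacks, loosely modeled on an event-based parser.
interface TextEvents {
    void beginTextBlock();
    void text(String value);
    void endTextBlock();
}

// A minimal tree node: a name, optional text, and child nodes.
final class TreeNode {
    final String name;
    final String text;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Renders the subtree in an XML-like form (no escaping; illustration only).
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("<" + name + ">");
        if (text != null) {
            sb.append(text);
        }
        for (TreeNode child : children) {
            sb.append(child);
        }
        return sb.append("</").append(name).append(">").toString();
    }
}

// Receives the event stream and builds the tree as events arrive.
final class TreeBuilder implements TextEvents {
    private final TreeNode root = new TreeNode("page", null);
    private final Deque<TreeNode> open = new ArrayDeque<>();

    TreeBuilder() {
        open.push(root);
    }

    @Override
    public void beginTextBlock() {
        TreeNode block = new TreeNode("textBlock", null);
        open.peek().children.add(block);
        open.push(block);
    }

    @Override
    public void text(String value) {
        open.peek().children.add(new TreeNode("text", value));
    }

    @Override
    public void endTextBlock() {
        // Never pop the root, even if the events are unbalanced.
        if (open.size() > 1) {
            open.pop();
        }
    }

    TreeNode result() {
        return root;
    }
}

final class DomBuilderSketch {
    // Stand-in for a parser: replays a fixed event sequence into a listener.
    static void replay(TextEvents listener) {
        listener.beginTextBlock();
        listener.text("abcdef");
        listener.endTextBlock();
    }

    public static void main(String[] args) {
        TreeBuilder builder = new TreeBuilder();
        replay(builder);
        System.out.println(builder.result());
    }
}
```

The tree only exists if a caller chooses to attach the builder, so code that just wants the events pays nothing for it.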
From: WMJ mailto:sd_...@yahoo.com>>
Reply-To: WMJ mailto:sd_...@yahoo.com>>, Post here
mailto:itext-questions@lists.sourceforge.net>>
Date: Tue, 15 Nov 2011 07:33:56 -0800
To: Post here
mailto:itext-questi
Hello,
I agree. Supporting more encoding types would really help with extracting text.
Currently, only text whose fonts have a ToUnicode section can be extracted.
WMJ
>
>
>
>Hi Kevin,
>
>Not to hijack this thread, but what I'd like to do see is to have suppor
Hello Kevin,
Currently the parser is event-based. I would much rather have a DOM-like thing.
For example,
q
BT
1 0 0 1 12 12 Tm
/F1 12 Tf %F1 is a font named Times New Roman
(abcdef) Tj
ET
Q
The above commands can be parsed to structured content like this:
abce
All along iText has kept saying it is about creating and updating PDF files,
not parsing or rendering.
The output from PDF Box is attached for the first page of your PDF to show
that it is possible using another library, but as often said here, pay for
some development, or join in and write (or us
On Tue, November 15, 2011 13:28, Jic wrote:
> It's just that I've seen a lot of these
> Types of postings where the developer
> Is too lazy to attempt to find the solution on their own, and, instead
> Pretty much want other developers
> To pretty much code the solution
> For them.
You are absolute
WMJ wrote:
>
> The parser is not very powerful or convenient yet, but it does point you
> to the most detailed part of the PDF text.
>
WMJ - what would you see as enhancements that would make it more powerful or
convenient? Understanding the details of this will help us improve.
Thanks.
--
V
There are currently two text extraction strategies. One is a very simple
extraction of text directly from the content stream. The other is a much
more advanced, location based extraction (this is the default).
Extending that to add additional formatting capabilities is possible, and
was the inte
It's just that I've seen a lot of these types of postings where the developer
is too lazy to attempt to find the solution on their own and, instead,
pretty much wants other developers to code the solution for them.
Sent from my iPhone
On Nov 15, 2011, at 4:51 AM, "Amedee Van Gasse"
On Tue, November 15, 2011 06:51, Siddhartha Rathi wrote:
> Hi All,
>
> I have written an agent in Lotus Notes in Java for creating and saving the
> PDF file. But when I am doing the same it is giving me no error but file
> is also not getting saved. Can someone help me what I must have done
> wrong
Currently you don't have any option.
You have to analyze the positions of the extracted text segments and determine
whether there should be spaces between them and whether adjacent lines belong
to the same paragraph. If you want to know about the color, font, style and
size of the text, you ha
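To make the position analysis concrete, here is a minimal Java sketch of one possible heuristic. It is not how iText does it: `Chunk`, `ChunkJoiner`, and the `spaceGap` and `lineTolerance` thresholds are hypothetical names and values chosen for the example, and the chunks are assumed to arrive already in reading order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical text segment: the string plus its horizontal extent and baseline.
final class Chunk {
    final String text;
    final float startX;
    final float endX;
    final float baselineY;

    Chunk(String text, float startX, float endX, float baselineY) {
        this.text = text;
        this.startX = startX;
        this.endX = endX;
        this.baselineY = baselineY;
    }
}

final class ChunkJoiner {
    // Joins chunks that are assumed to be in reading order already.
    // A baseline change larger than lineTolerance starts a new line;
    // a horizontal gap larger than spaceGap on the same line becomes a space.
    static String join(List<Chunk> chunks, float spaceGap, float lineTolerance) {
        StringBuilder out = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : chunks) {
            if (prev != null) {
                if (Math.abs(c.baselineY - prev.baselineY) > lineTolerance) {
                    out.append('\n');
                } else if (c.startX - prev.endX > spaceGap) {
                    out.append(' ');
                }
            }
            out.append(c.text);
            prev = c;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Hel", 0f, 15f, 700f),
                new Chunk("lo", 15.2f, 25f, 700f),   // tiny gap: same word
                new Chunk("world", 30f, 55f, 700f),  // wide gap: new word
                new Chunk("Next", 0f, 20f, 686f));   // lower baseline: new line
        System.out.println(join(chunks, 2f, 2f));
    }
}
```

Fixed thresholds are the weak point of this approach. A real implementation would more likely scale them by font size, and deciding whether adjacent lines belong to the same paragraph needs further rules, such as comparing line spacing and indentation.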
Thanks for pointing me in the right direction - that helped a lot.
I have managed to extract text from my PDF files, but I wish there were
more "formatting" options for the output. Have I missed anything?
I have a small project where I used foolabs Xpdf pdftotext.exe, which has
an option