Re: [iText-questions] Compare PDF Files containg chinese text

2011-11-15 Thread Raghavendra L
Dear Kevin Day, Thanks for your reply. You can find the stack trace in the image given below. http://itext-general.2136553.n4.nabble.com/file/n4075502/Stacktrace.png -- View this message in context: http://itext-general.2136553.n4.nabble.com/Compare-PDF-Files-containg-chinese-text-tp4075465p40

Re: [iText-questions] Compare PDF Files containg chinese text

2011-11-15 Thread Kevin Day
Can you please post the stack trace? -- View this message in context: http://itext-general.2136553.n4.nabble.com/Compare-PDF-Files-containg-chinese-text-tp4075465p4075490.html Sent from the iText - General mailing list archive at Nabble.com.

Re: [iText-questions] Compare PDF Files containg chinese text

2011-11-15 Thread Raghavendra L
Sorry, I forgot to include the code in my post. Code: iTextSharp.text.pdf.PdfReader PDFFileReader = new iTextSharp.text.pdf.PdfReader("c:/Chinese_old.pdf"); String a = PdfTextExtractor.GetTextFromPage(PDFFileReader, 1); *- I get an error at this statement.* -- View this message in context: http://ite

[iText-questions] Compare PDF Files containg chinese text

2011-11-15 Thread Raghavendra L
Dear @All, I am very new to iTextSharp, so I require your help with the following activity. I have written a piece of code for comparing the contents of 2 PDF files using the iTextSharp library for C#. The code works fine for PDF files containing English text but throws an error "Object reference no

[iText-questions] Saving the PDF created on SUSE Linux Server (updated)

2011-11-15 Thread Siddhartha Rathi
Hi All, I have written an agent in Lotus Notes in Java for creating and saving a PDF file. When I run it, it gives me no error, but the file is not saved either. Can someone help me find what I have done wrong? I am also appending the code which I have used to save the file

[iText-questions] Maven repo http://maven.itextpdf.com/ doesn't work

2011-11-15 Thread Vladimir
I am using Eclipse 3.7.1 with "m2e - Maven Integration for Eclipse" (1.0.100.20110804-1717). I wrote the following in pom.xml and get an error (Missing artifact com.itextpdf:itextpdf:jar:5.1.3): itextpdf.com Maven Repository for iText

Re: [iText-questions] FW: Save PDF as plain text

2011-11-15 Thread Dániel Kékesi
John, I know it's possible to extract text with other libraries, but I need a specific way to extract text which is easiest to achieve with iText. I also know that mentioning other libraries is taken to heart and not well accepted on this list (emotions are not coming across the wire, so

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread Kevin Day
WMJ wrote: > > Currently the parser is event-based. I would more love to have a DOM-like > thing. > h... Well that was certainly part of the original design consideration. But when you are processing a stack based operator stream, and you have the potential for huge streams, an event base

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread Kevin Day
Dániel Kékesi wrote: > > Not to hijack this thread, but what I'd like to see is to have support > for > more encoding types. For example the attached document produces no output > using any extraction strategy (I tried with 5.1.2). > Not a hijack at all - I think this is a much more meani

Re: [iText-questions] FW: Save PDF as plain text

2011-11-15 Thread Kevin Day
John - actually, I disagree. The parsing module is certainly younger than the rest of the library, but we consider it to be of strategic importance. If people open issues in the issue tracker, and provide example PDFs (or even better, unit tests) that show various problems, then they can be worke

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread mkl
WMJ, WMJ wrote: > Currently the parser is event-based. I would more love to have a DOM-like > thing. IMO it is a good choice of the current API to work in an event based manner. On the one hand this requires the least resources --- if it always first transformed the page content into objects as

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread Leonard Rosenthol
It's easy to turn a stream model into a DOM model – just stream it into your DOM builder. From: WMJ <sd_...@yahoo.com> Reply-To: WMJ <sd_...@yahoo.com>, Post here <itext-questions@lists.sourceforge.net> Date: Tue, 15 Nov 2011 07:33:56 -0800 To: Post here <itext-questi
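The "stream it into your DOM builder" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `ContentEvents`, `ContentNode` and `TreeBuilder` are hypothetical names invented here, not part of the iText API, and a real listener would receive much richer render information than plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical callbacks that an event-based parser might fire; NOT an iText interface.
interface ContentEvents {
    void beginGroup(String operator); // e.g. "q" or "BT"
    void text(String value);          // e.g. the string shown by Tj
    void endGroup();                  // e.g. "Q" or "ET"
}

// Minimal tree node: a named group with children, or a text leaf.
final class ContentNode {
    final String name;
    final String text; // null for group nodes
    final List<ContentNode> children = new ArrayList<>();

    ContentNode(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

// Consumes the event stream and keeps a stack of the currently open groups.
final class TreeBuilder implements ContentEvents {
    private final ContentNode root = new ContentNode("page", null);
    private final Deque<ContentNode> open = new ArrayDeque<>();

    TreeBuilder() {
        open.push(root);
    }

    @Override
    public void beginGroup(String operator) {
        ContentNode node = new ContentNode(operator, null);
        open.peek().children.add(node); // attach to the innermost open group
        open.push(node);
    }

    @Override
    public void text(String value) {
        open.peek().children.add(new ContentNode("text", value));
    }

    @Override
    public void endGroup() {
        if (open.size() > 1) { // never pop the root
            open.pop();
        }
    }

    ContentNode root() {
        return root;
    }
}
```

Feeding the builder the events for a `q BT ... Tj ET Q` sequence gives a small tree (page, then q, then BT, then a text leaf), while the parser itself stays event-based and only callers who want a tree pay for building one.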

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread WMJ
Hello, I agree. Supporting more encoding types would really help with extracting text. Currently, only text whose fonts have a ToUnicode section can be extracted. WMJ > > > >Hi Kevin, > >Not to hijack this thread, but what I'd like to see is to have suppor

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread WMJ
Hello Kevin, Currently the parser is event-based. I would more love to have a DOM-like thing. For example, q BT 1 0 0 1 12 12 Tm /F1 12 Tf %F1 is a font named Times New Roman (abcdef) Tj ET Q The above commands can be parsed to structured content like this: abce

[iText-questions] FW: Save PDF as plain text

2011-11-15 Thread John Renfrew
All along iText has kept saying it is about creating and updating PDF files, not parsing or rendering. The output from PDF Box is attached for the first page of your PDF to show that it is possible using another library, but as often said here, pay for some development, or join in and write (or us

Re: [iText-questions] Saving the PDF created on SUSE Linux Server

2011-11-15 Thread Amedee Van Gasse
On Tue, November 15, 2011 13:28, Jic wrote: > It's just that I've seen a lot of these > Types of postings where the developer > Is too lazy to attempt to find the solution on their own, and, instead > Pretty much want other developers > To pretty much code the solution > For them. You are absolute

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread Kevin Day
WMJ wrote: > > The parser is not very powerful or convenient yet, but it does point you > to the most detailed part of the PDF text. > WMJ - what would you see as enhancements that would make it more powerful or convenient? Understanding the details of this will help us improve. Thanks. -- V

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread Kevin Day
There are currently two text extraction strategies. One is a very simple extraction of text directly from the content stream. The other is a much more advanced, location based extraction (this is the default). Extending that to add additional formatting capabilities is possible, and was the inte

Re: [iText-questions] Saving the PDF created on SUSE Linux Server

2011-11-15 Thread Jic
It's just that I've seen a lot of these types of postings where the developer is too lazy to attempt to find the solution on their own and, instead, pretty much wants other developers to code the solution for them. Sent from my iPhone On Nov 15, 2011, at 4:51 AM, "Amedee Van Gasse"

Re: [iText-questions] Saving the PDF created on SUSE Linux Server

2011-11-15 Thread Amedee Van Gasse
On Tue, November 15, 2011 06:51, Siddhartha Rathi wrote: > Hi All, > > I have written an agent in Lotus Notes in JAVA for creating and saving the > PDF file. But when I am doing the same it is giving me no error but file > is also not getting saved. Can someone help me what I must have done > wrong

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread WMJ
Currently you don't have any option. You have to analyze the position of the extracted text segments and determine whether there should be spaces between them, whether the adjacent lines belong to the same paragraph. If you want to know about the color, font, style and size of the text, you ha
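The position analysis described above can be sketched as a simple gap heuristic. This is a minimal illustration under assumptions, not how iText does it: `TextSegment` and `SegmentJoiner` are hypothetical names, the segments are assumed to arrive already sorted in reading order, and the space width and line tolerance are values the caller has to supply.

```java
import java.util.List;

// Hypothetical holder for one extracted piece of text and its position (NOT an iText class).
final class TextSegment {
    final String text;
    final float startX;    // left edge of the segment
    final float endX;      // right edge of the segment
    final float baselineY; // vertical position of the text baseline

    TextSegment(String text, float startX, float endX, float baselineY) {
        this.text = text;
        this.startX = startX;
        this.endX = endX;
        this.baselineY = baselineY;
    }
}

final class SegmentJoiner {
    // Joins segments, assumed to be in reading order, into plain text.
    // A baseline change larger than lineTolerance starts a new line;
    // a horizontal gap wider than half a space width inserts a space.
    static String join(List<TextSegment> segments, float spaceWidth, float lineTolerance) {
        StringBuilder out = new StringBuilder();
        TextSegment prev = null;
        for (TextSegment s : segments) {
            if (prev != null) {
                if (Math.abs(s.baselineY - prev.baselineY) > lineTolerance) {
                    out.append('\n');
                } else if (s.startX - prev.endX > spaceWidth / 2f) {
                    out.append(' ');
                }
            }
            out.append(s.text);
            prev = s;
        }
        return out.toString();
    }
}
```

Deciding whether adjacent lines belong to the same paragraph would need a further rule on top of this, for example comparing the vertical distance between baselines with the usual line spacing.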

Re: [iText-questions] Save PDF as plain text

2011-11-15 Thread Verakso
Thanks for pointing me in the right direction - that helped a lot. I have managed to extract text from my PDF files, but I wish there were some more "formatting" options for the output - have I missed anything? I have a small project where I used foolabs Xpdf pdftotext.exe, which has an option