Extracting Text from a particular region in PDF

2011-03-01 Thread Yogesh
the text only from the Results section and not Introduction and Methods. Thanks. - Yogesh

Re: Text extracted from only 1st page, not the rest

2011-02-11 Thread Yogesh
Hi Andreas, I am using the 1.5.0-snapshot from the trunk. What might be causing the error? Thanks - Yogesh 2011/2/11 Andreas Lehmkühler > Hi, > > Sent: Mon, 07 Feb 2011 From: Yogesh > > > Hello, > > > > I am trying to extract Text from PDFs, mostly scie

Identify Titles of sections based on some properties?

2011-02-09 Thread Yogesh
... some more text I need to identify SECTION-3, not just as a word but also with the properties mentioned above. Can we do this? Thanks - Yogesh

Text extracted from only 1st page, not the rest

2011-02-07 Thread Yogesh
org.apache.pdfbox.pdmodel.font.PDSimpleFont extractToUnicodeEncoding SEVERE: Error: Could not load embedded CMAP The handle is invalid What might be wrong? Please help. Thanks -Yogesh

PushBackInputStream error

2011-01-22 Thread Yogesh
Hello, I am parsing some PDFs. For one of them I get the following error (the PDF file looks fine): Error: expected='obj' actual='000' org.apache.pdfbox.io.PushBackInputStream@134ce4a I don't know what it means. Please help. Thanks, -Yogesh

Re: Type1C font Error

2011-01-20 Thread Yogesh
Hi, I am still getting the error org.apache.pdfbox.pdmodel.font.PDFontFactory createFont WARNING: Failed to create Type1C font. Falling back to Type1 font java.io.IOException: The handle is invalid -Yogesh On 2 January 2011 13:50, Andreas Lehmkuehler wrote: > Hi, > > > Am 0

Extracting Text from 2 Column PDFs

2010-12-05 Thread Yogesh
correct it? Thanks, -Yogesh

Re: Type1C font Error

2010-12-04 Thread Yogesh
I am getting an IOException, but the StackTrace looks similar. This does not seem to be resolved yet, or is it? -Yogesh On 5 December 2010 01:05, Hesham G. wrote: > Is your problem related to this : > https://issues.apache.org/jira/browse/PDFBOX-708 > > Best regard

Type1C font Error

2010-12-04 Thread Yogesh
these fonts, whatever they are? Please help. Thanks, -Yogesh

Re: Save URLs to PDFs?

2010-11-05 Thread Yogesh
Thanks Grant. But I have thousands of PDF URLs like this. I have tried around 12 so far. Can all of them be corrupt? What can I do about this? - Yogesh On 5 November 2010 18:53, Grant Overby wrote: > I ran the code [2]. The pdf is corrupted by the code as MD5s are different. > File

Re: Save URLs to PDFs?

2010-11-05 Thread Yogesh
eWriter("C:/My.pdf"); int next = 0; while ( ( next = in.read() ) != -1 ) { out.write(next); } Thanks, - Yogesh On 5 November 2010 18:31, Grant Overby wrote: > Hrm, That's odd. > > Can you post the code you tried? An
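A note on the copy loop quoted in this thread: the replies report that the saved file differs from the original (different MD5s) and opens as blank pages. One common cause of that symptom, offered here as an assumption and not something the thread confirms, is writing binary PDF data through a character-oriented writer, which re-encodes bytes as text. The sketch below copies the raw bytes through byte streams only. The class name PdfDownloader and the method name download are hypothetical names chosen for illustration; URL.openStream and Files.copy are standard JDK calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox: saves whatever bytes a URL serves.
public class PdfDownloader {

    /**
     * Copies the raw bytes served at sourceUrl into target, replacing any
     * existing file, and returns the number of bytes written.
     * No Reader or Writer is involved, so no character decoding can
     * alter the binary content.
     */
    public static long download(URL sourceUrl, Path target) throws IOException {
        try (InputStream in = sourceUrl.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PdfDownloader <url> <output-file>");
            return;
        }
        // Both arguments are supplied by the caller; nothing is hard-coded.
        URL source = URI.create(args[0]).toURL();
        long written = download(source, Paths.get(args[1]));
        System.out.println("Wrote " + written + " bytes to " + args[1]);
    }
}
```

Comparing a checksum of the saved file with one of a browser download, as was done in the replies, is a quick way to check whether the copy is byte-identical.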

Re: Save URLs to PDFs?

2010-11-05 Thread Yogesh
Yes. I can download the file through the browser. It works perfectly fine. - Yogesh On 5 November 2010 18:25, Grant Overby wrote: > If you download the file through a browser? Does it work then? > > > -- > Grant Overby > Senior Developer > FloorSoft, Inc.

Re: Save URLs to PDFs?

2010-11-05 Thread Yogesh
I tried that; it writes a blank PDF, though the file size and the number of pages are correct (for the newly written file). - Yogesh On 5 November 2010 18:09, Grant Overby wrote: > You don't need pdfBox to do this. Below is some rough code that allows you > to download a file

Save URLs to PDFs?

2010-11-05 Thread Yogesh
Hi, I have PDFs which I can access through URLs. I want to download and save it to files. How can I go about it? Thanks -Yogesh

Extracting symbols from Text

2010-08-24 Thread Yogesh
, -Yogesh

Error with PDFTextStripper

2010-07-19 Thread Yogesh
now what is wrong. Please help. > > Thanks, > > -Yogesh

Error with PDFTextStripper

2010-07-08 Thread Yogesh
Hi, I am using pdfbox-1.2.0 for extracting text from PDFs. I am getting the following error when using it. org.apache.pdfbox.pdmodel.font.PDFontFactory createFont WARNING: Failed to create Type1C font. Falling back to Type1 font I do not know what is wrong. Please help. Thanks, -Yogesh

Error: Failed to create Type1C font

2010-07-06 Thread Yogesh
Hi, I am using pdfbox-1.2.0 for extracting text from PDFs. I am getting the following error when using it. org.apache.pdfbox.pdmodel.font.PDFontFactory createFont WARNING: Failed to create Type1C font. Falling back to Type1 font I do not know what is wrong. Please help. Thanks, -Yogesh

Extract Header for sections?

2010-04-24 Thread Yogesh
Hello, I wanted to use PDFBox for my work. How can I extract the Headers for different sections from my PDF? For example, headers like * 1. Introduction* ... * 2. Results *.. Thanks, -Yogesh