[jira] [Commented] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919044#comment-13919044 ] Tilman Hausherr commented on PDFBOX-870: No, I haven't committed it. The reason is

Re: Regarding pdf data extraction

2014-03-03 Thread Alin Mazilu
I don't think that class can help you... All you need is the PDFTextStripper class...

Re: Regarding pdf data extraction

2014-03-03 Thread John Hewson
Take a look at Tabula http://tabula.nerdpower.org which uses PDFBox. -- John

Regarding pdf data extraction

2014-03-03 Thread Divya Muttineni
I am trying to convert the tabular data from a PDF file to a text (.txt) file. In one of the articles I came across org.apache.pdfbox.pdfviewer.PDFPageDrawer. Can you please help me with how to extend this and override the strokepath() method? Thank you, Divya
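Divya's goal (tabular PDF data to plain text) can be approximated without any line detection by grouping text fragments into rows by their vertical position. A minimal sketch follows, assuming each fragment comes with page coordinates where y grows down the page; `TextChunk` and `TableRows` are hypothetical names, not PDFBox classes, and this is not Tabula's algorithm. With PDFBox the positions could, for example, be collected by overriding `PDFTextStripper.writeString(String, List<TextPosition>)`; that wiring is left out here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical text fragment with its position on the page (not a PDFBox class). */
final class TextChunk {
    final String text;
    final float x;
    final float y;

    TextChunk(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class TableRows {
    /**
     * Puts chunks whose y lies within `tolerance` of a row's first chunk into that row,
     * orders each row left to right, and joins the cells with a tab.
     */
    static List<String> toTabSeparatedRows(List<TextChunk> chunks, float tolerance) {
        List<TextChunk> sorted = new ArrayList<>(chunks);
        // Ascending y (assumed to grow down the page), then ascending x.
        sorted.sort(Comparator.comparingDouble((TextChunk c) -> c.y).thenComparingDouble(c -> c.x));

        List<List<TextChunk>> rows = new ArrayList<>();
        for (TextChunk chunk : sorted) {
            List<TextChunk> current = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (current != null && Math.abs(current.get(0).y - chunk.y) <= tolerance) {
                current.add(chunk); // close enough vertically: same table row
            } else {
                List<TextChunk> row = new ArrayList<>();
                row.add(chunk);
                rows.add(row); // start a new row
            }
        }

        List<String> lines = new ArrayList<>();
        for (List<TextChunk> row : rows) {
            row.sort(Comparator.comparingDouble((TextChunk c) -> c.x));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(row.get(i).text);
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

Cells that are only slightly misaligned still land in the same row. Ruled tables, as in the strokepath() idea from the question, would instead take column boundaries from the drawn lines.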

[jira] [Commented] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-03 Thread Nicolas Hoibian (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918846#comment-13918846 ] Nicolas Hoibian commented on PDFBOX-870: Antialiasing does look great! The modifie
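Antialiasing of rendered output in Java is controlled through `RenderingHints` on the `Graphics2D` that paints the page image. The stdlib-only sketch below (the class name `AntialiasDemo` is hypothetical, and this is not the PDFBOX-870 patch) draws the same line with the hints off and on and counts distinct pixel colors: aliased output contains only black and white, while antialiased edges add intermediate grays.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.HashSet;
import java.util.Set;

final class AntialiasDemo {
    /** Draws a black diagonal line on a white 64x64 image, with or without antialiasing hints. */
    static BufferedImage render(boolean antialias) {
        BufferedImage image = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, 64, 64);
            if (antialias) {
                // Shapes and text get smoothed edges; prefer quality over speed.
                g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
                g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING, RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
                g.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
            } else {
                g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_OFF);
            }
            g.setColor(Color.BLACK);
            g.setStroke(new BasicStroke(1.5f));
            g.drawLine(2, 5, 60, 50);
        } finally {
            g.dispose();
        }
        return image;
    }

    /** Counts distinct RGB values; antialiased edges introduce intermediate grays. */
    static int distinctColors(BufferedImage image) {
        Set<Integer> colors = new HashSet<>();
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                colors.add(image.getRGB(x, y));
            }
        }
        return colors.size();
    }
}
```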

Re: [GSoC 2014]Optical Character Recognition project - Introduction

2014-03-03 Thread Dimuthu Upeksha
Hi John, I only noticed your last reply after sending my previous mail. Sorry about that. I'm also using a Mac, and I use VMs to test other platforms. I have done a lot of work with Maven. I'll go through the plugin and try to apply it to that GitHub project. Thanks, Dimuthu

Re: [GSoC 2014]Optical Character Recognition project - Introduction

2014-03-03 Thread Dimuthu Upeksha
Hi John, I tried to reuse that Android JNI wrapper for Tesseract. Here are my observations: 1. This wrapper depends heavily on Android image libraries (android/bitmap.h). Most of the wrapper methods [1] use this library. 2. But I can understand the underlying logic in each function. Basically what it

[jira] [Commented] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918567#comment-13918567 ] John Hewson commented on PDFBOX-870: I've added PDFBOX-1959 as a meta-issue for dealin

[jira] [Assigned] (PDFBOX-1958) image mask outline with shading pattern is invisible

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson reassigned PDFBOX-1958: Assignee: John Hewson

[jira] [Resolved] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson resolved PDFBOX-1956. Resolution: Invalid. The _example b_ file contains invalid text, for example, using Adobe Reader

[jira] [Closed] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson closed PDFBOX-1956.

[jira] [Closed] (PDFBOX-1957) PDFCreator and PDFBox

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson closed PDFBOX-1957.

Re: [GSoC 2014]Optical Character Recognition project - Introduction

2014-03-03 Thread John Hewson
Dimuthu, your new diagram looks good. The JNI wrapper for Tesseract is indeed for Android, so it will need porting to a standard desktop C++ environment. We use Maven to build PDFBox, and there is a native-maven plugin which can build JNI projects; see http://docs.codehaus.org/display/MAVENUSER/P
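Porting the Android wrapper mainly means replacing the `android/bitmap.h` pixel access with a plain buffer handed over from Java. A minimal sketch, assuming the ported C++ side forwards its arguments to Tesseract's `TessBaseAPI::SetImage(imagedata, width, height, bytes_per_pixel, bytes_per_line)`; the class `TesseractBridge` and the native method `recognize` are hypothetical and not part of any existing wrapper.

```java
import java.awt.image.BufferedImage;

final class TesseractBridge {
    // Hypothetical native entry point. A desktop C++ port would pass these arguments on to
    // tesseract::TessBaseAPI::SetImage(...). Declaring it is harmless; calling it requires
    // the ported library to be loaded first with System.loadLibrary(...).
    static native String recognize(byte[] pixels, int width, int height,
                                   int bytesPerPixel, int bytesPerLine);

    /**
     * Packs an image into a tightly packed 8-bit grayscale buffer (1 byte per pixel,
     * rows top to bottom), standing in for the Android Bitmap access in the original wrapper.
     */
    static byte[] toGray8(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        byte[] gray = new byte[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // ITU-R BT.601 luma weights, integer approximation.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                gray[y * width + x] = (byte) luma;
            }
        }
        return gray;
    }
}
```

For the buffer produced here, `bytesPerPixel` is 1 and `bytesPerLine` equals the image width, so the native side needs no Android-specific stride handling.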

Remove AWT Fonts

2014-03-03 Thread John Hewson
Hi all, I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox is ready to leave AWT font rendering behind, as the JDK's rendering has proven to be buggy and we now have our own renderers for all font types in 2.0.0. Before we can do this, we need to ship a set of standard 14 fo
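Without AWT, PDFBox itself has to recognize when a PDF refers to one of the standard 14 fonts and supply glyphs for it. A minimal sketch of the recognition step, assuming the font's `BaseFont` name is available as a string; `Standard14` is a hypothetical helper, not PDFBox's API, and which substitute font files get shipped is left open here.

```java
import java.util.Set;

final class Standard14 {
    // The 14 standard Type 1 font names defined by the PDF specification.
    static final Set<String> NAMES = Set.of(
        "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
        "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
        "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
        "Symbol", "ZapfDingbats");

    /** Strips a subset tag such as "ABCDEF+" (six uppercase letters followed by '+'). */
    static String stripSubsetTag(String baseFont) {
        if (baseFont.length() > 7 && baseFont.charAt(6) == '+') {
            for (int i = 0; i < 6; i++) {
                char c = baseFont.charAt(i);
                if (c < 'A' || c > 'Z') {
                    return baseFont; // not a valid subset tag; leave the name alone
                }
            }
            return baseFont.substring(7);
        }
        return baseFont;
    }

    /** True if the BaseFont name refers to one of the standard 14 fonts. */
    static boolean isStandard14(String baseFont) {
        return NAMES.contains(stripSubsetTag(baseFont));
    }
}
```

Subset-embedded fonts carry a six-letter uppercase tag plus `+` before the name (for example `ABCDEF+Helvetica`), which is why the tag is stripped before the lookup.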

[jira] [Updated] (PDFBOX-1701) Better font than the standard font

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-1701: Summary: Better font than the standard font (was: Suggestion: better font than the standard font)

[jira] [Updated] (PDFBOX-1959) Remove AWT Fonts

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-1959: Fix Version/s: 2.0.0

[jira] [Commented] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918515#comment-13918515 ] Tilman Hausherr commented on PDFBOX-870: The ones described in PDFBOX-1800 and PDF

[jira] [Created] (PDFBOX-1959) Remove AWT Fonts

2014-03-03 Thread John Hewson (JIRA)
John Hewson created PDFBOX-1959: --- Summary: Remove AWT Fonts Key: PDFBOX-1959 URL: https://issues.apache.org/jira/browse/PDFBOX-1959 Project: PDFBox Issue Type: Improvement Components:

[jira] [Commented] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918501#comment-13918501 ] John Hewson commented on PDFBOX-870: What were your font modifications?

[jira] [Updated] (PDFBOX-870) PDF-To-IMAGE output is not anti-aliased

2014-03-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-870: --- Attachment: a_metro-vlc.pdf-1-antialiasing-fontmods.png Here's the file rendered with my font

[jira] [Updated] (PDFBOX-1069) Ubuntu throws exceptions when fonts missing

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-1069: Summary: Ubuntu throws exceptions when fonts missing (was: Ubuntu throws org.apache.pdfbox.pdmode

[jira] [Updated] (PDFBOX-567) PDDocument always creates a real file

2014-03-03 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-567: --- Summary: PDDocument always creates a real file (was: PDDocument always creates a real file. Added a

[jira] [Commented] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918227#comment-13918227 ] Tilman Hausherr commented on PDFBOX-1956: - I don't get anything useful when doing

[jira] [Comment Edited] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917885#comment-13917885 ] Vicente edited comment on PDFBOX-1956 at 3/3/14 10:02 AM: -- Both

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: itext_pdfabc-sample.pdf example b.pdf

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: (was: cliente Mercedes2.pdf)

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: (was: cliente Mercedes.pdf)

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: (was: example b.pdf)

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: cliente Mercedes2.pdf cliente Mercedes.pdf

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: (was: example a.pdf)

[jira] [Issue Comment Deleted] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Comment: was deleted (was: When I get file A to convert in text the result is OK but when I get file B th

[jira] [Comment Edited] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917885#comment-13917885 ] Vicente edited comment on PDFBOX-1956 at 3/3/14 9:51 AM: - Both fi

[jira] [Commented] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917887#comment-13917887 ] Vicente commented on PDFBOX-1956: - When I convert file A to text, the result is OK

[jira] [Updated] (PDFBOX-1956) Wrong character on conversion PDF to TXT

2014-03-03 Thread Vicente (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vicente updated PDFBOX-1956: Attachment: example b.pdf example a.pdf Both files have the content. A file was created by