[ 
https://issues.apache.org/jira/browse/TIKA-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628859#action_12628859
 ] 

Dave Meikle commented on TIKA-114:
----------------------------------

OK, processLineSeparator and processWordSeparator are not available in
PDFBox-0.7.3, which is what we have as our dependency. They are, however,
available on SVN HEAD of the PDFBox Incubator project, so if you build and use
that, it works fine. I noticed a lot of people are using either dev builds or
their own compiled versions.

I see that they are looking to do a first release under the new Apache 
Incubator project, but need to resolve PDFBOX-366 
(https://issues.apache.org/jira/browse/PDFBOX-366). Jukka, do you know the 
status of this?

If we want to release Tika 0.2-incubating before the first PDFBox release,
there is a workaround that I don't particularly like myself, but it would solve
the problem when using PDFBox-0.7.3. I will attach this in a patch.


> PDFParser : Getting content of the document using "writer.ToString ()" , some 
> words are stuck together
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-114
>                 URL: https://issues.apache.org/jira/browse/TIKA-114
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.2-incubating
>            Reporter: Rida Benjelloun
>             Fix For: 0.2-incubating
>
>
> PDFParser : Getting the content of the document using "writer.ToString ()" , 
> some words are stuck together
> Result of PDF extraction : 
> "Apache Tika - Apache Tikahttp://incubator.apache.org/tika/1 of 115.9.2007 
> 11:02Tika - Content Analysis ToolkitApache Tika is a toolkit for detecting 
> and extracting metadata and structured text content from various documents 
> using existing parser libraries. Apache Tika is an effort undergoing 
> incubation at The Apache Software Foundation (ASF), sponsored by the Apache 
> Lucene PMC. Incubation is required of all newly accepted projects until a 
> further review indicates that the infrastructure, communications, and 
> decision making process have stabilized in a manner consistent with other 
> successful ASF projects. While incubation status is not necessarily a 
> reflection of the completeness or stability of the code, it does indicate 
> that the project has yet to be fully endorsed by the ASF.See the Apache Tika 
> Incubation Status page for the current incubation status.Latest NewsMarch 
> 22nd, 2007: Apache Tika project startedThe Apache Tika project was formally 
> started when the Tika proposal was accepted by the Apache Incubator PMC."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
