[
https://issues.apache.org/jira/browse/TIKA-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628859#action_12628859
]
Dave Meikle commented on TIKA-114:
----------------------------------
OK, processLineSeparator and processWordSeparator are not available in
PDFBox-0.7.3 which is what we have as our dependency. They are however
available on SVN HEAD of the PDFBox Incubator project, so if you build and use
that it works fine. I noticed a lot of people are using either dev builds or
their own compiled versions.
I see that they are looking to do a first release under the new Apache
Incubator project, but need to resolve PDFBOX-366
(https://issues.apache.org/jira/browse/PDFBOX-366). Jukka, do you know the
status of this?
If we want to release Tika 0.2-incubating before the first PDFBox release,
there is a workaround. I don't particularly like it myself, but it would solve
the problem when using PDFBox-0.7.3. I will attach it as a patch.
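
For illustration only: the snippet below is not the patch and does not use the
PDFBox API. It is a minimal Java sketch of the symptom reported in this issue,
assuming the extracted text arrives as one string per line and is appended
with no separator in between. The class and method names (SeparatorSketch,
joinWithoutSeparators, joinWithSeparators) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Tika or PDFBox.
public class SeparatorSketch {

    // Appends each line directly after the previous one. With no separator
    // written between lines, the last word of one line and the first word
    // of the next end up stuck together.
    public static String joinWithoutSeparators(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line);
        }
        return out.toString();
    }

    // Writes the given separator between consecutive lines (not after the
    // last one), so word boundaries at line ends are preserved.
    public static String joinWithSeparators(List<String> lines, String separator) {
        return String.join(separator, lines);
    }

    public static void main(String[] args) {
        // Two fragments taken from the extraction result quoted in the issue.
        List<String> lines = Arrays.asList(
                "Tika - Content Analysis Toolkit",
                "Apache Tika is a toolkit");

        System.out.println(joinWithoutSeparators(lines));
        System.out.println(joinWithSeparators(lines, "\n"));
    }
}
```

The first call reproduces the "ToolkitApache" run-together seen in the
reported output; the second keeps the two lines apart.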
> PDFParser : Getting content of the document using "writer.ToString ()" , some
> words are stuck together
> ------------------------------------------------------------------------------------------------------
>
> Key: TIKA-114
> URL: https://issues.apache.org/jira/browse/TIKA-114
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.2-incubating
> Reporter: Rida Benjelloun
> Fix For: 0.2-incubating
>
>
> PDFParser : Getting the content of the document using "writer.ToString ()" ,
> some words are stuck together
> Result of PDF extraction :
> "Apache Tika - Apache Tikahttp://incubator.apache.org/tika/1 of 115.9.2007
> 11:02Tika - Content Analysis ToolkitApache Tika is a toolkit for detecting
> and extracting metadata and structured text content from various documents
> using existing parser libraries. Apache Tika is an effort undergoing
> incubation at The Apache Software Foundation (ASF), sponsored by the Apache
> Lucene PMC. Incubation is required of all newly accepted projects until a
> further review indicates that the infrastructure, communications, and
> decision making process have stabilized in a manner consistent with other
> successful ASF projects. While incubation status is not necessarily a
> reflection of the completeness or stability of the code, it does indicate
> that the project has yet to be fully endorsed by the ASF.See the Apache Tika
> Incubation Status page for the current incubation status.Latest NewsMarch
> 22nd, 2007: Apache Tika project startedThe Apache Tika project was formally
> started when the Tika proposal was accepted by the Apache Incubator PMC."