On Thu, Nov 11, 2010 at 10:14 AM, Staffan wrote:
> Hi,
>
> Current trunk/0.8RC seems to concatenate the PDF body from PDFBox into
> one line. Last time I tested trunk, about a month ago, it did not. See
> the following command line output:
>
Had the time to make a unit test now and track the regre
[ https://issues.apache.org/jira/browse/TIKA-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Staffan Olsson updated TIKA-548:
Attachment: tika-PDF-content-regression-test.patch
> PDF content extracted as single line
> -
PDF content extracted as single line
Key: TIKA-548
URL: https://issues.apache.org/jira/browse/TIKA-548
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.8
Repo
BTW: that said, thanks for taking the time to implement this functionality – it
looks great and of course I’m +1 for making it easier for you guys to use Tika
in your company!
Cheers,
Chris
On 11/11/10 6:38 AM, "Maxim Valyanskiy" wrote:
Hello!
11.11.2010 17:05, Jukka Zitting wrote:
> Log:
>
Hi Max,
>
> We have a POI-based utility that extracts all embedded files (attachments,
> pictures, etc.) from different file formats. This utility takes an arbitrary
> file and returns a ZIP archive with all attachments.
>
> This utility duplicates functionality of embedded file processing in Tika.
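
As a rough illustration of the packaging half of such a utility, here is a minimal sketch that writes already-extracted attachments into a single ZIP archive using only java.util.zip. The class name EmbeddedZipSketch, both method names and the sample file names are hypothetical; this is not the utility described above, and the extraction step itself (POI or Tika) is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: packs already-extracted attachments into one ZIP archive. */
class EmbeddedZipSketch {

    /** Writes each (name, bytes) pair as one ZIP entry and returns the archive bytes. */
    static byte[] toZip(Map<String, byte[]> attachments) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : attachments.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    /** Lists entry names in archive order, so the round trip can be checked. */
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Sample data only; a real tool would fill this map from the parser.
        Map<String, byte[]> attachments = new LinkedHashMap<>();
        attachments.put("image1.png", new byte[] {1, 2, 3});
        attachments.put("embedded.doc", new byte[] {4, 5});
        System.out.println(entryNames(toZip(attachments)));
    }
}
```

In this sketch the map would be filled by whatever component hands over each embedded document, which is why the extraction side is left abstract.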
On Thu, 11 Nov 2010, Maxim Valyanskiy wrote:
So I need to create a JIRA issue before committing?
Yup. If it's a major change, or you're not sure about the route to take,
post the patch for review on the JIRA issue first. If it's a smaller change (e.g.
the scope of this one), create the JIRA issue before you star
Hello!
11.11.2010 17:05, Jukka Zitting wrote:
Log:
Extract interface for EmbeddedDocumentExtractor
We have a POI-based utility that extracts all embedded files (attachments, pictures,
etc.) from different file formats. This utility takes an arbitrary file and
returns a ZIP archive with all attac
Hi Chris,
We built/ran Bixo against the released Tika 0.8 jars, and it passed
all of our tests.
+1 for me
-- Ken
On Nov 9, 2010, at 1:29pm, Mattmann, Chris A (388J) wrote:
Hi Folks,
I have posted a candidate for the Apache Tika 0.8 release. The
source code
is at:
http://people.apache
Hi,
On Thu, Nov 11, 2010 at 3:31 PM, wrote:
> Log:
> Extract interface for EmbeddedDocumentExtractor
It would be good if all non-trivial commit messages contained a
reference to a relevant issue in Jira for better context of why
particular changes are being made.
Nick correctly noted earlier t
Hi,
Current trunk/0.8RC seems to concatenate the PDF body from PDFBox into
one line. Last time I tested trunk, about a month ago, it did not. See
the following command line output:
$> java -jar pdfbox-app-1.3.1.jar ExtractText -console docs/shortpdf.pdf
1 · untitled 3 · 2010-02-13 09:52