Re: tika install fail on os x 10.9.2

2014-05-16 Thread Ramirez, Paul M (398J)
Annie, I haven't built Tika in a while, but if it's a typical Maven build the details of the test output will be captured in one of the files in the target directory. If you find those details and post them here, that would help troubleshoot what is going on. Thanks, Paul Ramirez On May 8, 2014

[jira] [Updated] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1302: -- Description: Many thanks to [~lewismc] for TIKA-1301! Once we get nightly builds up and running again,

[jira] [Commented] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998779#comment-13998779 ] Tim Allison commented on TIKA-1298: --- One reason to upgrade to Java 1.7 if you haven't al

Re: parser metadata empty after tika detect

2014-05-16 Thread Nick Burch
On Fri, 16 May 2014, aliosha79 wrote: For this purpose I have written these few lines of code: File f = new File("MyEmail.eml"); is = new FileInputStream(f); Tika tika = new Tika(); String mimeType = tika.detect(is); This will most likely use a fair bit (to possibly all) of
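The concern raised in this reply, that reading from a stream for detection can leave little or nothing for a later parse, can be illustrated with the Java standard library alone. The sketch below does not call Tika: `peek` is a hypothetical stand-in for a detector that reads a few leading bytes, and `noMark` builds a stream that, like `FileInputStream`, does not support mark/reset. It assumes the detection step buffers such a stream internally; whether and how Tika does so is not shown in the message above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamPeekDemo {

    /** Hypothetical detector stand-in: reads up to 'limit' bytes, then rewinds. */
    static int peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);
        try {
            byte[] buf = new byte[limit];
            int total = 0;
            int n;
            while (total < limit && (n = in.read(buf, total, limit - total)) != -1) {
                total += n;
            }
            return total;
        } finally {
            in.reset(); // rewind so the caller sees the stream from the start
        }
    }

    /** A stream that reports no mark support, as FileInputStream does. */
    static InputStream noMark(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    /** Counts the bytes still readable from the stream. */
    static int countRemaining(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Peek through a temporary wrapper, then keep reading the raw stream. */
    static int remainingAfterThrowawayWrapper(byte[] data) throws IOException {
        InputStream raw = noMark(data);
        // The wrapper reads ahead into its own buffer; reset() rewinds only the wrapper.
        peek(new BufferedInputStream(raw), 16);
        return countRemaining(raw);
    }

    /** Wrap once, peek, and keep reading through the same wrapper. */
    static int remainingAfterSharedWrapper(byte[] data) throws IOException {
        InputStream buffered = new BufferedInputStream(noMark(data));
        peek(buffered, 16);
        return countRemaining(buffered);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        System.out.println("raw stream after throwaway wrapper: "
                + remainingAfterThrowawayWrapper(data) + " bytes left");
        System.out.println("shared wrapper after peek: "
                + remainingAfterSharedWrapper(data) + " bytes left");
    }
}
```

If this matches what happens in the poster's code, two options are to open a fresh stream for the parse step or to wrap the file stream once in a mark-capable stream and pass that same object to both detection and parsing.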

[GitHub] tika pull request: [TIKA-1247] WIP: Exploded parsers: asm, audio, ...

2014-05-16 Thread cstamas
Github user cstamas closed the pull request at: https://github.com/apache/tika/pull/5 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-16 Thread Sergey Beryozkin
Hi Nick, On 14/05/14 14:22, Nick Burch wrote: On Wed, 14 May 2014, Sergey Beryozkin wrote: UnpackerResource has no Path annotation so it is defaulted to "/". Every endpoint method within the class does have one though. I would've expected it to match based on those, is that not the case? JAX-

[jira] [Updated] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1302: -- Description: Many thanks to [~lewismc] for TIKA-1301! Once we get nightly builds up and running again,

[jira] [Created] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1302: - Summary: Let's run Tika against a large batch of docs nightly Key: TIKA-1302 URL: https://issues.apache.org/jira/browse/TIKA-1302 Project: Tika Issue Type: Improve

[jira] [Updated] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1302: -- Description: Many thanks to [~lewismc] for TIKA-1301! Once we get that up and running for nightly buil

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-16 Thread Chris Mattmann
Hi Guys, Some thoughts here: -Original Message- From: Nick Burch Reply-To: "dev@tika.apache.org" Date: Wednesday, May 14, 2014 6:22 AM To: "dev@tika.apache.org" Subject: Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken? >On Wed, 14 May 2014, Sergey Beryozkin wr

parser metadata empty after tika detect

2014-05-16 Thread aliosha79
I'm having trouble with Tika parsing. In my use case I have to parse different file types using the right parser, including an .eml file. My app can receive any kind of file as input. In particular, I have a MyEmail.eml file whose content type is recognized as text/html. I aim to get all the availa

[jira] [Commented] (TIKA-1299) Add table tags to parsed RTF documents

2014-05-16 Thread Alex Hanson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1476#comment-1476 ] Alex Hanson commented on TIKA-1299: --- Apologies in advance if this wasn't the correct way

[jira] [Commented] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000172#comment-14000172 ] Tim Allison commented on TIKA-1298: --- [~tilman], TIKA-1300 opened. Will actually make the

[jira] [Updated] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1302: -- Description: Many thanks to [~lewismc] for TIKA-1301! Once we get that up and running for nightly buil

[jira] [Created] (TIKA-1299) Add table tags to parsed RTF documents

2014-05-16 Thread Alex Hanson (JIRA)
Alex Hanson created TIKA-1299: - Summary: Add table tags to parsed RTF documents Key: TIKA-1299 URL: https://issues.apache.org/jira/browse/TIKA-1299 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1272) tika-server version is incorrectly defined

2014-05-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000198#comment-14000198 ] Lewis John McGibbney commented on TIKA-1272: This patch is good to go. Can some

[jira] [Commented] (TIKA-1299) Add table tags to parsed RTF documents

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000177#comment-14000177 ] Tim Allison commented on TIKA-1299: --- Absolutely the right way. Thank you. Bonus points

[jira] [Commented] (TIKA-1066) tika-server ignoring port option

2014-05-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000197#comment-14000197 ] Lewis John McGibbney commented on TIKA-1066: This patch is good to go. Can some

[jira] [Created] (TIKA-1301) Establish TikaServer on Apache hosted VM

2014-05-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created TIKA-1301: -- Summary: Establish TikaServer on Apache hosted VM Key: TIKA-1301 URL: https://issues.apache.org/jira/browse/TIKA-1301 Project: Tika Issue Type: B

[jira] [Assigned] (TIKA-1300) Switch default PDFBox parser to NonSequentialParser

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1300: - Assignee: Tim Allison > Switch default PDFBox parser to NonSequentialParser >

[jira] [Created] (TIKA-1300) Switch default PDFBox parser to NonSequentialParser

2014-05-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1300: - Summary: Switch default PDFBox parser to NonSequentialParser Key: TIKA-1300 URL: https://issues.apache.org/jira/browse/TIKA-1300 Project: Tika Issue Type: Improvem

[jira] [Commented] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-16 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1475#comment-1475 ] Tilman Hausherr commented on TIKA-1298: --- I strongly recommend shipping Tika with the n

Re: [DISCUSS] Nightly Jenkins Builds for Trunk

2014-05-16 Thread Lewis John Mcgibbney
Hi Nick/Others, Please see the link below for the Tika trunk builds on the latest Oracle JDK 6 and 7, respectively. We also have a now-deprecated Tika trunk build which was doing zilch... we also have a currently disabled job configured to run with Oracle JDK8 (latest) when this becomes available to build mac

[jira] [Commented] (TIKA-1169) Fails to parse jnilib file

2014-05-16 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999705#comment-13999705 ] ASF GitHub Bot commented on TIKA-1169: -- GitHub user mkr opened a pull request: ht

[jira] [Commented] (TIKA-1169) Fails to parse jnilib file

2014-05-16 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999825#comment-13999825 ] ASF GitHub Bot commented on TIKA-1169: -- Github user asfgit closed the pull request at:

[GitHub] tika pull request: TIKA-1169: Adding other Mach-O magic bytes for ...

2014-05-16 Thread mkr
GitHub user mkr opened a pull request: https://github.com/apache/tika/pull/8 TIKA-1169: Adding other Mach-O magic bytes for jnilib files. Adding remaining Mach-o binary signatures to fix TIKA-1169 You can merge this pull request into a Git repository by running: $ git pull http

[GitHub] tika pull request: TIKA-1169: Adding other Mach-O magic bytes for ...

2014-05-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/8

[jira] [Comment Edited] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998779#comment-13998779 ] Tim Allison edited comment on TIKA-1298 at 5/16/14 2:30 PM: One

[GitHub] tika pull request: WIP TIKA-1292: Fixing the MimeTypes class to co...

2014-05-16 Thread cstamas
GitHub user cstamas opened a pull request: https://github.com/apache/tika/pull/7 WIP TIKA-1292: Fixing the MimeTypes class to consider "clusters" of magics by priority This changes MimeTypes class to consider "clusters" of magics instead of "first found" to resolve priority clashes

[jira] [Commented] (TIKA-1295) Make some Dublin Core items multi-valued

2014-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998861#comment-13998861 ] Tim Allison commented on TIKA-1295: --- Fixed bug that initially made me notice this issue a

Re: Extended fix for TIKA-1169

2014-05-16 Thread Ken Krugler
Hi Matthias, A new issue would be great, as that's what we use to tag changes in SVN. Also a test case (a .jnilib binary that currently fails) would be good. -- Ken On May 15, 2014, at 12:55pm, Matthias Krueger wrote: > I came across some other .jnilib binaries which were detected as .class

Extended fix for TIKA-1169

2014-05-16 Thread Matthias Krueger
I came across some other .jnilib binaries which were detected as .class files and caused issues. It seems there are more Mach-O binary magic variants depending on 32/64-bit architecture and endianness. Fix is attached. Let me know if I should rather clone the closed TIKA-1169 and attach it th
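As a rough illustration of the magic variants mentioned in this message, the sketch below classifies the first four bytes of a file. The attached fix is not shown here, so this is not its content: the constants are the Mach-O and universal-binary magic numbers as defined in Apple's `mach-o/loader.h` and `mach-o/fat.h`, and the class name `MachOMagic` and method `classify` are hypothetical. Note that the universal (fat) binary magic `0xCAFEBABE` is the same four bytes that begin a Java class file, which is one plausible source of the .class misdetection described above.

```java
import java.nio.ByteBuffer;

final class MachOMagic {

    /** Classifies a file header by its first four bytes, read as a big-endian int. */
    static String classify(byte[] header) {
        if (header == null || header.length < 4) {
            return "unknown";
        }
        int magic = ByteBuffer.wrap(header, 0, 4).getInt(); // ByteBuffer is big-endian by default
        switch (magic) {
            case 0xFEEDFACE: return "Mach-O 32-bit";
            case 0xCEFAEDFE: return "Mach-O 32-bit, byte-swapped";
            case 0xFEEDFACF: return "Mach-O 64-bit";
            case 0xCFFAEDFE: return "Mach-O 64-bit, byte-swapped";
            case 0xBEBAFECA: return "Mach-O universal, byte-swapped";
            // Same leading bytes as a Java class file; more than four bytes are needed to decide.
            case 0xCAFEBABE: return "Mach-O universal or Java class (ambiguous)";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        byte[] sixtyFourBitSwapped = {(byte) 0xCF, (byte) 0xFA, (byte) 0xED, (byte) 0xFE};
        System.out.println(classify(sixtyFourBitSwapped));
    }
}
```

A four-byte check cannot separate a universal binary from a class file, so a real detector would need to inspect the bytes that follow the magic.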

[jira] [Created] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1298: - Summary: testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6 Key: TIKA-1298 URL: https://issues.apache.org/jira/browse/TIKA-1298 Project: Tika