buildbot failure in ASF Buildbot on tika-trunk

2012-07-01 Thread buildbot
The Buildbot has detected a new failure on builder tika-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/tika-trunk/builds/887 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: portunus_ubuntu Build Reason: scheduler Build Source Stam

JAX-RS overhead in tika-server

2012-07-01 Thread Jukka Zitting
Hi, I looked at tika-server in a bit more detail, and I'm a bit concerned about the dependency overhead it needs for the JAX-RS support: +- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2 +- org.apache.cxf:cxf-common-utilities:jar:2.5.2 | +- org.apache.ws.xmlschema:xmlschema-core:jar:

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: > Hi, > > I looked at tika-server in a bit more detail, and I'm a bit concerned > about the dependency overhead it needs for the JAX-RS support: > > +- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2 > +- org.apache.cxf:cxf-common-

Re: svn commit: r1355877 - in /tika/trunk: ./ tika-dll/ tika-dll/src/ tika-dll/src/main/ tika-dll/src/main/csharp/ tika-dll/src/main/csharp/Apache/

2012-07-01 Thread Mattmann, Chris A (388J)
WOW nice Jukka, you did it! Cheers, Chris On Jul 1, 2012, at 6:04 AM, wrote: > Author: jukka > Date: Sun Jul 1 13:04:00 2012 > New Revision: 1355877 > > URL: http://svn.apache.org/viewvc?rev=1355877&view=rev > Log: > TIKA-773: .NET version of Tika > > Add a basic Tika.dll build > > Added:

[jira] [Created] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2012-07-01 Thread Jason Judge (JIRA)
Jason Judge created TIKA-944: Summary: Extend tika-server API to be consistent with tika-app CLI Key: TIKA-944 URL: https://issues.apache.org/jira/browse/TIKA-944 Project: Tika Issue Type: New Fe

[jira] [Commented] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2012-07-01 Thread Jason Judge (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404765#comment-13404765 ] Jason Judge commented on TIKA-944: -- I'm not a Java developer, and don't have a build enviro

Re: svn commit: r1355947 - /tika/trunk/tika-parent/pom.xml

2012-07-01 Thread Mattmann, Chris A (388J)
Great job Ray!! Cheers, Chris On Jul 1, 2012, at 9:39 AM, wrote: > Author: rgauss > Date: Sun Jul 1 16:39:29 2012 > New Revision: 1355947 > ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pas

[jira] [Commented] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2012-07-01 Thread Jason Judge (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404769#comment-13404769 ] Jason Judge commented on TIKA-944: -- I'm not sure if you need more specific details, such as

[jira] [Comment Edited] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2012-07-01 Thread Jason Judge (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404769#comment-13404769 ] Jason Judge edited comment on TIKA-944 at 7/1/12 5:01 PM: -- I'm not

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Jukka Zitting
Hi, On Sun, Jul 1, 2012 at 6:27 PM, Mattmann, Chris A (388J) wrote: > On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: > Sergey Beryozkin (who I'm CC'ing on this email since I'm not sure > he's subscribed to dev@) helped by providing guidance on the CXF > side while I was working on this with Max

[jira] [Updated] (TIKA-872) Tika --extract fails for RTF

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-872: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Tika --ex

[jira] [Updated] (TIKA-872) Tika --extract fails for RTF

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-872: --- - push to 1.3 > Tika --extract fails for RTF > > >

[jira] [Updated] (TIKA-774) ExifTool Parser

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-774: --- - push to 1.3 > ExifTool Parser > --- > > Key: TIKA-77

[jira] [Updated] (TIKA-774) ExifTool Parser

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-774: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > ExifTool

[jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-715: --- - push to 1.3 > Some parsers produce non-well-formed XHTML SAX events > --

[jira] [Updated] (TIKA-906) Headers, footers, and footnotes not extracted from Pages documents

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-906: --- - push to 1.3 > Headers, footers, and footnotes not extracted from Pages documents

[jira] [Updated] (TIKA-906) Headers, footers, and footnotes not extracted from Pages documents

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-906: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Headers,

[jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-715: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Some pars

[jira] [Updated] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-757: --- - push to 1.3 > Address TODOs when we upgrade to next POI release (3.8 beta 5) > -

[jira] [Updated] (TIKA-868) TXT parser does not honour the specified encoding

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-868: --- - push to 1.3 > TXT parser does not honour the specified encoding > --

[jira] [Updated] (TIKA-868) TXT parser does not honour the specified encoding

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-868: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > TXT parse

[jira] [Updated] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-757: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Address T

[jira] [Updated] (TIKA-891) Use POST in addition to PUT on method calls in tika-server

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-891: --- - push to 1.3 > Use POST in addition to PUT on method calls in tika-server > -

[jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-539: --- - push to 1.3 > Encoding detection is too biased by encoding in meta tag > ---

[jira] [Updated] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-817: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > (PPT/PPTX

[jira] [Updated] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-817: --- - push to 1.3 > (PPT/PPTX) Missing date/time in text content. > --

[jira] [Updated] (TIKA-891) Use POST in addition to PUT on method calls in tika-server

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-891: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Use POST

[jira] [Updated] (TIKA-775) Embed Capabilities

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-775: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Embed Cap

[jira] [Updated] (TIKA-775) Embed Capabilities

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-775: --- - push to 1.3 > Embed Capabilities > -- > > Key: T

[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-820: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Locator i

[jira] [Updated] (TIKA-605) Tika GDAL parser

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-605: --- - push to 1.3 > Tika GDAL parser > > > Key: TIKA-

[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-820: --- - push to 1.3 > Locator is unset for HTML parser > ---

[jira] [Updated] (TIKA-605) Tika GDAL parser

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-605: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Tika GDAL

[jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-539: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Encoding

[jira] [Updated] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-754: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Automatic

[jira] [Updated] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-819: --- - push to 1.3 > Make Option to Exclude Embedded Files' Text for Text Content > ---

[jira] [Updated] (TIKA-776) ExifTool Embedder

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-776: --- - push to 1.3 > ExifTool Embedder > - > > Key: TIK

[jira] [Updated] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-819: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > Make Opti

[jira] [Updated] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-754: --- - push to 1.3 > Automatic line break insertion (BR element) instead of '\n' in >

[jira] [Updated] (TIKA-776) ExifTool Embedder

2012-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-776: --- Fix Version/s: (was: 1.2) 1.3 - push to 1.3 > ExifTool

[jira] [Created] (TIKA-945) Upgrade tika-server to CXF 2.6.1

2012-07-01 Thread Chris A. Mattmann (JIRA)
Chris A. Mattmann created TIKA-945: -- Summary: Upgrade tika-server to CXF 2.6.1 Key: TIKA-945 URL: https://issues.apache.org/jira/browse/TIKA-945 Project: Tika Issue Type: Bug Comp

[jira] [Commented] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2012-07-01 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404833#comment-13404833 ] Michael McCandless commented on TIKA-758: - Looks like the TODOs are all in PDF2XHTML

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 1, 2012, at 12:01 PM, Jukka Zitting wrote: > Hi, > > On Sun, Jul 1, 2012 at 6:27 PM, Mattmann, Chris A (388J) > wrote: >> On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: >> Sergey Beryozkin (who I'm CC'ing on this email since I'm not sure >> he's subscribed to dev@) helped by

[jira] [Reopened] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2012-07-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting reopened TIKA-758: OK, thanks! Reopening until those ones are addressed. > Address TODOs when we upgrade to

[jira] [Resolved] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2012-07-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-757. - Resolution: Fixed Fix Version/s: (was: 1.3) 1.2 I'm going to mark this as clo

[jira] [Created] (TIKA-946) Improve how the PPTX parser uses XSLF from POI

2012-07-01 Thread Nick Burch (JIRA)
Nick Burch created TIKA-946: --- Summary: Improve how the PPTX parser uses XSLF from POI Key: TIKA-946 URL: https://issues.apache.org/jira/browse/TIKA-946 Project: Tika Issue Type: Bug Compo

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Nick Burch
On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: I also plan to spin a 1.2 release candidate at some point in the next week or so. I realize the metadata stuff isn't done yet, but it's better to simply release early and often, and then when stuff is ready it can be included. Releasing is a l

[jira] [Resolved] (TIKA-513) Support of Deja Vu (DjVu) format

2012-07-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-513. Resolution: Won't Fix Resolving as Won't Fix until there's an upstream library that [we can use|http

Build failed in Jenkins: Tika-trunk #887

2012-07-01 Thread Apache Jenkins Server
See -- Failed to access build log hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/Tika-trunk at hudson.remoting.Channel@36245edd:ubuntu1 at hudson.FilePa

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Nick, On Jul 1, 2012, at 2:52 PM, Nick Burch wrote: > On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: >> I also plan to spin a 1.2 release candidate at some point in the next week >> or so. I realize the metadata stuff isn't done yet, but it's better to >> simply release early and often

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Nick Burch
On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: It can be a big pain if an in-progress API is suddenly effectively frozen by the need to be compatible into the future... Agreed -- so, what do you think? How much longer do we need to wrap up the API changes or whatever going on with the met

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Ray Gauss II
Is there any consensus on TIKA-930 [1]? I don't want to wipe out properties that others feel are critical or include non-ratified standards if that's outside of policy. If there's no objection to what's outlined in that issue I can commit those changes tomorrow morning (GMT -4). Regards, Ray

[jira] [Commented] (TIKA-930) Consolidation of Some Tika Core Properties

2012-07-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404869#comment-13404869 ] Nick Burch commented on TIKA-930: - In terms of dcmi, we tend to take a pragmatic view on met

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hi Ray, I'd say what you proposed is fine by me. Doesn't seem to have generated objections so please move forward. Cheers, Chris On Jul 1, 2012, at 4:56 PM, Ray Gauss II wrote: > Is there any consensus on TIKA-930 [1]? I don't want to wipe out properties > that others feel are critical or inc