[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-19 Thread William Palmer (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001532#comment-14001532 ] William Palmer commented on TIKA-1302: -- This one might be worth a look - https://githu

[jira] [Commented] (TIKA-1272) tika-server version is incorrectly defined

2014-05-19 Thread Sergey Beryozkin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001559#comment-14001559 ] Sergey Beryozkin commented on TIKA-1272: Hi Lewis, Is it simply due to the fact th

[jira] [Created] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
Hassan Akram created TIKA-1303: -- Summary: Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten Key: TIKA-1303 URL: https://issues.apache.org/jira/browse/TIKA-130

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Attachment: HtmlParserTest.java HtmlHandler.java Attached patch fix to html handler an

[jira] [Commented] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001593#comment-14001593 ] Hassan Akram commented on TIKA-1303: Please can someone review this submission and conf

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Affects Version/s: 1.2 1.3 1.4 > Parsing Html page (not

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Description: While crawling the following web page, we came across a strange issue whereby title for pa

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Description: While crawling the following web page, we came across a strange issue whereby title for pa

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Priority: Minor (was: Major) > Parsing Html page (not well formed) containing two title tags results

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Sergey Beryozkin
Hi Chris, On 16/05/14 16:31, Chris Mattmann wrote: Hi Guys, Some thoughts here: -Original Message- From: Nick Burch Reply-To: "dev@tika.apache.org" Date: Wednesday, May 14, 2014 6:22 AM To: "dev@tika.apache.org" Subject: Re: JAXRS, endpoints and a / welcome page - any ideas why it'

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001612#comment-14001612 ] Julien Nioche commented on TIKA-1302: - How large do you want that batch to be? If we ar

[jira] [Commented] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001611#comment-14001611 ] Tim Allison commented on TIKA-1298: --- To be clear, I didn't actually intend blackmail or t

[jira] [Created] (TIKA-1304) Implement Metadata Property with PropertyType ALT

2014-05-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1304: - Summary: Implement Metadata Property with PropertyType ALT Key: TIKA-1304 URL: https://issues.apache.org/jira/browse/TIKA-1304 Project: Tika Issue Type: Improvemen

[jira] [Commented] (TIKA-1295) Make some Dublin Core items multi-valued

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001634#comment-14001634 ] Tim Allison commented on TIKA-1295: --- Opened TIKA-1304 to track implementation of ALT > M

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Nick Burch
On Mon, 19 May 2014, Sergey Beryozkin wrote: I've just looked at the source, unfortunately adding a new Path value will affect the request URIs, UnpackerResource has 2 methods accepting path segments starting from "/unpacker" and "/all". So if we updated then the users would have to modify URI

[jira] [Commented] (TIKA-1272) tika-server version is incorrectly defined

2014-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001725#comment-14001725 ] Lewis John McGibbney commented on TIKA-1272: Well, this issue is kinda strange...

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Sergey Beryozkin
Hi, On 19/05/14 13:50, Nick Burch wrote: On Mon, 19 May 2014, Sergey Beryozkin wrote: I've just looked at the source, unfortunately adding a new Path value will affect the request URIs, UnpackerResource has 2 methods accepting path segments starting from "/unpacker" and "/all". So if we updated

[jira] [Created] (TIKA-1305) New list processing changes appear to be causing RTFParser exception

2014-05-19 Thread Chris Bamford (JIRA)
Chris Bamford created TIKA-1305: --- Summary: New list processing changes appear to be causing RTFParser exception Key: TIKA-1305 URL: https://issues.apache.org/jira/browse/TIKA-1305 Project: Tika

[jira] [Updated] (TIKA-1305) New list processing changes appear to be causing RTFParser exception

2014-05-19 Thread Chris Bamford (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Bamford updated TIKA-1305: Attachment: rtfparsererror_2.rtf Attached RTF file (rtfparsererror_2.rtf) which causes the exception

[jira] [Commented] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001901#comment-14001901 ] Tilman Hausherr commented on TIKA-1298: --- No problem, I know it was a joke and thought

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Nick Burch
On Mon, 19 May 2014, Sergey Beryozkin wrote: I think it might be good to push them into a common path prefix. Though /unpack/unpacker seems a bit unwieldy... If we do introduce "/unpack" then maybe we can drop "/unpacker", and have two methods with "/" & "/all", so users will work with "/unpa

[jira] [Commented] (TIKA-1292) Inconsistent priorities in bundled tika-mimetypes.xml

2014-05-19 Thread Jason Dillon (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002414#comment-14002414 ] Jason Dillon commented on TIKA-1292: This is a jar file available on maven-central whic

[jira] [Commented] (TIKA-1292) Inconsistent priorities in bundled tika-mimetypes.xml

2014-05-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002457#comment-14002457 ] Nick Burch commented on TIKA-1292: -- Thanks for that, I've used it to write a (currently di

[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002647#comment-14002647 ] Tim Allison commented on TIKA-1294: --- As very preliminary work towards TIKA-1302, I ran Ti

[jira] [Comment Edited] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002647#comment-14002647 ] Tim Allison edited comment on TIKA-1294 at 5/20/14 12:41 AM: - A