[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-19 Thread William Palmer (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001532#comment-14001532 ] William Palmer commented on TIKA-1302: -- This one might be worth a look -

[jira] [Commented] (TIKA-1272) tika-server version is incorrectly defined

2014-05-19 Thread Sergey Beryozkin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001559#comment-14001559 ] Sergey Beryozkin commented on TIKA-1272: Hi Lewis, Is it simply due to the fact

[jira] [Created] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
Hassan Akram created TIKA-1303: -- Summary: Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten Key: TIKA-1303 URL:

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Attachment: HtmlParserTest.java HtmlHandler.java Attached patch fix to html handler

[jira] [Commented] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001593#comment-14001593 ] Hassan Akram commented on TIKA-1303: Please can someone review this submission and

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Affects Version/s: 1.2 1.3 1.4 Parsing Html page (not

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Description: While crawling following web page, we came across a strange issue whereby title for

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Description: While crawling following web page, we came across a strange issue whereby title for

[jira] [Updated] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-05-19 Thread Hassan Akram (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hassan Akram updated TIKA-1303: --- Priority: Minor (was: Major) Parsing Html page (not well formed) containing two title tags results

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Sergey Beryozkin
Hi Chris, On 16/05/14 16:31, Chris Mattmann wrote: Hi Guys, Some thoughts here: -Original Message- From: Nick Burch apa...@gagravarr.org Reply-To: dev@tika.apache.org dev@tika.apache.org Date: Wednesday, May 14, 2014 6:22 AM To: dev@tika.apache.org dev@tika.apache.org Subject: Re:

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001612#comment-14001612 ] Julien Nioche commented on TIKA-1302: - How large do you want that batch to be? If we

[jira] [Commented] (TIKA-1298) testEmbeddedPDFEmbeddingAnotherDocument fails with PDFBox 1.8.5 and java 1.6

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001611#comment-14001611 ] Tim Allison commented on TIKA-1298: --- To be clear, I didn't actually intend blackmail or

[jira] [Created] (TIKA-1304) Implement Metadata Property with PropertyType ALT

2014-05-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1304: - Summary: Implement Metadata Property with PropertyType ALT Key: TIKA-1304 URL: https://issues.apache.org/jira/browse/TIKA-1304 Project: Tika Issue Type:

[jira] [Commented] (TIKA-1295) Make some Dublin Core items multi-valued

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001634#comment-14001634 ] Tim Allison commented on TIKA-1295: --- Opened TIKA-1304 to track implementation of ALT

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Nick Burch
On Mon, 19 May 2014, Sergey Beryozkin wrote: I've just looked at the source, unfortunately adding a new Path value will affect the request URIs, UnpackerResource has 2 methods accepting path segments starting from /unpacker and /all. So if we updated then the users would have to modify URIs

[jira] [Commented] (TIKA-1272) tika-server version is incorrectly defined

2014-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001725#comment-14001725 ] Lewis John McGibbney commented on TIKA-1272: Well, this issue is kinda strange...

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-19 Thread Sergey Beryozkin
Hi, On 19/05/14 13:50, Nick Burch wrote: On Mon, 19 May 2014, Sergey Beryozkin wrote: I've just looked at the source, unfortunately adding a new Path value will affect the request URIs, UnpackerResource has 2 methods accepting path segments starting from /unpacker and /all. So if we updated

[jira] [Commented] (TIKA-1292) Inconsistent priorities in bundled tika-mimetypes.xml

2014-05-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002457#comment-14002457 ] Nick Burch commented on TIKA-1292: -- Thanks for that, I've used it to write a (currently

[jira] [Comment Edited] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002647#comment-14002647 ] Tim Allison edited comment on TIKA-1294 at 5/20/14 12:41 AM: -