[jira] Commented: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-26 Reinhard Schwab (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925288#action_12925288 ] Reinhard Schwab commented on TIKA-539: hi ken, in other words: it trusts the server if

[jira] Commented: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-26 Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925180#action_12925180 ] Ken Krugler commented on TIKA-539: Hi Reinhard, If I understand the logic you described in

[jira] Updated: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-26 Reinhard Schwab (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reinhard Schwab updated TIKA-539: Attachment: TIKA-539_2.patch ignore my first version of the patch. the encoding detection in the pa

[jira] Assigned: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-26 Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-539: Assignee: Ken Krugler

[jira] Updated: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-26 Reinhard Schwab (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reinhard Schwab updated TIKA-539: Attachment: TIKA-539.patch

Re: Gearing up for Tika 0.8

2010-10-26 reinhard schwab
On 26.10.2010 19:03, Ken Krugler wrote: > > On Oct 21, 2010, at 12:28pm, Jukka Zitting wrote: > >> Hi, >> >> We're planning to release Jackrabbit 2.2 at the end of November, and >> it would be great to have Tika 0.8 out by then for use as a >> dependency. Ideally I'd like to see 0.8 out within th

[jira] Created: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-26 Reinhard Schwab (JIRA)
Encoding detection is too biased by encoding in meta tag
Key: TIKA-539
URL: https://issues.apache.org/jira/browse/TIKA-539
Project: Tika
Issue Type: Bug
Affects Versions: 0.8
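
TIKA-539 concerns trusting the charset declared in an HTML meta tag too readily. Below is a minimal Java sketch of one possible mitigation, assuming the document bytes are already in memory: accept the meta-tag charset only if the bytes decode without errors under it, and otherwise fall back to a caller-supplied charset. The class `MetaCharsetCheck`, its regex, and the 1024-byte sniffing window are hypothetical illustrations, not Tika's encoding-detection code or the patches attached to the issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetCheck {

    // Hypothetical pattern: finds charset=... inside a <meta> tag, covering both
    // <meta charset="utf-8"> and content="text/html; charset=ISO-8859-1".
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Assumption: the declaration appears within the first 1024 bytes.
    private static final int SNIFF_LIMIT = 1024;

    /** Returns the charset named in a meta tag, or null if absent or unsupported. */
    public static Charset declaredCharset(byte[] bytes) {
        int len = Math.min(bytes.length, SNIFF_LIMIT);
        // ISO-8859-1 maps every byte to a char, so ASCII markup survives intact.
        String head = new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        try {
            String name = m.group(1);
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) {
            return null; // syntactically illegal charset name
        }
    }

    /** True if the whole array decodes under cs with no malformed or unmappable input. */
    public static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Uses the meta-tag charset only when the bytes are consistent with it. */
    public static Charset choose(byte[] bytes, Charset fallback) {
        Charset declared = declaredCharset(bytes);
        if (declared != null && decodesCleanly(bytes, declared)) {
            return declared;
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Page claims UTF-8 but was saved as ISO-8859-1 (byte 0xE9 for U+00E9).
        byte[] page = "<html><head><meta charset=\"utf-8\"></head><body>caf\u00e9</body></html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(choose(page, StandardCharsets.ISO_8859_1)); // ISO-8859-1
    }
}
```

The check is one-sided: a decoder for a single-byte charset such as ISO-8859-1 accepts any byte sequence, so a page wrongly declared as ISO-8859-1 would still pass. A fuller detector would combine this with statistical detection of the content.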

Re: Gearing up for Tika 0.8

2010-10-26 Ken Krugler
On Oct 21, 2010, at 12:28pm, Jukka Zitting wrote: Hi, We're planning to release Jackrabbit 2.2 at the end of November, and it would be great to have Tika 0.8 out by then for use as a dependency. Ideally I'd like to see 0.8 out within the next few weeks. Chris, are you in for another release? I

[jira] Resolved: (TIKA-394) Missing spaces on html parsing

2010-10-26 Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler resolved TIKA-394. Resolution: Fixed Fix Version/s: 0.8 Committed: http://svn.apache.org/viewvc?view=revision&revisio

[jira] Updated: (TIKA-394) Missing spaces on html parsing

2010-10-26 Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-394: Attachment: TIKA-394.patch

[jira] Commented: (TIKA-394) Missing spaces on html parsing

2010-10-26 Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925036#action_12925036 ] Ken Krugler commented on TIKA-394: Actually XHTMLContentHandler is set up to deal with this
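
One likely source of the missing spaces reported in TIKA-394 (and its duplicate TIKA-532) is text from adjacent block elements, such as `<p>first</p><p>second</p>`, being concatenated with no separator. The sketch below uses only the JDK's SAX parser and appends a newline whenever a block-level element ends. `BlockAwareTextHandler` and its element set are hypothetical; this is not Tika's XHTMLContentHandler, and it assumes well-formed XHTML input.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BlockAwareTextHandler extends DefaultHandler {

    // Hypothetical list of elements whose end marks a text boundary.
    private static final Set<String> BLOCKS = new HashSet<>(Arrays.asList(
            "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "td", "th", "br"));

    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Without this, "<p>first</p><p>second</p>" would come out as "firstsecond".
        if (BLOCKS.contains(qName.toLowerCase(Locale.ROOT))) {
            text.append('\n');
        }
    }

    public String getText() {
        return text.toString();
    }

    /** Parses well-formed XHTML with the JDK SAX parser and returns the extracted text. */
    public static String extract(String xhtml) throws Exception {
        BlockAwareTextHandler handler = new BlockAwareTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        // Prints "first" and "second" on separate lines.
        System.out.print(extract("<html><body><p>first</p><p>second</p></body></html>"));
    }
}
```

Inline elements such as `<b>` are deliberately left out of the set, so a word split across inline markup (`a<b>b</b>`) is not broken apart.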

[jira] Closed: (TIKA-532) missing spaces in text extraction of BodyContentHandler

2010-10-26 Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler closed TIKA-532. Resolution: Duplicate As per link, this is a duplicate of [TIKA-394].

ReviewBoard instance

2010-10-26 Mattmann, Chris A (388J)
Hi Guys, Gav from infra@ set up a ReviewBoard instance for Apache [1]. I've never used it before but I thought I'd request an account on it for Tika [2] regardless, so if folks want to use it, they can. Thanks! Cheers, Chris [1] http://s.apache.org/hm [2] https://issues.apache.org/jira/browse/I