Build failed in Jenkins: Tika-trunk #914

2012-08-13 Thread Apache Jenkins Server
Changes: [kkrugler] TIKA-771: "Hello, World!" in UTF-8/ASCII gets detected as IBM500. Added test to confirm that it was fixed by Jukka's previous changes to the charset detection & CONTENT_TYPE handling code.

[jira] [Commented] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433365#comment-13433365 ] Ken Krugler commented on TIKA-961: -- Hi Markus, see HtmlParserTest.testBoilerplateRemoval()

[jira] [Resolved] (TIKA-771) "Hello, World!" in UTF-8/ASCII gets detected as IBM500

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler resolved TIKA-771. -- Resolution: Fixed. Fix Version/s: 1.3. Previously fixed by Jukka's changes. Added test in r1372530 t

Re: TIKA-431 and CONTENT_ENCODING

2012-08-13 Thread Ken Krugler
On Aug 9, 2012, at 5:44pm, Jukka Zitting wrote: > Hi, > > On Thu, Aug 9, 2012 at 10:56 PM, Ken Krugler wrote: >> You made a note in Changes.txt that this was deprecated, so I'm assuming that you think we should hold off on fixing the abuse of CONTENT_ENCODING until after the 1.2

[jira] [Created] (TIKA-974) No longer return charset info in Metadata's CONTENT_ENCODING

2012-08-13 Thread Ken Krugler (JIRA)
Ken Krugler created TIKA-974: Summary: No longer return charset info in Metadata's CONTENT_ENCODING Key: TIKA-974 URL: https://issues.apache.org/jira/browse/TIKA-974 Project: Tika Issue Type: Bu

[jira] [Commented] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1343#comment-1343 ] Markus Jelsma commented on TIKA-961: Ken, I'll see if I can provide a test but I'd idea

[jira] [Assigned] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-961: Assignee: Ken Krugler > No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true) >

[jira] [Commented] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433324#comment-13433324 ] Ken Krugler commented on TIKA-961: -- Hi Markus - thanks for the patch. It would be great if

[jira] [Commented] (TIKA-771) "Hello, World!" in UTF-8/ASCII gets detected as IBM500

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433290#comment-13433290 ] Ken Krugler commented on TIKA-771: -- I added a test case for this, and in trunk it seems to

[jira] [Closed] (TIKA-868) TXT parser does not honour the specified encoding

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler closed TIKA-868. Resolution: Duplicate Assignee: Ken Krugler > TXT parser does not honour the specified encoding > -

[jira] [Commented] (TIKA-868) TXT parser does not honour the specified encoding

2012-08-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433282#comment-13433282 ] Ken Krugler commented on TIKA-868: -- Hi Daniel - using the latest Tika (trunk) I get back UT

[jira] [Commented] (TIKA-792) NoSuchMethodException "CTMarkupImpl.(org.apache.xmlbeans.SchemaType, boolean)" processing a OOXML document

2012-08-13 Thread Eric Pascal (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433145#comment-13433145 ] Eric Pascal commented on TIKA-792: -- Problem still there for me in version 3.8 final of POI

How can I let Tika know the resource name?

2012-08-13 Thread 122jxgcn
Hello, I'm using Solr's ExtractingRequestHandler and would like to let Tika know the name of the file when indexing. I'm currently sending an HTTP request like /update/extract?stream.file=#{filepath}&literal.id=#{filepath}&resource.name=#{resource_name}&commit=true Will setting the resource.name variabl
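For reference, a request like the one in this message can be assembled with each parameter value URL-encoded, so that file paths containing spaces, `#`, or `&` reach Solr intact. This is a minimal Java sketch, not the poster's code: the helper name `buildExtractUrl`, the base URL `http://localhost:8983/solr`, and the example path are hypothetical. Only the `/update/extract` handler path and the `stream.file`, `literal.id`, `resource.name`, and `commit` parameters come from the message. It does not settle whether Tika actually uses `resource.name`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ExtractRequest {
    // Hypothetical helper: builds the /update/extract URL from the message,
    // URL-encoding every value (parameter names are already URL-safe).
    static String buildExtractUrl(String solrBase, String filePath, String resourceName) {
        // LinkedHashMap keeps the parameters in the same order as the original request.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("stream.file", filePath);
        params.put("literal.id", filePath);
        params.put("resource.name", resourceName);
        params.put("commit", "true");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Form encoding: spaces become '+', '/' becomes %2F (requires Java 10+).
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrBase + "/update/extract?" + query;
    }

    public static void main(String[] args) {
        // Assumed local Solr base URL and example file, for illustration only.
        System.out.println(buildExtractUrl("http://localhost:8983/solr",
                "/data/My Report.pdf", "My Report.pdf"));
    }
}
```

`URLEncoder` produces form encoding, so a space in the path is sent as `+`, which servlet containers conventionally decode back to a space for query parameters.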