[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871544#action_12871544
]
Julien Nioche commented on TIKA-433:
You can do that with [Behemoth|http://code.google.co
[
https://issues.apache.org/jira/browse/TIKA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871567#action_12871567
]
Jukka Zitting commented on TIKA-431:
Agreed, we should be using the charset parameter of
[
https://issues.apache.org/jira/browse/TIKA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871568#action_12871568
]
Jukka Zitting commented on TIKA-430:
Sounds reasonable, especially since unlike extra con
[
https://issues.apache.org/jira/browse/TIKA-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871570#action_12871570
]
Jukka Zitting commented on TIKA-429:
Looks like the input document is incorrectly treated
[
https://issues.apache.org/jira/browse/TIKA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871585#action_12871585
]
Julien Nioche commented on TIKA-430:
The method mapSafeAttribute(String elementName, Stri
[
https://issues.apache.org/jira/browse/TIKA-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-425.
Assignee: Jukka Zitting
Fix Version/s: 0.8
Resolution: Fixed
Thanks for the problem r
[
https://issues.apache.org/jira/browse/TIKA-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-428.
Assignee: Jukka Zitting
Resolution: Duplicate
Yes, this is a duplicate of TIKA-418.
> Unexpect
[
https://issues.apache.org/jira/browse/TIKA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871594#action_12871594
]
Jukka Zitting commented on TIKA-418:
See the duplicate issue TIKA-428 for a stack trace o
[
https://issues.apache.org/jira/browse/TIKA-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871608#action_12871608
]
Jukka Zitting commented on TIKA-420:
Agreed with Ken about using XHTML SAX events instead
[
https://issues.apache.org/jira/browse/TIKA-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871611#action_12871611
]
Jukka Zitting commented on TIKA-427:
The type detection code in Tika gets confused by the
[
https://issues.apache.org/jira/browse/TIKA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-424.
Assignee: Jukka Zitting
Fix Version/s: 0.8
Resolution: Fixed
Thanks! Patch committed
[
https://issues.apache.org/jira/browse/TIKA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871614#action_12871614
]
Jukka Zitting commented on TIKA-418:
Re: mp3 problem, in fact it was already filed separa
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871616#action_12871616
]
Grant Ingersoll commented on TIKA-433:
Does that mean you are going to extract it from
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871623#action_12871623
]
Julien Nioche commented on TIKA-433:
Could do. I can't see a place in Tika's code for non
[
https://issues.apache.org/jira/browse/TIKA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-413.
Assignee: Jukka Zitting
Fix Version/s: 0.8
Resolution: Fixed
Good stuff! I committed
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871720#action_12871720
]
Grant Ingersoll commented on TIKA-433:
I think it makes sense as a Tika contrib, but th
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871726#action_12871726
]
Jukka Zitting commented on TIKA-433:
We could easily add a separate tika-hadoop component
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871742#action_12871742
]
Yonik Seeley commented on TIKA-433:
From the peanut gallery, Lucene has gone down the con
Hey Ken,
I wanted to get back to you on this:
>
> 1. Ability to allow all attributes through from HTML documents
>
> TIKA-379, building on TIKA-347, allows both more relaxed passing of
> attributes, as well as letting all elements through.
>
> So if somebody wants to get the "lang" attribute f
Hi,
On Wed, May 26, 2010 at 3:49 PM, Mattmann, Chris A (388J)
wrote:
> I'm worried that we're mixing concerns here. Some of the information that
> you cite above sounds more to me like metadata (and in fact, thinking about
> it, you could argue that attributes themselves on the XHTML amount that
Hey Jukka,
So you're seeing the delineation more as:
* metadata = document level stuff
* XHTML = textual representation [which can include finer-grained what I
would call "metadata" too]
?
If so, interesting, I wonder then if there should be some sort of rethinking
then of the way tha
[
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871767#action_12871767
]
Jukka Zitting commented on TIKA-402:
Latest patch committed in revision 948452, thanks!
Hi,
On Wed, May 26, 2010 at 5:10 PM, Mattmann, Chris A (388J)
wrote:
> If so, interesting, I wonder then if there should be some sort of rethinking then
> of the way that we capture or represent the XHTML because I would think that
> our existing Metadata object could be reused at that level t
[
https://issues.apache.org/jira/browse/TIKA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-431:
Assignee: Ken Krugler
> Tika currently misuses the HTTP Content-Encoding header, and does not seem to
[
https://issues.apache.org/jira/browse/TIKA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871824#action_12871824
]
Ken Krugler commented on TIKA-431:
I should have some time soon to do a once-over on a bunc
[
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871848#action_12871848
]
Martijn van Groningen commented on TIKA-402:
Oops... next patch will have 4 space