Jenkins build is back to stable : Tika-trunk #646

2011-09-23 Thread Apache Jenkins Server
See

Jenkins build is back to stable : Tika-trunk » Apache Tika parsers #646

2011-09-23 Thread Apache Jenkins Server
See

[jira] [Updated] (TIKA-648) Parsing HTML anchors with embedded div faulty

2011-09-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-648: --- Fix Version/s: (was: 0.10) Yep, this probably needs to be addressed in one way or another within Ta

[jira] [Commented] (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes

2011-09-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113799#comment-13113799 ] Jukka Zitting commented on TIKA-508: To fix a failing test case I actually did end up im

Re: Release date of tika 1.0 or 0.10

2011-09-23 Thread Mattmann, Chris A (388J)
Hey Mike, That's fine by me. If you could turn it off and commit before this weekend I'd appreciate it. Cheers, Chris On Sep 23, 2011, at 12:26 PM, Michael McCandless wrote: > I think before we release 0.10 we should address TIKA-712? > > I don't think we should hold the release... I think we

buildbot success in ASF Buildbot on tika-trunk

2011-09-23 Thread buildbot
The Buildbot has detected a restored build on builder tika-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/tika-trunk/builds/519 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: scheduler Build Source Stamp

Re: Jenkins build is still unstable: Tika-trunk #645

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 11:18 PM, Jukka Zitting wrote: > Looks like the culprit is my change to the way the attributes > are resolved. I'm just fixing it. Fixed in revision 1175043. BR, Jukka Zitting

Re: Jenkins build is still unstable: Tika-trunk #645

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 11:16 PM, Nick Burch wrote: > I'm fairly sure it's not related to my changes, but happy to be corrected if > it is! Looks like the culprit is my change to the way the attributes are resolved. I'm just fixing it. BR, Jukka Zitting

Re: Jenkins build is still unstable: Tika-trunk #645

2011-09-23 Thread Nick Burch
On Fri, 23 Sep 2011, Apache Jenkins Server wrote: See This seems to be a failure in one of the HTML Parser tests: https://builds.apache.org/job/Tika-trunk/645/org.apache.tika$tika-parsers/testReport/org.apache.tika.parser.html/HtmlParserTest/te

Jenkins build is still unstable: Tika-trunk #645

2011-09-23 Thread Apache Jenkins Server
See

Jenkins build is still unstable: Tika-trunk » Apache Tika parsers #645

2011-09-23 Thread Apache Jenkins Server
See

[jira] [Commented] (TIKA-632) Rtf parsing ignores links

2011-09-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113761#comment-13113761 ] Nick Burch commented on TIKA-632: - Now we have our own RTF parser, it may be possible to add

Re: Support for Open Graph meta tags

2011-09-23 Thread Nick Burch
On Fri, 23 Sep 2011, Jukka Zitting wrote: It would be great to get patches from that Mythical Someone who knows RDF Agreed. :-) As Antoni said, this is an area where we could and should be able to do better. There are quite a few RDF experts already at and around Apache, and it shouldn't be too

Re: Release date of tika 1.0 or 0.10

2011-09-23 Thread Michael McCandless
I think before we release 0.10 we should address TIKA-712? I don't think we should hold the release... I think we should just turn off the new functionality (to extract text from master slides) for the time being, until we work out how to fix it more correctly, because right now it's always extrac

Re: indexing FTP documet with solrj

2011-09-23 Thread Otis Gospodnetic
Wrong list, Hadi.  Send it to solr-u...@lucene.apache.org if you are subscribed to it. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message - > From: hadi > To: tika-...@lucene.apache.org > Cc: > Se

Re: Support for Open Graph meta tags

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 4:19 PM, Ken Krugler wrote: > From my fairly naive perspective, it seems like one of the challenges > here is that Tika tries to normalize/simplify interacting with data. [...] > Whereas RDF is more focused on precision, in being explicit about > the relationships betw

Re: Support for Open Graph meta tags

2011-09-23 Thread Mattmann, Chris A (388J)
Hey Jukka, This sounds like a good approach. Cheers, Chris On Sep 23, 2011, at 3:24 AM, Jukka Zitting wrote: > Hi, > > On Fri, Sep 23, 2011 at 2:23 AM, Ken Krugler > wrote: >> The reason why is that Open Graph uses RDFa > > Instead of mapping the RDFa tags to Tika's Metadata and then > back

[jira] [Commented] (TIKA-720) EBCDIC encoding not detected

2011-09-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113525#comment-13113525 ] Michael McCandless commented on TIKA-720: - Thanks Nick -- I like this solution (pre-

[jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata

2011-09-23 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113473#comment-13113473 ] Ken Krugler commented on TIKA-728: -- Jukka said (on the list): {quote} From the client per

[jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata

2011-09-23 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113475#comment-13113475 ] Ken Krugler commented on TIKA-728: -- Antoni said (on the list): {quote} There seems to be a

[jira] [Created] (TIKA-728) Return RDFa meta tags via Metadata

2011-09-23 Thread Ken Krugler (JIRA)
Return RDFa meta tags via Metadata -- Key: TIKA-728 URL: https://issues.apache.org/jira/browse/TIKA-728 Project: Tika Issue Type: Improvement Reporter: Ken Krugler Assignee: Ken Krugler

[jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata

2011-09-23 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113472#comment-13113472 ] Ken Krugler commented on TIKA-728: -- That's what I was afraid of :) My head starts to hurt

[jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata

2011-09-23 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113471#comment-13113471 ] Ken Krugler commented on TIKA-728: -- Jukka said (on the list): {quote} Instead of mapping t

Re: Support for Open Graph meta tags

2011-09-23 Thread Ken Krugler
On Sep 23, 2011, at 7:00am, Antoni Mylka wrote: > On 2011-09-23 15:12, Jukka Zitting wrote: >>> So I think I'll just patch my local copy to do the Q&D thing, and wait for >>> someone with more XML/RDF-fu to deal with it properly. >> >> Until Someone (TM, :-) does that, I'd be very happy to s

Re: Support for Open Graph meta tags

2011-09-23 Thread Antoni Mylka
On 2011-09-23 15:12, Jukka Zitting wrote: So I think I'll just patch my local copy to do the Q&D thing, and wait for someone with more XML/RDF-fu to deal with it properly. Until Someone (TM, :-) does that, I'd be very happy to see the simple property=xxx mapping you described added to HtmlP

Re: Support for Open Graph meta tags

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 3:06 PM, Ken Krugler wrote: > On Sep 23, 2011, at 3:24am, Jukka Zitting wrote: >> In any case it would still be good to map RDFa tags also to the >> Metadata object. To do that properly (and to open the way to better >> XMP integration, my favourite TODO item :-), we'l

Jenkins build is still unstable: Tika-trunk » Apache Tika parsers #644

2011-09-23 Thread Apache Jenkins Server
See

Jenkins build is still unstable: Tika-trunk #644

2011-09-23 Thread Apache Jenkins Server
See

Re: Support for Open Graph meta tags

2011-09-23 Thread Ken Krugler
On Sep 23, 2011, at 3:24am, Jukka Zitting wrote: > Hi, > > On Fri, Sep 23, 2011 at 2:23 AM, Ken Krugler > wrote: >> The reason why is that Open Graph uses RDFa > > Instead of mapping the RDFa tags to Tika's Metadata and then > back to normal XHTML tags, we might want to consider switching >

[jira] [Commented] (TIKA-720) EBCDIC encoding not detected

2011-09-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113394#comment-13113394 ] Nick Burch commented on TIKA-720: - Turned out not to be too hard to add, even without any ad

Jenkins build is still unstable: Tika-trunk » Apache Tika parsers #643

2011-09-23 Thread Apache Jenkins Server
See

Jenkins build is still unstable: Tika-trunk #643

2011-09-23 Thread Apache Jenkins Server
See

Re: Support for Open Graph meta tags

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 2:23 AM, Ken Krugler wrote: > The reason why is that Open Graph uses RDFa Instead of mapping the RDFa tags to Tika's Metadata and then back to normal XHTML tags, we might want to consider switching from plain XHTML to XHTML-with-RDFa as Tika's output format. That s

Re: Support for Open Graph meta tags

2011-09-23 Thread Nick Burch
On Thu, 22 Sep 2011, Ken Krugler wrote: The reason why is that Open Graph uses RDFa Is it worth quickly checking what Any23 does for this kind of thing? (They are a hopefully soon-to-be-incubating project that a few people here are helping with, which has some Tika links). If they have a good mod

[jira] [Commented] (TIKA-241) Rar archive support

2011-09-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113266#comment-13113266 ] Jukka Zitting commented on TIKA-241: bq. The changes are done now. Cool! bq. Edmund wa

[jira] [Issue Comment Edited] (TIKA-241) Rar archive support

2011-09-23 Thread Christian Goeller (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113185#comment-13113185 ] Christian Goeller edited comment on TIKA-241 at 9/23/11 8:36 AM: -