Re: Pushing parsers upstream

2011-12-16 Thread Antoni Mylka
On 2011-12-16 20:32, Jukka Zitting wrote: Hi, On Fri, Dec 16, 2011 at 7:45 PM, Antoni Mylka wrote: The moment upstream libraries start depending on tika-core, they stop being upstream libraries and become "side-stream" libraries. Putting POI between core and parsers in the dependency chain

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
Hi, On Fri, Dec 16, 2011 at 8:04 PM, Antoni Mylka wrote: > I don't want to start new flames and understand that the current status quo > is probably the best possible, given all requirements, yet let's not get > carried away about creating yet another ultimate solution. I was just thinking of st

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
Hi, On Fri, Dec 16, 2011 at 7:45 PM, Antoni Mylka wrote: > The moment upstream libraries start depending on tika-core, they stop being > upstream libraries and become "side-stream" libraries. Putting POI between > core and parsers in the dependency chain will bring all sorts of issues due > to in

Re: Pushing parsers upstream

2011-12-16 Thread Antoni Mylka
On 2011-12-16 16:12, Jukka Zitting wrote: * Consistency - both of markup and metadata keys will be harder to ensure when it isn't in the same codebase Yep, that can be a problem. I guess the ultimate solution to this would be to come up with a well documented definition of what a parser s

Re: Pushing parsers upstream

2011-12-16 Thread Antoni Mylka
On 2011-12-16 16:12, Jukka Zitting wrote: And whose job would it be to test it? That's a general thing actually, how much testing would need to remain on the Tika side? I'd still have the upstream libraries as dependencies of tika-parsers, and we definitely should continue maintaining a goo

[jira] [Commented] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2011-12-16 Thread Antoni Mylka (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171098#comment-13171098 ] Antoni Mylka commented on TIKA-810: --- That's a very important question IMHO, crucial to the

[jira] [Issue Comment Edited] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2011-12-16 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171040#comment-13171040 ] Jeremy Anderson edited comment on TIKA-810 at 12/16/11 4:50 PM: --

[jira] [Issue Comment Edited] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2011-12-16 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171040#comment-13171040 ] Jeremy Anderson edited comment on TIKA-810 at 12/16/11 4:28 PM: --

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
Hi, On Tue, Dec 13, 2011 at 6:05 PM, Michael McCandless wrote: > It's true users could directly upgrade their PDFBox w/o waiting for a > Tika release but I suspect most users don't do that... Currently people don't do that because it's so easy to break things by upgrading a parser library in sync

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
Hi, On Tue, Dec 13, 2011 at 12:23 PM, Nick Burch wrote: > A couple of issues do spring to mind with this plan: Good points. > * Metadata keys - if a parser enhancement or new feature needs a new >  metadata key, then you end up having to wait for a new tika release to >  get it (so you can add