Hi,

On 6/13/07, Philipp Koch <[EMAIL PROTECTED]> wrote:
I am also currently doing metadata extraction from various file formats and was likewise attracted by the introduction of the Tika project. I found a very interesting image metadata extractor library which is shipped under the Apache License, but the project itself is not hosted at Apache (see http://www.fightingquaker.com/sanselan/).
Looks nice!
Would it make sense to ask the owner(s) of such projects to move to Apache, to also make sure that such useful libs will be maintained and that development will continue?
It's up to the external project community to decide if they want to become an Apache project. We can of course mention the Incubator and offer to help if they want to bring the project to Apache, but I wouldn't want to go on a crusade to turn all our dependencies into Apache projects.

I think the main criteria for selecting which external libraries to use as default parsers in Tika (a plugin interface should of course allow any other libraries to be used instead of the defaults if needed) would be code quality, licensing, and active maintenance. All of these are typically well handled by Apache projects, but there's no inherent reason why external projects couldn't meet these criteria just as well as, or even better than, Apache projects.

So, once we have our act together (a working codebase and an architectural roadmap), I think we should start contacting various parser projects for cooperation. We should explain what we are trying to do and preferably have, for each parser library we depend on, someone who follows the mailing lists for both Tika and the parser library in question. While building those bridges we could also mention the chance of bringing external projects into Apache, but that definitely shouldn't be a precondition for cooperation.
PS: I don't know if this is the right place for such questions...
Good as any. :-)

BR,

Jukka Zitting
