Hi,

On 10/5/07, Christiaan Fluit <[EMAIL PROTECTED]> wrote:
> We have recently become aware of the existence of the Tika
> project, as several Tika developers/users brought it to our attention.

Likewise, Aperture surfaced on our radar a while ago [1], and it
certainly looks interesting! If you haven't seen it yet, you should
check out the Tika proposal at [2] for more background on where we are
coming from and what the goals of the Tika project are.

> It seems that you are trying to solve the same problems as we do. This
> mail is intended to give you an introduction to Aperture and the areas
> in which our projects overlap, so that you know we exist, what we do and
> how it relates to Tika. If you are interested, we can also explore
> various modes of cooperation.

Thanks for the introduction! I agree that we have similar goals and
would very much like to see how and where we could work together. Tika
is still in an early stage of development, so I think we have lots of
options available, both technically and organizationally.

> As far as we can see, the scope of Aperture is currently broader than
> Tika, perhaps you can comment on this.

Yes. The scope of the Tika project is relatively tight on purpose, and
we're looking at ways to make the code as modular as possible to support
reuse in a wide variety of use cases. We want to make it easy to use
Tika, for example, with existing crawlers like the one in Apache
Nutch [3], with content repositories or databases like Apache
Jackrabbit [4], or with more advanced content analysis programs like
Apache UIMA [5].

> We believe that cooperation is better than competition. We could both
> benefit from our combined experience and ideas. We are looking forward
> to hear your view on this.

Agreed! I see licensing as one major issue to be resolved for enabling
better cooperation. I see that your interfaces are licensed under AFL,
which seems to be in line with the Apache License, but your
implementation classes are under OSL, which makes it impossible for us
to directly use your code within an Apache project (see [6]).

One concrete thing that I think we could use as a starting point to
better understand the different design and licensing constraints would
be to implement an ApertureParser in Tika and a TikaExtractor in
Aperture. Such "cross-linking" would potentially allow Tika to use the
Aperture extractors and Aperture to use Tika parsers, and would perhaps
pave the way for closer integration in the future.

BR,

Jukka Zitting

[1] http://www.nabble.com/Aperture-tf4009924.html
[2] http://wiki.apache.org/incubator/TikaProposal
[3] http://lucene.apache.org/nutch/
[4] http://jackrabbit.apache.org/
[5] http://incubator.apache.org/uima/
[6] http://people.apache.org/~rubys/3party.html
