Introducing the Aperture project

Christiaan Fluit Fri, 05 Oct 2007 05:32:41 -0700

Hello Tika developers (cc Aperture list),

My name is Christiaan Fluit, and I am one of the admins of the Aperture project. We have recently become aware of the existence of the Tika project, as several Tika developers/users brought it to our attention. It seems that you are trying to solve the same problems as we are. This mail is intended to give you an introduction to Aperture and the areas in which our projects overlap, so that you know we exist, what we do and how it relates to Tika. If you are interested, we can also explore various modes of cooperation.

Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.

You could have a look at the homepage [1], SourceForge page [2] and Wiki [3].

The project started two years ago, when two organizations (DFKI, a German research institute [4], and Aduna, a Dutch software firm [5, 6]) recognized they had a common need for a data extraction framework. The core requirements were to crawl various data sources and to extract data from the objects that occur in these sources, using RDF [7] as a means to communicate and store information throughout this framework.

Since its inception the project has benefited from contributions from developers affiliated with both founding partners as well as a group of external open source enthusiasts. It has been successfully embedded in various software projects; see [8] for details.

Aperture comes from the Semantic Web community. We firmly believe that storing data in RDF triples, making it conform to a well-defined model and making it searchable (Lucene) and structurally queryable (SPARQL), allows for very powerful applications. Many aspects of data integration that plague the users of relational databases or XML schemas become manageable or non-existent when using RDF technologies.

As far as we can see, the scope of Aperture is currently broader than that of Tika; perhaps you can comment on this. We provide seven kinds of services. Two of them have direct equivalents in Tika, namely Extractor (processes a stream to extract text and metadata) and MimeTypeIdentifier (determines the MIME type of a stream using heuristics such as magic numbers, strings, file extensions, etc.). The Aperture Extractors correspond directly to Tika Parsers; see e.g. [11] and [12]. As for MIME type identification, please compare [9] and [10]. The Documentation page on the Wiki [13] can also provide you with more details.
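To make the magic-number heuristic concrete, here is a minimal Java sketch. The class and method names (MagicMimeSketch, identify) and the small signature table are hypothetical; they do not reflect the actual API of Aperture's MimeTypeIdentifier or of Tika. It assumes the caller has already read the first few bytes of the stream, checks those bytes against known signatures, and falls back to the file extension when no signature matches.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: not the Aperture MimeTypeIdentifier or Tika API.
public class MagicMimeSketch {

    // Leading byte signatures ("magic numbers") mapped to MIME types, checked in insertion order.
    private static final Map<byte[], String> MAGIC = new LinkedHashMap<>();
    // File-extension fallback, used only when no signature matches.
    private static final Map<String, String> EXTENSIONS = new LinkedHashMap<>();

    static {
        MAGIC.put("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf");
        MAGIC.put(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png");
        MAGIC.put(new byte[] {'P', 'K', 3, 4}, "application/zip");
        EXTENSIONS.put("txt", "text/plain");
        EXTENSIONS.put("html", "text/html");
    }

    /** Returns a MIME type guess for the given leading bytes and file name, or null if no heuristic applies. */
    public static String identify(byte[] header, String fileName) {
        // 1. Magic numbers: compare the start of the stream with each known signature.
        for (Map.Entry<byte[], String> e : MAGIC.entrySet()) {
            byte[] sig = e.getKey();
            if (header.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(header, sig.length), sig)) {
                return e.getValue();
            }
        }
        // 2. Fallback: look up the (case-insensitive) file extension.
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0) {
                String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                return EXTENSIONS.get(ext);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(pdf, "report.bin"));        // application/pdf
        System.out.println(identify(new byte[0], "notes.TXT")); // text/plain
    }
}
```

Note that the content signature wins over a misleading extension ("report.bin" is still reported as PDF). A production identifier would use a far larger signature table and would need a policy for overlapping matches (e.g. ZIP-based office formats); this sketch simply returns the first match in table order.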

We believe that cooperation is better than competition. We could both benefit from our combined experience and ideas. We are looking forward to hearing your view on this.

This mail has been sent to both the tika-dev and aperture-devel lists so that both communities are kept informed of the progress of this discussion.



Kind regards,

Christiaan Fluit,
Leo Sauermann,
Antoni Mylka,

Admins of the Aperture Project.


[1] http://aperture.sourceforge.net
[2] http://sf.net/projects/aperture
[3] http://aperture.wiki.sourceforge.net
[4] http://www.dfki.de
[5] http://www.aduna-software.com
[6] http://www.openrdf.org
[7] http://www.w3.org/RDF/
[8] http://aperture.wiki.sourceforge.net/ProjectsUsingAperture

[9] http://aperture.svn.sourceforge.net/viewvc/aperture/trunk/aperture/src/java/org/semanticdesktop/aperture/mime/identifier/magic/
[10] http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tika/mime/
[11] http://aperture.svn.sourceforge.net/viewvc/aperture/trunk/aperture/src/java/org/semanticdesktop/aperture/extractor/
[12] http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tika/parser/

[13] http://aperture.wiki.sourceforge.net/Documentation


--
[EMAIL PROTECTED]

Aduna
Prinses Julianaplein 14-b
3817 CS Amersfoort
The Netherlands

+31 33 465 9987 phone
+31 33 465 9987 fax

http://www.aduna-software.com
