Hi,

I think the SAX classes that we've come up with in o.a.tika.sax would
also be useful to other projects that don't otherwise depend on Tika, so
I've contacted Apache Commons about the possibility of starting a
"Commons SAX" component to make the code available to a wider
audience. See below for the proposal.

BR,

Jukka Zitting



---------- Forwarded message ----------
From: Jukka Zitting <[email protected]>
Date: Wed, Dec 17, 2008 at 2:09 PM
Subject: Proposal: Commons SAX
To: Jakarta Commons Developers List <[email protected]>


Hi,

In the Apache Tika project [1] we use SAX extensively, and have
written a set of general-purpose utility classes for SAX
handling.

For example, in org.apache.tika.sax [2] we have the following:

* ContentHandlerDecorator - Convenient base class for writing
ContentHandler decorators
* EmbeddedContentHandler - Decorator that blocks startDocument() and
endDocument() calls
* TeeContentHandler - Forwards SAX events to multiple handlers
* TextContentHandler - Decorator that blocks everything but character
events (and start/endDocument)
* WriteOutContentHandler - Writes the contents of all character events
to a Writer
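
To give a rough idea of the tee and write-out patterns listed above,
here is a minimal sketch built only on the standard org.xml.sax API. It
is not the Tika code: the names SaxSketch, Tee, WriteOut and
ElementCounter are made up for this illustration, and the tee forwards
only element and character events to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {

    // Hypothetical tee: forwards element and character events to every
    // wrapped handler. Other SAX events are ignored in this sketch.
    static class Tee extends DefaultHandler {
        private final ContentHandler[] handlers;

        Tee(ContentHandler... handlers) {
            this.handlers = handlers;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            for (ContentHandler h : handlers) {
                h.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            for (ContentHandler h : handlers) {
                h.endElement(uri, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            for (ContentHandler h : handlers) {
                h.characters(ch, start, length);
            }
        }
    }

    // Hypothetical write-out handler: copies character events to a Writer.
    static class WriteOut extends DefaultHandler {
        private final Writer writer;

        WriteOut(Writer writer) {
            this.writer = writer;
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            try {
                writer.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Second consumer for the tee: counts start tags.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            count++;
        }
    }

    // Parses the XML once and feeds both handlers through the tee.
    static String[] demo(String xml) throws Exception {
        StringWriter text = new StringWriter();
        ElementCounter counter = new ElementCounter();
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new Tee(new WriteOut(text), counter));
        reader.parse(new InputSource(new StringReader(xml)));
        return new String[] { text.toString(), String.valueOf(counter.count) };
    }

    public static void main(String[] args) throws Exception {
        String[] result = demo("<a>Hello <b>SAX</b></a>");
        System.out.println(result[0]); // prints "Hello SAX"
        System.out.println(result[1]); // prints "2"
    }
}
```

The point of the tee is that the document is parsed only once while
several independent consumers see the same event stream.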

In org.apache.tika.sax.xpath [3] we have a simple XPath subset
implementation that supports streaming and filtering of SAX events. In
other words, the implementation doesn't need a DOM tree to evaluate
XPath statements.
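
As an illustration of the streaming idea only (this is not the
org.apache.tika.sax.xpath API), the sketch below tracks the current
element path on a stack and keeps character events only when that path
equals a simple absolute path such as /doc/title. The names
PathFilterSketch, PathTextCollector and extract are hypothetical, and
matching on qName without namespaces is a simplifying assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PathFilterSketch {

    // Collects text found directly inside elements whose absolute path
    // equals the target. No DOM tree is built; only the path stack is kept.
    static class PathTextCollector extends DefaultHandler {
        private final String target;
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder out = new StringBuilder();

        PathTextCollector(String target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            path.addLast(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            path.removeLast();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (currentPath().equals(target)) {
                out.append(ch, start, length);
            }
        }

        // Builds "/root/child/..." from the stack, root first.
        private String currentPath() {
            return "/" + String.join("/", path);
        }

        String text() {
            return out.toString();
        }
    }

    // Parses the XML in a single pass and returns the matching text.
    static String extract(String xml, String target) throws Exception {
        PathTextCollector collector = new PathTextCollector(target);
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Tika</title>"
                + "<body>ignored <title>nested</title></body></doc>";
        System.out.println(extract(xml, "/doc/title")); // prints "Tika"
    }
}
```

Memory use depends on the nesting depth rather than the document size,
which is the main reason to evaluate paths on the event stream.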

I believe this code would also be useful outside Tika, and I think it
might make sense to create a Commons project for it. I also know of
some SAX processing classes in Cocoon and Jackrabbit that could well
be of interest to a wider audience.

Do you think something like this would be interesting as a Commons
project? Are there other similar efforts that I should know of? I
looked at XML Commons in xml.apache.org, but it seems pretty dormant.

[1] http://lucene.apache.org/tika/
[2] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/package-summary.html
[3] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/xpath/package-summary.html

BR,

Jukka Zitting
