Hi, I think the SAX classes that we've come up with in o.a.tika.sax would also be useful to other projects that don't otherwise depend on Tika, so I've contacted Apache Commons about the possibility of starting a "Commons SAX" component to make the code available to a wider audience. See below for the proposal.
BR,
Jukka Zitting

---------- Forwarded message ----------
From: Jukka Zitting <[email protected]>
Date: Wed, Dec 17, 2008 at 2:09 PM
Subject: Proposal: Commons SAX
To: Jakarta Commons Developers List <[email protected]>

Hi,

In the Apache Tika project [1] we use SAX quite a lot, and have written a set of quite useful general utility classes for SAX handling. For example, in org.apache.tika.sax [2] we have the following:

* ContentHandlerDecorator - Convenient base class for writing ContentHandler decorators
* EmbeddedContentHandler - Decorator that blocks startDocument() and endDocument() calls
* TeeContentHandler - Forwards SAX events to multiple handlers
* TextContentHandler - Decorator that blocks everything but character events (and start/endDocument)
* WriteOutContentHandler - Writes the contents of all character events to a Writer

In org.apache.tika.sax.xpath [3] we have a simple XPath subset implementation that supports streaming and filtering of SAX events. In other words, the implementation doesn't need a DOM tree to evaluate XPath statements.

I believe this code would be useful also outside Tika, and I was thinking that it might perhaps make sense to create a Commons project for this. I also know of some SAX processing classes in Cocoon and Jackrabbit that could well be of interest to a wider audience.

Do you think something like this would be interesting as a Commons project? Are there other similar efforts that I should know of? I looked at XML Commons in xml.apache.org, but it seems pretty dormant.

[1] http://lucene.apache.org/tika/
[2] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/package-summary.html
[3] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/xpath/package-summary.html

BR,
Jukka Zitting
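To make the "tee" and "write out" ideas from the list concrete, here is a minimal sketch that uses only the JDK's org.xml.sax API. It is not Tika's code. The class names TeeHandler, WriteOutHandler, ElementCounter and SaxTeeDemo are hypothetical. The sketch shows one parse feeding two independent handlers: one collects all character data into a Writer, and the other counts elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical tee: forwards every ContentHandler event to all wrapped handlers, in order. */
class TeeHandler extends DefaultHandler {
    private final List<ContentHandler> handlers;
    TeeHandler(ContentHandler... handlers) { this.handlers = Arrays.asList(handlers); }
    @Override public void setDocumentLocator(Locator l) { for (ContentHandler h : handlers) h.setDocumentLocator(l); }
    @Override public void startDocument() throws SAXException { for (ContentHandler h : handlers) h.startDocument(); }
    @Override public void endDocument() throws SAXException { for (ContentHandler h : handlers) h.endDocument(); }
    @Override public void startPrefixMapping(String p, String uri) throws SAXException { for (ContentHandler h : handlers) h.startPrefixMapping(p, uri); }
    @Override public void endPrefixMapping(String p) throws SAXException { for (ContentHandler h : handlers) h.endPrefixMapping(p); }
    @Override public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        for (ContentHandler h : handlers) h.startElement(uri, local, qName, atts);
    }
    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        for (ContentHandler h : handlers) h.endElement(uri, local, qName);
    }
    @Override public void characters(char[] ch, int start, int len) throws SAXException { for (ContentHandler h : handlers) h.characters(ch, start, len); }
    @Override public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException { for (ContentHandler h : handlers) h.ignorableWhitespace(ch, start, len); }
    @Override public void processingInstruction(String target, String data) throws SAXException { for (ContentHandler h : handlers) h.processingInstruction(target, data); }
    @Override public void skippedEntity(String name) throws SAXException { for (ContentHandler h : handlers) h.skippedEntity(name); }
}

/** Hypothetical sink: writes the contents of all character events to a Writer. */
class WriteOutHandler extends DefaultHandler {
    private final Writer writer;
    WriteOutHandler(Writer writer) { this.writer = writer; }
    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        try {
            writer.write(ch, start, len);
        } catch (IOException e) {
            throw new SAXException(e); // SAX callbacks may only throw SAXException
        }
    }
}

/** Hypothetical second sink: counts startElement events. */
class ElementCounter extends DefaultHandler {
    int count;
    @Override public void startElement(String uri, String local, String qName, Attributes atts) { count++; }
}

class SaxTeeDemo {
    /** Parses an XML string with a namespace-aware JDK parser, sending events to the given handler. */
    static void parse(String xml, ContentHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        StringWriter text = new StringWriter();
        ElementCounter counter = new ElementCounter();
        parse("<doc><p>Hello, </p><p>SAX</p></doc>", new TeeHandler(new WriteOutHandler(text), counter));
        System.out.println(text + " / " + counter.count); // prints: Hello, SAX / 3
    }
}
```

The design point is that each handler stays single-purpose, and composition happens by wrapping. The input is parsed once, no matter how many consumers are attached.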
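The point about evaluating XPath without a DOM tree can be illustrated with a much smaller idea: a handler that tracks the current element path on a stack as events stream by, and keeps only the text under a matching path. This is a simplified sketch and not Tika's org.apache.tika.sax.xpath implementation. It assumes absolute, child-axis-only paths such as "/doc/title", with no predicates, wildcards or namespaces. PathTextFilter and PathFilterDemo are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical streaming filter: keeps text only inside elements whose absolute path equals the target. */
class PathTextFilter extends DefaultHandler {
    private final String target;                     // e.g. "/doc/title" (child axis only)
    private final List<String> path = new ArrayList<>(); // names of currently open elements
    private boolean inside = false;                  // true while within a matching element
    private int matchDepth = 0;                      // depth at which the current match started
    final StringBuilder text = new StringBuilder();

    PathTextFilter(String target) { this.target = target; }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        path.add(local.isEmpty() ? qName : local);   // localName is empty if not namespace-aware
        if (!inside && ("/" + String.join("/", path)).equals(target)) {
            inside = true;
            matchDepth = path.size();
        }
    }

    @Override public void endElement(String uri, String local, String qName) {
        if (inside && path.size() == matchDepth) inside = false; // closing the matched element
        path.remove(path.size() - 1);
    }

    @Override public void characters(char[] ch, int start, int len) {
        if (inside) text.append(ch, start, len);     // descendant text is included, as in an XPath string-value
    }
}

class PathFilterDemo {
    static void parse(String xml, ContentHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        PathTextFilter filter = new PathTextFilter("/doc/title");
        parse("<doc><title>Commons SAX</title><body><title>nested</title></body></doc>", filter);
        System.out.println(filter.text); // prints: Commons SAX
    }
}
```

Memory use is proportional to document depth rather than document size, which is the practical benefit of streaming evaluation over building a DOM first.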
