Hi All,

I'm looking to use Any23 within an XProc pipeline to extract structure
from legislative XML files. I wrote to the list about this previously
[0], and now that I am returning to it I'm thinking about writing an
XML extractor to join the suite of packages in
org.apache.any23.extractor.

Is this best done through Tika, e.g. by generating SAX events from the
XHTML representation it returns and then writing the appropriate XML
extractor? Or would it be more appropriate to do as Andy suggested and
add a thin XML parser on top of the RDF/XML parser implementation we
already have? I'm really not sure about the second suggestion and would
appreciate any pointers at this stage.
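
To make the question a little more concrete, here is a rough sketch of
the kind of SAX handler I have in mind. It uses only the plain JDK SAX
API, not the Tika or Any23 APIs, and the class name XmlElementSketch and
the "path = value" output are purely hypothetical placeholders. A real
extractor would presumably emit RDF statements through Any23 rather
than strings.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: walks an XML document with SAX and records
// "element path = text" pairs for every element that ends with text.
public class XmlElementSketch {

    public static List<String> extract(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>();   // current element path, root first
        StringBuilder text = new StringBuilder();  // text seen since the last tag

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                path.addLast(qName);
                // Simplification: text preceding a child element is dropped.
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    out.add("/" + String.join("/", path) + " = " + value);
                }
                path.removeLast();
                text.setLength(0);
            }
        };

        // Default factory settings; not hardened against untrusted input.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<act><title>Sample Act</title><section>Text</section></act>";
        for (String line : extract(xml)) {
            System.out.println(line);
        }
    }
}
```

If the Tika route is taken, I assume a handler of this shape would
receive Tika's XHTML events instead of the original legislative
elements, which is part of why I am unsure which option fits better.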

Thanks

Lewis

[0] http://www.mail-archive.com/any23-user%40incubator.apache.org/msg00032.html
