Hi All,

I'm looking to use Any23 within an XProc pipeline to extract structure from legislative XML files. I previously wrote to the list about this [0], and now that I am returning to it, I'm thinking about writing an XML extractor to join the suite of packages in org.apache.any23.extractor.
Is this best done through Tika, e.g. by creating SAX events from the XHTML representation it returns and then writing the appropriate XML extractor? Or would it be more appropriate to do as Andy suggested and add a thin XML parser on top of the RDF/XML parser implementation we already have?

I'm really not sure about the second suggestion and would appreciate any direction at this stage.

Thanks,
Lewis

[0] http://www.mail-archive.com/any23-user%40incubator.apache.org/msg00032.html
