Hi Andrea,

If you know that you're only parsing XHTML, and you want to use XPath expressions to extract specific elements, then I'm not sure Tika buys you much.

In those situations we typically just use TagSoup (or JSoup or NekoHTML) to first clean up the document, then run it through some DOM parser.

-- Ken

> From: Andrea Asta
> Sent: June 3, 2015 1:08:29am PDT
> To: user@tika.apache.org
> Subject: XPath support for attributes
>
> Hi,
> I've created a parser which extracts some sections from HTML documents.
> I'm producing a new form of XHTML for the content handler, with for example
> <section id = "myID">SECTION CONTENT</section>. Which is the proper content
> handler for having all the sections? Will the XPath implementation for Tika
> support expressions with @ATTRIBUTE and [@ATTRIBUTE='value'] ?
>
> Thank you
> Andrea

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
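
For reference, the "DOM parser plus XPath" half of the approach described in the reply can be sketched with the JDK's own javax.xml classes, which do support both @ATTRIBUTE and [@ATTRIBUTE='value']. This is only a rough illustration under some assumptions: the input is taken to be already well-formed XHTML (the TagSoup/JSoup/NekoHTML cleanup step is left out), the markup carries no XML namespace, and the class name SectionXPath, the select helper and the SAMPLE document are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Tika, TagSoup, JSoup or NekoHTML.
class SectionXPath {

    // Made-up sample, assumed to be the already cleaned-up, well-formed output.
    static final String SAMPLE =
        "<html><body>"
        + "<section id=\"intro\">INTRO CONTENT</section>"
        + "<section id=\"myID\">SECTION CONTENT</section>"
        + "</body></html>";

    // Parses the markup into a DOM and returns the text of every node
    // matched by the XPath expression (attribute value or element text).
    static List<String> select(String xhtml, String expression) throws Exception {
        // The default factory is not namespace-aware, so plain element
        // names such as "section" can be used in the expression.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));

        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // @ATTRIBUTE: collect the id of every section.
        System.out.println(select(SAMPLE, "//section/@id"));          // [intro, myID]
        // [@ATTRIBUTE='value']: pick one section by its id.
        System.out.println(select(SAMPLE, "//section[@id='myID']"));  // [SECTION CONTENT]
    }
}
```

If the cleaned-up document declares the XHTML namespace and the parser is made namespace-aware, the expressions would need a namespace prefix and a matching NamespaceContext; that case is not covered by this sketch.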