Hi,

We're testing TIKA-980 (MicrodataContentHandler for Apache Tika), and a lot of URLs work fine when microdata is implemented properly. But we're also seeing a lot of webmasters putting meta tags with microdata properties right in the body! They apparently read Google's webmaster page [1] about invisible microdata and went ahead adding meta tags to the body as if that were normal practice.

For example, whenever a webmaster has:

<meta content="EUR" itemprop="priceCurrency">
<span itemprop="price">17.50</span>

the MicrodataContentHandler trips over it and cannot assign price to an itemscope, because the DOM seems to become reordered/normalized, even when I (in a test) properly close the meta tag.

What does Tika do to meta tags in the content when using the IdentityHtmlMapper? How can we read the meta tag as if it's just another tag? Is there some switch or setting I've missed?

[1]: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146750

Thanks,
Markus
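
P.S. To make the expectation concrete, here is a minimal sketch of the behaviour we are hoping for. It uses only the JDK's SAX parser, not Tika, and it is not how the TIKA-980 handler is implemented. The names MicrodataSketch and extract are made up for illustration, and the input is assumed to be well-formed XHTML with the meta tag closed. The point is only that a meta element arriving in document order, like any other tag, is enough to pick up its itemprop from the content attribute.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: not Tika code and not the TIKA-980 handler.
class MicrodataSketch {

    // Collects itemprop name/value pairs in document order.
    // Assumes well-formed XHTML input (for example, <meta ... /> is closed).
    static Map<String, String> extract(String xhtml) {
        final Map<String, String> props = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentProp;  // itemprop whose text is being collected
            private int depth;           // child nesting inside that element
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (currentProp != null) {
                    depth++;  // a child of the element we are collecting text from
                    return;
                }
                String prop = atts.getValue("itemprop");
                if (prop == null) {
                    return;
                }
                String content = atts.getValue("content");
                if (content != null) {
                    // meta-style property: the value lives in the content attribute
                    props.put(prop, content);
                } else {
                    // ordinary element: the value is its text content
                    currentProp = prop;
                    depth = 0;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (currentProp != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (currentProp == null) {
                    return;
                }
                if (depth > 0) {
                    depth--;
                    return;
                }
                props.put(currentProp, text.toString().trim());
                currentProp = null;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return props;
    }

    public static void main(String[] args) {
        String xhtml = "<div itemscope=\"itemscope\" itemtype=\"http://schema.org/Offer\">"
                + "<meta content=\"EUR\" itemprop=\"priceCurrency\"/>"
                + "<span itemprop=\"price\">17.50</span>"
                + "</div>";
        System.out.println(extract(xhtml));
    }
}
```

With events delivered in this order, both priceCurrency and price end up attached to the same enclosing element, which is what we would like to see from Tika's output as well.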