Hi,

We're testing TIKA-980 (MicrodataContentHandler for Apache Tika), and many
URLs work out just fine when the microdata is implemented properly. But we're
also seeing many webmasters put meta tags carrying microdata properties right
in the body! They apparently read Google's webmaster page [1] about invisible
microdata and started adding meta tags to the body as if it were normal
practice.

Whenever a page contains, for example:

                <meta content="EUR" itemprop="priceCurrency">
                <span itemprop="price">17.50</span>

...the MicrodataContentHandler trips over it and cannot assign the price to an
itemscope, because the DOM seems to get reordered/normalized, even when I
(in a test) properly close the meta tag. What does Tika do to meta tags in the
body when using the IdentityHtmlMapper? How can we read the meta tag as if
it's just another tag? Is there some switch or setting I've missed?
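For reference, a minimal reproduction along these lines may help. It is a
sketch that assumes Java 7+ with tika-core and tika-parsers on the classpath,
and that the IdentityHtmlMapper is selected by putting it into the
ParseContext under the HtmlMapper key. The test page, the MetaInBodyRepro
class and the parseToXml helper are hypothetical. It only prints the
serialized SAX events so you can see where the meta element ends up; it makes
no claim about what that output looks like.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlMapper;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.parser.html.IdentityHtmlMapper;
import org.apache.tika.sax.ToXMLContentHandler;

public class MetaInBodyRepro {

    // Hypothetical test page: an itemscope whose priceCurrency is supplied
    // by a <meta> tag placed inside the body, next to the price.
    static final String HTML =
        "<html><head><title>t</title></head><body>"
        + "<div itemscope itemtype=\"http://schema.org/Offer\">"
        + "<meta content=\"EUR\" itemprop=\"priceCurrency\">"
        + "<span itemprop=\"price\">17.50</span>"
        + "</div></body></html>";

    // Parses the HTML with Tika's HtmlParser, asking for the
    // IdentityHtmlMapper via the ParseContext, and returns the SAX events
    // the parser emitted, serialized as XML.
    static String parseToXml(String html) throws Exception {
        ParseContext context = new ParseContext();
        context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE);

        ToXMLContentHandler handler = new ToXMLContentHandler();
        try (InputStream in = new ByteArrayInputStream(
                html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Look at where <meta> lands relative to the itemscope <div>.
        System.out.println(parseToXml(HTML));
    }
}
```

Comparing this output with the input page shows directly whether the meta
element is moved, dropped or kept in place.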

[1]: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146750

Thanks,
Markus
