Hi all, I'm trying to index some regional (non-English) HTML pages with Solr. My plan is to index the Unicode text of each page, since Solr supports Unicode indexing, right? The problem is that posting to Solr requires XML, and I haven't been able to extract XML from the HTML pages. Can anyone suggest a good way to extract XML from HTML, or otherwise tell me how to index non-English HTML pages with Solr so that I can search them with Unicode queries in the corresponding regional language? Thanks in advance.
--Ahmed.
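
To make the question concrete, here is a rough sketch of one possible approach, not a confirmed solution: pull the visible text out of the HTML, then wrap it in a Solr XML `<add>` message encoded as UTF-8. It uses only the Python standard library. The helper names (`TextExtractor`, `html_to_text`, `build_solr_add_xml`) and the field names `id` and `text` are assumptions for illustration; the real field names depend on the Solr schema in use, and the page encoding is assumed to be known.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(raw_bytes, encoding="utf-8"):
    """Decode the page (encoding is an assumption) and return its visible text."""
    parser = TextExtractor()
    parser.feed(raw_bytes.decode(encoding, errors="replace"))
    parser.close()
    return parser.text()


def build_solr_add_xml(doc_id, text):
    """Build a Solr <add><doc>...</doc></add> message as UTF-8 bytes.

    The field names "id" and "text" are assumptions; use the schema's names.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    return ET.tostring(add, encoding="utf-8")


if __name__ == "__main__":
    page = "<html><body><p>नमस्ते दुनिया</p></body></html>".encode("utf-8")
    print(build_solr_add_xml("page-1", html_to_text(page)).decode("utf-8"))
```

The resulting bytes would then be sent in an HTTP POST to the Solr update handler with a `Content-Type` of `text/xml; charset=utf-8`; the exact URL depends on the installation. Searching with Unicode queries should also require the field's analyzer to suit the language, which this sketch does not address.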