Muhammed Sameer wrote:
We run post.jar periodically, i.e. every 15 minutes, to commit the changes. Is this approach correct?
Sounds reasonable to me.
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
That's just to remind you not to try and post documents in another encoding. This seems to be a limitation of the SimplePostTool, not of Solr. I guess the reason is that in order for Solr to work quickly and reliably, it relies on the Content-Type of the request to determine the encoding. If, for example, you send XML encoded in ISO-8859-1, you have to specify that in two places:

* XML declaration: <?xml version="1.0" encoding="ISO-8859-1"?>
* HTTP header: Content-Type: text/xml; charset=ISO-8859-1

The SimplePostTool, however, being just what the name says, may not bother to read the encoding from the document and bring the HTTP content type header in line. Instead, it explicitly requests UTF-8, probably in the interest of simplicity. Well, that's just my theory. Can anyone confirm?
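To illustrate the two places where the encoding has to be declared, here is a minimal Python sketch. It assumes a Solr update handler at http://localhost:8983/solr/update (adjust to your installation); the helper name build_update_request and the sample document are made up for illustration. It only builds the request and does not claim to show what the SimplePostTool does internally.

```python
import urllib.request

# Assumed update URL of a local Solr instance; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(doc_body, encoding, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST whose XML declaration and
    Content-Type header name the same encoding."""
    # Place 1: the XML declaration names the encoding.
    xml_text = '<?xml version="1.0" encoding="{}"?>\n{}'.format(encoding, doc_body)
    # The bytes on the wire must really be in that encoding.
    payload = xml_text.encode(encoding)
    # Place 2: the HTTP Content-Type header names the same encoding.
    headers = {"Content-Type": "text/xml; charset={}".format(encoding)}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Hypothetical sample document containing one non-ASCII character (e with acute).
request = build_update_request(
    '<add><doc><field name="id">1002</field>'
    '<field name="title">Caf\u00e9</field></doc></add>',
    "ISO-8859-1",
)
```

To actually send it against a running Solr, pass the object to urllib.request.urlopen(request). If the two declarations disagree, the server may decode the body with the wrong charset.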
So I tried to run the test_utf8.sh script and got the following output:

{code}
Solr server is up.
HTTP GET is accepting UTF-8
HTTP POST is accepting UTF-8
HTTP POST defaults to UTF-8
ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic multilingual plane
{code}

Are these errors normal or do I need to change something?
I'm seeing the same output; don't worry, these are just some tests. It is possible to have Solr index documents containing characters outside of the BMP (Basic Multilingual Plane), which can be verified by posting something like this:

<add>
  <doc>
    <field name="id">1001</field>
    <field name="title">BMP plus 1 𐀀</field>
  </doc>
</add>

Maybe the test script output says that such characters cannot be used for querying. Hardly relevant if you consider that the BMP comprises even languages such as Telugu, Bopomofo and French.

Best,
Michael Ludwig
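P.S. For anyone wondering what "beyond the basic multilingual plane" means in practice, here is a small Python sketch. The helper name is_outside_bmp is made up for illustration, and this is not what test_utf8.sh does. It only shows that the character in the title above, U+10000, has a code point above U+FFFF and therefore takes four bytes in UTF-8 and a surrogate pair in UTF-16.

```python
def is_outside_bmp(ch):
    """True if the character's code point lies above U+FFFF, i.e. outside the BMP."""
    return ord(ch) > 0xFFFF


# Same title as in the sample document; \U00010000 is the first code point after the BMP.
title = "BMP plus 1 \U00010000"

# Characters of the title that lie outside the BMP.
supplementary = [ch for ch in title if is_outside_bmp(ch)]

# Encoded forms of that one character.
utf8_bytes = "\U00010000".encode("utf-8")      # four bytes in UTF-8
utf16_bytes = "\U00010000".encode("utf-16-be")  # a surrogate pair in UTF-16
```

Software that handles only up to three-byte UTF-8 sequences, or that treats UTF-16 code units as whole characters, tends to trip over exactly this kind of input.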