Thank you for the reply. I tried setting -Dfile.encoding, but that didn't resolve the issue. What I can tell so far is that the euro symbol in DATACOLUMN can be indexed on Solr 3.6 but cannot be indexed on Solr 4.0. On Solr 3.6 the content-type was text/plain; on Solr 4.0 it was application/octet-stream. Could this be a Solr issue rather than a database encoding issue?
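
For reference, here is a minimal sketch of how one could check whether the euro symbol survives a given character encoding on the JVM, which is the concern raised about the driver possibly falling back to the machine's default encoding. It is not taken from the connector or from Solr; the class name EuroEncodingCheck and its helper methods are hypothetical, and it uses only the standard java.nio.charset API.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class EuroEncodingCheck {

    // True if every character of text can be represented in the named charset.
    public static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    // Hex dump of the bytes produced when text is encoded in the named charset.
    // Unmappable characters are replaced by the charset's replacement byte.
    public static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign
        // This is the value that -Dfile.encoding is meant to influence.
        System.out.println("default charset: " + Charset.defaultCharset());
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "US-ASCII"}) {
            System.out.println(name
                    + " canEncode=" + canEncode(euro, name)
                    + " bytes=" + hex(euro, name));
        }
    }
}
```

In UTF-8 the euro sign encodes to the three bytes E2 82 AC, while ISO-8859-1 and US-ASCII cannot represent it and substitute a question mark (3F). If the default charset printed on the machine running the crawler is one of the latter, that would be a hint that the character is lost before it reaches Solr.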

On 2012/11/12, at 20:36, Karl Wright wrote:

> It looks like the Postgresql JDBC driver sets the encoding itself,
> from what I can find. So I would guess that it is setting the
> character encoding based on the database you are connected to. So if
> the euro symbol is not handled by the database's encoding, there would
> be no way to include it in the query string. I think...
>
> Karl
>
> On Mon, Nov 12, 2012 at 6:22 AM, Karl Wright <daddy...@gmail.com> wrote:
>> To clarify, we pass every string to the JDBC driver as a unicode
>> string, but it is up to the JDBC driver to decide how to interpret it.
>> I don't know what exactly the PostgreSQL 9.1 driver does here. It
>> would be interesting to see what is posted to Solr, if you have those
>> logs. It may be that it is picking an encoding that is based on your
>> machine's default encoding, which would be unfortunate.
>>
>> This page apparently indicates that there is somehow a way to set the
>> encoding that JDBC communicates with the database with:
>>
>> http://stackoverflow.com/questions/3040597/jdbc-character-encoding
>>
>> I don't know if this is applicable to us at all though. You can try:
>>
>> java -Dfile.encoding=utf8 start.jar
>>
>> ...and see if that changes things - it would be a good hint.
>>
>> Karl
>>
>> On Mon, Nov 12, 2012 at 6:12 AM, Karl Wright <daddy...@gmail.com> wrote:
>>> Hi Abe-san,
>>>
>>> Quoted strings in SQL queries are not necessarily unicode. See this
>>> page for details:
>>>
>>> http://www.postgresql.org/docs/7.3/static/functions-string.html
>>>
>>> There is nothing you can do in JDBC invocations to control character
>>> set. This must be done in the query itself, or in the database
>>> itself.
>>>
>>> Karl
>>>
>>> On Mon, Nov 12, 2012 at 6:03 AM, Shinichiro Abe
>>> <shinichiro.ab...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I'm using Solr 4.0 and JDBC connection via PostgreSQL.
>>>> The dataQuery is configured below:
>>>>
>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS
>>>> $(URLCOLUMN), '12345' AS $(DATACOLUMN) FROM album WHERE idfield IN
>>>> $(IDLIST)
>>>>
>>>> On Solr side, '12345' was be able to indexed and stored.
>>>>
>>>> But when not-ascii character was configured,
>>>>
>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS
>>>> $(URLCOLUMN), '€€€' AS $(DATACOLUMN) FROM album WHERE idfield IN $(IDLIST)
>>>>
>>>> On Solr side, '€€€' was not indexed and stored.
>>>>
>>>> Actually, I configure the column which contains not-ascii characters into
>>>> DATACOLUMN.
>>>> It seems content-type differ between them.
>>>> Can JDBC connection control content-type?
>>>>
>>>> Regards,
>>>> Shinichiro Abe