Since the same query works against Solr 3.6, this is definitely not a JDBC issue. Which content-type are you referring to? The Solr connector does not change the content type it posts based on the version of Solr it is posting to, so the content-type you mention sounds like something Solr is detecting rather than something it is receiving. Can you confirm?
Karl

On Mon, Nov 12, 2012 at 8:40 AM, Shinichiro Abe <shinichiro.ab...@gmail.com> wrote:
> Thank you for the reply.
> I tried setting -Dfile.encoding, but it didn't resolve the issue.
> What I can tell now is that the euro symbol in DATACOLUMN
> can be indexed on Solr 3.6 and cannot be indexed on Solr 4.0.
> On Solr 3.6 the content-type was text/plain; on Solr 4.0 the content-type was
> application/octet-stream.
> Is this Solr's issue, not the database's encoding?
>
> On 2012/11/12, at 20:36, Karl Wright wrote:
>
>> It looks like the PostgreSQL JDBC driver sets the encoding itself,
>> from what I can find. So I would guess that it is setting the
>> character encoding based on the database you are connected to. So if
>> the euro symbol is not handled by the database's encoding, there would
>> be no way to include it in the query string. I think...
>>
>> Karl
>>
>> On Mon, Nov 12, 2012 at 6:22 AM, Karl Wright <daddy...@gmail.com> wrote:
>>> To clarify, we pass every string to the JDBC driver as a unicode
>>> string, but it is up to the JDBC driver to decide how to interpret it.
>>> I don't know what exactly the PostgreSQL 9.1 driver does here. It
>>> would be interesting to see what is posted to Solr, if you have those
>>> logs. It may be that it is picking an encoding that is based on your
>>> machine's default encoding, which would be unfortunate.
>>>
>>> This page apparently indicates that there is somehow a way to set the
>>> encoding that JDBC uses to communicate with the database:
>>>
>>> http://stackoverflow.com/questions/3040597/jdbc-character-encoding
>>>
>>> I don't know if this is applicable to us at all though. You can try:
>>>
>>> java -Dfile.encoding=utf8 -jar start.jar
>>>
>>> ...and see if that changes things - it would be a good hint.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Nov 12, 2012 at 6:12 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>> Hi Abe-san,
>>>>
>>>> Quoted strings in SQL queries are not necessarily unicode. See this
>>>> page for details:
>>>>
>>>> http://www.postgresql.org/docs/7.3/static/functions-string.html
>>>>
>>>> There is nothing you can do in JDBC invocations to control character
>>>> set. This must be done in the query itself, or in the database
>>>> itself.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Nov 12, 2012 at 6:03 AM, Shinichiro Abe
>>>> <shinichiro.ab...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm using Solr 4.0 and a JDBC connection to PostgreSQL.
>>>>> The dataQuery is configured as below:
>>>>>
>>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS
>>>>> $(URLCOLUMN), '12345' AS $(DATACOLUMN) FROM album WHERE idfield IN
>>>>> $(IDLIST)
>>>>>
>>>>> On the Solr side, '12345' could be indexed and stored.
>>>>>
>>>>> But when a non-ASCII character was configured,
>>>>>
>>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS
>>>>> $(URLCOLUMN), '€€€' AS $(DATACOLUMN) FROM album WHERE idfield IN $(IDLIST)
>>>>>
>>>>> On the Solr side, '€€€' was not indexed or stored.
>>>>>
>>>>> In practice, I configure a column that contains non-ASCII characters as
>>>>> DATACOLUMN.
>>>>> It seems the content-type differs between the two cases.
>>>>> Can the JDBC connection control the content-type?
>>>>>
>>>>> Regards,
>>>>> Shinichiro Abe
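
The thread above asks whether the JVM's default encoding could be mangling the euro symbol before it reaches Solr. One way to check that in isolation is a small standalone program along the following lines. This is only a sketch: the class name EuroEncodingCheck and its helper methods are made up for illustration and are not part of ManifoldCF, the PostgreSQL JDBC driver, or Solr, and it says nothing about how Solr 4.0 chooses a content-type. It only shows which charsets can represent '€€€' and what bytes they produce.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF or Solr.
public class EuroEncodingCheck {

    /** Returns true if the text survives an encode/decode round trip in the given charset. */
    public static boolean roundTrips(String text, Charset charset) {
        // getBytes() silently substitutes a replacement byte for unmappable characters,
        // so a failed round trip means the charset cannot represent the text.
        byte[] encoded = text.getBytes(charset);
        return text.equals(new String(encoded, charset));
    }

    /** Formats bytes as lowercase hex so the encoded form can be compared by eye. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20ac\u20ac\u20ac"; // the '€€€' literal from the dataQuery
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        Charset[] candidates = {
            Charset.defaultCharset(),
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name()
                + ": bytes=" + toHex(euro.getBytes(cs))
                + " roundTrips=" + roundTrips(euro, cs));
        }
    }
}
```

Running it once normally and once with -Dfile.encoding=UTF-8 shows whether the default charset on the machine can represent the euro symbol at all. If it round-trips under the default charset, the JVM encoding is unlikely to be the cause, which would be consistent with the observation that the same data indexes fine on Solr 3.6.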