Thank you for the reply.
I tried setting -Dfile.encoding, but it didn't resolve the issue.
What I can tell now is that the euro symbol in DATACOLUMN
can be indexed on Solr 3.6 but cannot be indexed on Solr 4.0.
On Solr 3.6 the content-type was text/plain; on Solr 4.0 the content-type was
application/octet-stream.
Is this an issue with Solr rather than with the database's encoding?
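
To rule out the JVM default encoding, a small check like the one below could be run
with the same -Dfile.encoding option. This is only a minimal sketch: the class name
EncodingCheck and the method roundTrips are hypothetical and are not part of
ManifoldCF, Solr, or the PostgreSQL driver. It only shows whether the euro symbol
survives encoding and decoding in a given charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class EncodingCheck {

    // Returns true if the text is unchanged after being encoded to bytes
    // and decoded again with the given charset.
    public static boolean roundTrips(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return text.equals(new String(bytes, charset));
    }

    public static void main(String[] args) {
        // Three euro signs, written as escapes so the source file encoding does not matter.
        String euro = "\u20AC\u20AC\u20AC";

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Survives UTF-8:      " + roundTrips(euro, StandardCharsets.UTF_8));
        System.out.println("Survives ISO-8859-1: " + roundTrips(euro, StandardCharsets.ISO_8859_1));
        System.out.println("Survives default:    " + roundTrips(euro, Charset.defaultCharset()));
    }
}
```

If the last line reports false, the default charset of that JVM cannot represent the
euro symbol. If it reports true, the default encoding is probably not the cause.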

On 2012/11/12, at 20:36, Karl Wright wrote:

> It looks like the Postgresql JDBC driver sets the encoding itself,
> from what I can find.  So I would guess that it is setting the
> character encoding based on the database you are connected to.  So if
> the euro symbol is not handled by the database's encoding, there would
> be no way to include it in the query string.  I think...
> 
> 
> Karl
> 
> On Mon, Nov 12, 2012 at 6:22 AM, Karl Wright <daddy...@gmail.com> wrote:
>> To clarify, we pass every string to the JDBC driver as a unicode
>> string, but it is up to the JDBC driver to decide how to interpret it.
>> I don't know what exactly the PostgreSQL 9.1 driver does here.  It
>> would be interesting to see what is posted to Solr, if you have those
>> logs.  It may be that it is picking an encoding that is based on your
>> machine's default encoding, which would be unfortunate.
>> 
>> This page apparently indicates that there is somehow a way to set the
>> encoding that JDBC communicates with the database with:
>> 
>> http://stackoverflow.com/questions/3040597/jdbc-character-encoding
>> 
>> I don't know if this is applicable to us at all though.  You can try:
>> 
>> java -Dfile.encoding=utf8 -jar start.jar
>> 
>> ...and see if that changes things - it would be a good hint.
>> 
>> Karl
>> 
>> 
>> On Mon, Nov 12, 2012 at 6:12 AM, Karl Wright <daddy...@gmail.com> wrote:
>>> Hi Abe-san,
>>> 
>>> Quoted strings in SQL queries are not necessarily unicode.  See this
>>> page for details:
>>> 
>>> http://www.postgresql.org/docs/7.3/static/functions-string.html
>>> 
>>> There is nothing you can do in JDBC invocations to control character
>>> set.  This must be done in the query itself, or in the database
>>> itself.
>>> 
>>> Karl
>>> 
>>> On Mon, Nov 12, 2012 at 6:03 AM, Shinichiro Abe
>>> <shinichiro.ab...@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> I'm using Solr 4.0 and a JDBC connection to PostgreSQL.
>>>> The dataQuery is configured below:
>>>> 
>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS 
>>>> $(URLCOLUMN), '12345' AS $(DATACOLUMN) FROM album WHERE idfield IN 
>>>> $(IDLIST)
>>>> 
>>>> On the Solr side, '12345' was indexed and stored.
>>>> 
>>>> But when non-ASCII characters were configured,
>>>> 
>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS 
>>>> $(URLCOLUMN), '€€€' AS $(DATACOLUMN) FROM album WHERE idfield IN $(IDLIST)
>>>> 
>>>> On the Solr side, '€€€' was neither indexed nor stored.
>>>> 
>>>> In practice, I map a column which contains non-ASCII characters to 
>>>> DATACOLUMN.
>>>> It seems the content-type differs between the two cases.
>>>> Can the JDBC connection control the content-type?
>>>> 
>>>> Regards,
>>>> Shinichiro Abe
>>>> 
