Since it works on Solr 3.6, it is definitely not a JDBC issue.

What content-type are you referring to?  The Solr connector does not
change what content type it posts based on the version of Solr it is
posting to, so the content-type you are talking about sounds like
something Solr is detecting rather than receiving.  Can you confirm?
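
To rule out the client side, it may help to check how the euro symbol is encoded by the JVM that runs the crawler. The following is only a minimal sketch: the class name EuroEncodingCheck and its helper methods are hypothetical and are not part of ManifoldCF, Solr, or the JDBC driver. It prints the JVM default charset and shows which of a few standard charsets can represent the euro symbol at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of ManifoldCF, Solr, or any JDBC driver.
final class EuroEncodingCheck {

    // Returns true if every character of text can be represented in charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Returns an uppercase hex dump of the bytes produced by encoding text with charset.
    // Characters the charset cannot represent are replaced by the charset's
    // replacement byte (for example '?' in ISO-8859-1).
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Euro symbol written as an escape so the source file encoding does not matter.
        String euro = "\u20AC";
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII
        };
        for (Charset cs : charsets) {
            System.out.println(cs + " canEncode=" + canEncode(euro, cs)
                + " bytes=" + hex(euro, cs));
        }
    }
}
```

If the default charset reported here cannot encode the euro symbol, anything that converts strings to bytes without naming a charset would lose it. That would be a separate problem from whatever content type Solr decides on.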

Karl

On Mon, Nov 12, 2012 at 8:40 AM, Shinichiro Abe
<shinichiro.ab...@gmail.com> wrote:
> Thank you for the reply.
> I tried setting -Dfile.encoding, but it didn't resolve the issue.
> What I can tell now is that the euro symbol in DATACOLUMN
> can be indexed on Solr 3.6 but cannot be indexed on Solr 4.0.
> On Solr 3.6 the content-type was text/plain; on Solr 4.0 the content-type was
> application/octet-stream.
> Is this a Solr issue rather than a database encoding issue?
>
> On 2012/11/12, at 20:36, Karl Wright wrote:
>
>> It looks like the PostgreSQL JDBC driver sets the encoding itself,
>> from what I can find.  So I would guess that it is setting the
>> character encoding based on the database you are connected to.  So if
>> the euro symbol is not handled by the database's encoding, there would
>> be no way to include it in the query string.  I think...
>>
>>
>> Karl
>>
>> On Mon, Nov 12, 2012 at 6:22 AM, Karl Wright <daddy...@gmail.com> wrote:
>>> To clarify, we pass every string to the JDBC driver as a unicode
>>> string, but it is up to the JDBC driver to decide how to interpret it.
>>> I don't know what exactly the PostgreSQL 9.1 driver does here.  It
>>> would be interesting to see what is posted to Solr, if you have those
>>> logs.  It may be that it is picking an encoding that is based on your
>>> machine's default encoding, which would be unfortunate.
>>>
>>> This page apparently indicates that there is a way to set the
>>> encoding that JDBC uses to communicate with the database:
>>>
>>> http://stackoverflow.com/questions/3040597/jdbc-character-encoding
>>>
>>> I don't know if this is applicable to us at all though.  You can try:
>>>
>>> java -Dfile.encoding=utf8 -jar start.jar
>>>
>>> ...and see if that changes things - it would be a good hint.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Nov 12, 2012 at 6:12 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>> Hi Abe-san,
>>>>
>>>> Quoted strings in SQL queries are not necessarily unicode.  See this
>>>> page for details:
>>>>
>>>> http://www.postgresql.org/docs/7.3/static/functions-string.html
>>>>
>>>> There is nothing you can do in JDBC invocations to control the
>>>> character set.  This must be done in the query itself, or in the
>>>> database itself.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Nov 12, 2012 at 6:03 AM, Shinichiro Abe
>>>> <shinichiro.ab...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm using Solr 4.0 and a JDBC connection to PostgreSQL.
>>>>> The dataQuery is configured as below:
>>>>>
>>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS 
>>>>> $(URLCOLUMN), '12345' AS $(DATACOLUMN) FROM album WHERE idfield IN 
>>>>> $(IDLIST)
>>>>>
>>>>> On the Solr side, '12345' was indexed and stored.
>>>>>
>>>>> But when a non-ASCII character was configured,
>>>>>
>>>>> SELECT idfield AS $(IDCOLUMN), 'http://server?id=' || idfield AS 
>>>>> $(URLCOLUMN), '€€€' AS $(DATACOLUMN) FROM album WHERE idfield IN $(IDLIST)
>>>>>
>>>>> On the Solr side, '€€€' was not indexed or stored.
>>>>>
>>>>> In practice, I map a column that contains non-ASCII characters to
>>>>> DATACOLUMN.
>>>>> It seems the content-type differs between the two cases.
>>>>> Can the JDBC connection control the content-type?
>>>>>
>>>>> Regards,
>>>>> Shinichiro Abe
>>>>>
>
