Hi Raymond, I agree with you: 0xfffe is a special character, which is why I was asking how it's handled in Solr. In my document, 0xfffe doesn't appear at the beginning; it's in the content.
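The BOM distinction can be made more precise. The byte order mark is U+FEFF. U+FFFE is its byte-swapped form, and Unicode reserves it as a noncharacter, so a UTF-16 reader that sees the bytes FF FE at the start of a stream can infer little-endian order. In UTF-8 the BOM is EF BB BF, while U+FFFE encodes as EF BF BE. Inside a UTF-8 document it is therefore an unusual character in the content, not a BOM. This minimal Java sketch prints both encodings; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class BomVsFffe {
    // Render a string's UTF-8 encoding as space-separated uppercase hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFF is the byte order mark.
        System.out.println("U+FEFF -> " + utf8Hex("\uFEFF")); // prints U+FEFF -> EF BB BF
        // U+FFFE is the byte-swapped BOM, a Unicode noncharacter.
        System.out.println("U+FFFE -> " + utf8Hex("\uFFFE")); // prints U+FFFE -> EF BF BE
    }
}
```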
Just an update on the testing I'm doing: in a SolrCloud environment with two shards, if I launch the dataimport on a node of the shard that is the target for that doc, all the docs are written properly; if I launch it on a node of the other shard, which then forwards the update to the target, I get the error.

Thanks,
Federico

2013/8/5 Raymond Wiker <rwi...@gmail.com>
> I think #xfffe is special; it is used as a "byte order mark" to identify
> the encoding used. In that case, it should only appear at the beginning of
> the document.
>
> Sent from my iPhone
>
> On 5 Aug 2013, at 17:19, Federico Chiacchiaretta <federico.c...@gmail.com>
> wrote:
>
> > Hi Shawn,
> > thanks for your answer.
> > From the docs you linked I found:
> > "This property is only relevant for server versions less than or equal to
> > 7.2".
> >
> > I'm using version 9.1; I gave it a try anyway, but unfortunately had no luck.
> > I also checked the encoding settings on the DB, and it's UTF-8.
> >
> > Please note that the data import works with a single Solr instance, but
> > it fails on SolrCloud when the update gets forwarded to another node.
> > Suspecting a Jetty bug (or misconfiguration), I also tried a test
> > environment based on Tomcat, but I get the same result.
> >
> > How is the UTF character 0xfffe supposed to be handled? It seems that Solr
> > can handle it well, while sending it over HTTP to another node breaks things.
> > Could it be an HttpSolrServer bug?
> >
> > Thanks,
> > Federico
> >
> > 2013/8/5 Shawn Heisey <s...@elyograg.org>
> >
> >> On 8/1/2013 7:20 AM, Federico Chiacchiaretta wrote:
> >>> on data import from a PostgreSQL db, I get the following error in
> >>> solr.log:
> >>>
> >>> ERROR - 2013-08-01 09:51:00.217; org.apache.solr.common.SolrException;
> >>> shard update error RetryNode:
> >>> http://172.16.201.173:8983/solr/archive/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> >>> Invalid UTF-8 character 0xfffe at char #416, byte #127)
> >>
> >> It sounds like your database is not using the UTF-8 character set, but
> >> the JDBC driver (or the driver-server combination) is not aware that the
> >> character set is different. Solr expects UTF-8.
> >>
> >> Generally what you want to do is tell the JDBC driver to use the UTF-8
> >> character set, which will hopefully cause either the driver or the DB
> >> server to translate for you.
> >>
> >> There is a charSet parameter for the PostgreSQL JDBC driver:
> >>
> >> http://jdbc.postgresql.org/documentation/80/connect.html
> >>
> >> These are added to the JDBC URL after a ? character, just like
> >> parameters on an HTTP URL.
> >>
> >> Thanks,
> >> Shawn
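The wording of the error ("Invalid UTF-8 character 0xfffe at char #416, byte #127)") looks like it comes from an XML parser on the receiving node, though the thread does not confirm this. If the forwarded update is in fact serialized as XML, U+FFFE would be rejected because the XML 1.0 `Char` production excludes U+FFFE and U+FFFF. That could explain why the same document indexes fine when no forwarding happens. One possible workaround is to strip such code points from field values before they reach Solr, assuming you control that step. The helper below, `stripXmlInvalidChars`, is hypothetical: a sketch of the idea, not a Solr API.

```java
public class XmlCharFilter {
    // True if the code point is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: drop every code point XML 1.0 does not allow,
    // e.g. U+FFFE, U+FFFF, most C0 controls and unpaired surrogates.
    static String stripXmlInvalidChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins valid surrogate pairs; a lone surrogate comes back as-is
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "abc\uFFFEdef";
        System.out.println(stripXmlInvalidChars(raw)); // prints abcdef
    }
}
```

Silently dropping characters changes the stored content. Replacing them with U+FFFD (the replacement character) would be an alternative if you would rather keep a visible marker.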