Re: [protobuf] Re: protobuf not handling special characters between Java server and C++ client

2011-01-26 Thread Evan Jones
On Jan 26, 2011, at 3:43, Hitesh Jethwani wrote:
> Can we encode the protobuf data in ISO-8859-1 from the server end itself?

Yes. In this case, you need to use the protocol buffer "bytes" type instead of the protocol buffer "string" type, since you want to exchange ISO-8859-1 bytes from pro
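To make the "bytes" suggestion concrete, here is a minimal sketch of producing the ISO-8859-1 payload on the Java side. The class and method names are made up for illustration, and the protobuf step is only described in a comment: it assumes a hypothetical `bytes` field whose generated setter takes a `ByteString`, which this thread does not show.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of protobuf.
public class Latin1Payload {
    // ISO-8859-1 maps U+0000..U+00FF to exactly one byte each.
    // Characters outside that range are replaced with '?' by getBytes().
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Inverse direction, e.g. for a Java reader of the same field.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        byte[] raw = encodeLatin1(text);
        // With generated protobuf code, these bytes would be wrapped with
        // ByteString.copyFrom(raw) and stored in the (assumed) bytes field.
        System.out.println(raw.length);                     // prints 4
        System.out.println(decodeLatin1(raw).equals(text)); // prints true
    }
}
```

The C++ side would then read the field as a plain `std::string` holding those four bytes unchanged, since protobuf does not interpret the contents of a `bytes` field.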

[protobuf] Re: protobuf not handling special characters between Java server and C++ client

2011-01-26 Thread Hitesh Jethwani
> The reason this appears to work is because String.getBytes() encodes in
> ISO-8859-1 encoding by default.

Thanks a lot for the above. Just want to summarize my understanding. C++ needs to explicitly decode the UTF-8 encoded string, which is when it will interpret the characters properly. I can us

Re: [protobuf] Re: protobuf not handling special characters between Java server and C++ client

2011-01-25 Thread Kenton Varda
On Tue, Jan 25, 2011 at 8:57 PM, Hitesh Jethwani wrote:
> > if on the stream writer, I add something like:
> > writer.write(new String(msg.getBytes(), "UTF8").getBytes()) instead of
> > simply writer.write(msg.getBytes()), I see the characters as expected
> > on the C++ client. However this I beli
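A rough sketch of what the quoted expression amounts to, with every charset spelled out. The no-argument `String.getBytes()` uses the JVM's platform default charset, so the quoted one-liner can behave differently from machine to machine. The version below assumes the intent is "decode UTF-8, re-encode as ISO-8859-1"; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating explicit charsets.
public class ExplicitCharsets {
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes, then re-encode them as ISO-8859-1.
    // This is what new String(b, "UTF8").getBytes() does only when
    // the platform default charset happens to be ISO-8859-1.
    static byte[] utf8ToLatin1(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    // What a reader sees if it treats UTF-8 bytes as ISO-8859-1.
    static String misreadAsLatin1(byte[] utf8) {
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8 = toUtf8("\u00e9");                // "é" is 0xC3 0xA9 in UTF-8
        System.out.println(utf8.length);               // prints 2
        System.out.println(utf8ToLatin1(utf8).length); // prints 1 (0xE9)
        System.out.println(misreadAsLatin1(utf8).length()); // prints 2 (two wrong characters)
    }
}
```

Naming the charset on both the decode and the encode keeps the result independent of the platform default.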

[protobuf] Re: protobuf not handling special characters between Java server and C++ client

2011-01-25 Thread Hitesh Jethwani
> I was of the opinion that UTF8 encoding encodes each character using 8
> bits or a byte.

My understanding of UTF-8 was clearly wrong. Just did some reading again: it encodes characters in bytes, and can use up to 4 bytes to represent a character.

> if on the stream writer, I add something like: >
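A quick way to check the "up to 4 bytes" point is to count the UTF-8 bytes for a few code points. This is just an illustrative sketch; the class name and the sample characters are my own choice, not from the thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for counting UTF-8 bytes per code point.
public class Utf8Lengths {
    // Number of bytes UTF-8 uses for a single Unicode code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('A'));     // prints 1 (ASCII)
        System.out.println(utf8Length(0x00E9));  // prints 2 (e with acute accent)
        System.out.println(utf8Length(0x20AC));  // prints 3 (euro sign)
        System.out.println(utf8Length(0x1F600)); // prints 4 (outside the BMP)
    }
}
```

Only the ASCII range takes one byte per character, which is why plain English text looked fine while accented characters did not.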

[protobuf] Re: protobuf not handling special characters between Java server and C++ client

2011-01-25 Thread Hitesh Jethwani
Thanks for pointing that out, Evan.

> The Java protocol buffer API encodes strings as UTF-8. Since C++ has
> no unicode support, what you get on the other end is the raw UTF-8
> encoded data.

I was of the opinion that UTF-8 encoding encodes each character using 8 bits or a byte. So not sure as to wh