I have a simple servlet that reads form fields and stores them in a SQL
Server database. Now I am trying to store and retrieve international
characters (charset EUC-JP).

The problem I am having is this:
The first time I submit the characters, Java reads them as ASCII and
returns junk to the browser (IE 5.5). Here is the interesting part: if I
append the same characters to the junk and submit again, the newly appended
text displays fine in the browser.
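
To make the symptom concrete, here is a minimal sketch of what may be
happening. It assumes the browser submits the form as EUC-JP bytes and the
servlet container decodes them as ISO-8859-1 (an assumption about this
setup, not something confirmed). The class and method names are made up for
illustration, and it needs a JDK that ships the EUC-JP charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the servlet described above.
class FormCharsetDemo {
    static final Charset EUC_JP = Charset.forName("EUC-JP");

    // Simulates a container that decodes EUC-JP request bytes as ISO-8859-1.
    static String misdecode(String typedByUser) {
        byte[] wireBytes = typedByUser.getBytes(EUC_JP);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes from the junk and decodes them as EUC-JP.
    static String repair(String junk) {
        byte[] wireBytes = junk.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, EUC_JP);
    }

    public static void main(String[] args) {
        String typed = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String junk = misdecode(typed);
        System.out.println("junk equals input: " + junk.equals(typed));
        System.out.println("repaired equals input: " + repair(junk).equals(typed));
    }
}
```

If this is the cause, the junk is not lost data: ISO-8859-1 maps every byte
to a character, so the original bytes can be recovered and decoded again as
EUC-JP.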

Question:
My guess is that the browser encodes the text as ASCII the first time and
encodes it properly on later submissions. Is there any way I can solve
this? Any help is greatly appreciated.
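
One possible direction, sketched under the assumption that the form data
arrives percent-encoded in the page's charset: decode it with that charset
instead of a default. The sketch below does this by hand for a raw
application/x-www-form-urlencoded body using java.net.URLDecoder;
FormBodyDecoder, parse and encode are hypothetical names. In a servlet
container that implements Servlet 2.3 or later, calling
request.setCharacterEncoding("EUC-JP") before reading any parameter is
meant to have the same effect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a sketch, not a replacement for the container's parser.
class FormBodyDecoder {
    // Splits "a=b&c=d" and percent-decodes each part using pageCharset.
    static Map<String, String> parse(String body, String pageCharset) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                fields.put(URLDecoder.decode(name, pageCharset),
                           URLDecoder.decode(value, pageCharset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
        return fields;
    }

    // Percent-encodes a value the way a browser would for a page in pageCharset.
    static String encode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u65e5\u672c\u8a9e";
        String body = "comment=" + encode(typed, "EUC-JP");
        System.out.println(parse(body, "EUC-JP").get("comment").equals(typed));
    }
}
```

Decoding the same body with "ISO-8859-1" instead of "EUC-JP" yields a
different, garbled string, which matches the junk described above.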

Raghs

> -----Original Message-----
> From: Raghu Kolluru 
> Sent: Monday, October 02, 2000 10:52 AM
> To: '[EMAIL PROTECTED]'
> Subject: RE: Major site in unicode?
> 
> 
> Great! Thanks.
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, October 02, 2000 10:24 AM
> > To: Unicode List
> > Cc: Unicode List
> > Subject: RE: Major site in unicode?
> > 
> > 
> > It knows because:
> > 
> > 1. You sent the page in that character set, or;
> > 2. You embedded a token in the page to tell the CGI program what the
> > character set was, or;
> > 3. You used the (IE only) hack to get the browser to embed it in a
> > hidden field, or;
> > 4. You guessed it based on a heuristic (or from the user's session
> > information, maintained in the URL or in a cookie).
> > 
> > This sounds complex, but it isn't all that bad. Very few users will be
> > foolish enough to change their display encoding to something that
> > displays the page incorrectly...
> > 
> > Actually, all this talk of "setting browser to Unicode" and "setting the
> > browser to code page" is based on a poor assumption or set of
> > assumptions. What's getting set is the character encoding of the HTML
> > page itself. If done correctly, the browser will read it from the HTTP
> > header and (or) the META tag.
> > 
> > The current best practice for creating multilingual capable web sites
> > (even if they happen to be mono-lingual at any one URL) is to use Unicode
> > (either UTF-8 or UTF-16, depending on your operating environment)
> > internally at the server. A decision can be made to deliver either UTF-8
> > or a non-Unicode legacy encoding at page delivery time. At this point in
> > time, most pages are NOT delivered as UTF-8, even though the server-side
> > systems are entirely Unicode, because of the problems cited earlier with
> > older Netscape and IE browsers and their still relatively large market
> > share.
> > 
> > Choosing this architecture allows you to construct single-source code
> > systems, access databases and data warehouses, and build applications in
> > a locale independent way. This vastly simplifies maintenance, testing,
> > and deployment compared to legacy charset systems.
> > 
> > ... many programmers, of course, would like to eliminate the complexity
> > of the charset conversion at delivery time, and this day is coming. I
> > suggest that you parse UserAgent strings at the start of a session with a
> > user and determine if UTF-8 can be sent to the browser (it can in the
> > majority of cases and the vast majority of Western and Eastern European
> > cases: Asian locales are the big hangup here), and set the result into
> > the session (see #4 above).
> > 
> > Hope this helps.
> > 
> > Addison
> > 
> > ===========================================================
> > Addison P. Phillips                    Principal Consultant
> > Inter-Locale LLC                http://www.inter-locale.com
> > Los Gatos, CA, USA          mailto:[EMAIL PROTECTED]
> > 
> > +1 408.210.3569 (mobile)              +1 408.904.4762 (fax)
> > ===========================================================
> > Globalization Engineering & Consulting Services
> > 
> > On Mon, 2 Oct 2000, Raghu Kolluru wrote:
> > 
> > > How does the CGI program know that the data submitted is of
> > > "charset=EUC-JP"?
> > > 
> > > Raghu Kolluru, Software Engg.
> > > GO.com | Walt Disney Internet Group
> > > 206-664-4267 | [EMAIL PROTECTED]
> > > 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Doug Ewell [mailto:[EMAIL PROTECTED]]
> > > > Sent: Sunday, October 01, 2000 11:48 PM
> > > > To: Unicode List
> > > > Subject: Re: Major site in unicode?
> > > > 
> > > > 
> > > > >> I assume that "the ISO standard" refers to ISO/IEC 8859-1 and
> > > > >> possibly 8859-2 as well.  Unicode is an ISO standard too (ISO/IEC
> > > > >> 10646-1).
> > > > >
> > > > >       So if my browser is set to ISO 8859-1 or ISO 8859-2, but a
> > > > > Central European or Western European site is only in Unicode, then
> > > > > all will show up correctly?
> > > > 
> > > > If your browser is old enough that it can only be "set to" a single
> > > > character set, and this setting cannot be overridden by a "charset=X"
> > > > tag in the HTML page, then no, it will not be displayed correctly.
> > > > But this sort of rigidity is not present in modern browsers.
> > > > 
> > > > >> The browser you are thinking of is Netscape Navigator (pre-4.7).
> > > > >> Support for Unicode in all browsers is improving steadily, and as
> > > > >> it does, your 'adamant' programmers will end up using
> > > > >> Unicode-encoded sites without even realizing it.
> > > > >
> > > > >    When?  5 years from now?  As for using Unicode without realizing
> > > > > it, what do you mean?  If a Russian's browser is set to CP1251, what
> > > > > happens if the site is in Unicode?  At present he gets garbage.
> > > > > I've tried the setting that automatically changes to the character
> > > > > set of the page.  Doesn't work very well.  I think the character set
> > > > > indication gets left out in many sites.
> > > > 
> > > > Browsers are supposed to be able to switch automatically to the
> > > > character set used by the target page, but they cannot necessarily do
> > > > this blindly by auto-detecting the character set.  It is supposed to
> > > > be indicated by the page using the "charset=X" tag.  Sites that do
> > > > not do this are not giving browsers a fair chance to display the page
> > > > properly.  This is not the fault of Unicode or the browser, but of
> > > > the HTML author.
> > > > 
> > > > >    I don't disagree with this.  It's just that at the present
> > > > > moment, Netscape and Explorer don't seem ready.   What would really
> > > > > be needed is for the browser to automatically detect the site as
> > > > > being in Unicode, and switch to that character set.  Then sites
> > > > > could switch over without worry.  That is not the case at the
> > > > > moment.  So the user has to change the character set himself.
> > > > 
> > > > Try using a recent version of your favorite browser (IE version 5.0
> > > > or above, or NN version 4.7 or above).
> > > > 
> > > > I think the real problem here is that you, your team, and your users
> > > > in Russia are working with older versions of software that did not
> > > > properly handle Unicode, and are assuming that newer versions will
> > > > not support Unicode either.  Thankfully, this is not the case.
> > > > 
> > > > -Doug Ewell
> > > >  Fullerton, California
> > > > 
> > > 
> > 
> 
