Vernon,

Your last question first. For the mapping between Chinese characters and Unicode,
look at http://www.unicode.org/charts/index.html and click on the CJK Unified
Ideographs (5MB) link. You will also find a lot of useful information at
www.unicode.org.
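Java can also report the mapping directly, since a String stores UTF-16 code units. A minimal sketch (the class name CodePoints and its describe method are only illustrative, not part of the attached tools) that lists the code point of each character in its argument, or of two sample characters if no argument is given:

```java
/** Illustrative helper (hypothetical name): shows the Unicode code point of each character. */
public class CodePoints {

    /** Returns the code points of text as "U+XXXX" labels separated by spaces. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        // codePoints() (Java 8+) also handles characters outside the BMP,
        // which occupy two chars (a surrogate pair) in a Java String
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The default uses Unicode escapes so this file compiles in any source encoding
        String text = args.length > 0 ? args[0] : "\u4e2d\u6587";
        // With no arguments this prints: U+4E2D U+6587
        System.out.println(describe(text));
    }
}
```

Whether Chinese text passed on the command line (java CodePoints followed by your characters) arrives intact depends on the terminal's encoding, so the Unicode charts remain the authoritative reference.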

We actually wrote two utility programs to simplify the conversion; the source
code is attached. I assume you can enter Chinese in an editor (such as NJ
WordProcessor, www.njstar.com) and save it in your preferred encoding,
GB2312 or Big5. The UnicodeConverter program then takes this file as input
and writes its Unicode (\uxxxx) representation to another file. For example,
    java UnicodeConverter gb2312 yourfile.GB.TXT yourfile.unicode
The other program, UnicodeReverseConverter, does just the opposite. With
these tools we can quickly convert *.properties files in both directions.
You may already know this, but just to make things clear: we use \uxxxx
escapes for Chinese in our *_zh.properties files to serve the Chinese
locale. We also have a JSP page called unicodeGenerator.jsp that converts
Chinese (or any other language) to Unicode escapes; it is attached at the
bottom of this message.
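The \uxxxx form works because java.util.Properties decodes those escapes when it loads a file, and property resource bundles use the same file format. A minimal sketch of the round trip, assuming a hypothetical key named greeting: escape mirrors what UnicodeConverter does to one line (four hex digits per non-ASCII char), and Properties.load turns the escapes back into the original characters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class EscapeRoundTrip {

    /** Escapes every non-ASCII char as a backslash, 'u', and four hex digits. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < 128) {
                sb.append(ch);
            } else {
                // %04x pads with zeros, so every escape has exactly four digits
                sb.append(String.format("\\u%04x", (int) ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String chinese = "\u4f60\u597d";               // two Chinese characters ("ni hao")
        String line = "greeting=" + escape(chinese);   // hypothetical key name
        System.out.println(line);

        // Properties.load decodes the escapes, as happens when a bundle file is read
        Properties props = new Properties();
        props.load(new StringReader(line));
        System.out.println(chinese.equals(props.getProperty("greeting")));
    }
}
```

Running it prints the escaped line (greeting= followed by the two escapes) and then true. On Java 9 and later, PropertyResourceBundle reads .properties files as UTF-8 by default, so the escapes are no longer strictly required there, though they are still understood.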

Regarding your other question:
> Have you encountered any situations where messages come from the server? If
> so, do you create another set of resource bundles?

Do you mean messages generated by the Web server, such as Tomcat? We don't
localize that piece of information.

//----------------- UnicodeConverter.java -----------------------
import java.io.*;

/** Utility program to convert an encoded file to ISO-8859-1 encoding (the
 *  default encoding on Solaris and English versions of Windows). Every
 *  non-ASCII character is written as \\uxxxx, where xxxx is the four-digit
 *  hex Unicode value of the character.
 *
 * @author  Michael Zhou
 */
public class UnicodeConverter {

    /** Usage */
    public static void printUsage () {
        System.out.println ("\n\nUsage: java UnicodeConverter "
                            + "<encoding_of_inputfile> <inputfile> <outputfile>\n\n");
    }

    /** Program entry
    * @param args the command line arguments
    */
    public static void main (String args[]) {
        if (args != null && args.length < 3) {
            printUsage ();
            System.exit (1);
        }

        String encoding = args[0];
        String inputfilename = args[1];
        String outputfilename = args[2];

        BufferedReader reader = null;
        BufferedWriter writer = null;

        try {
            reader = new BufferedReader (
                         new InputStreamReader (
                             new FileInputStream (inputfilename), encoding));
            writer = new BufferedWriter (
                         new OutputStreamWriter (
                             new FileOutputStream (outputfilename), "ISO-8859-1"));

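            String line = null;
            while ((line = reader.readLine ()) != null) {
                for (int i = 0; i < line.length (); i++) {
                    char ch = line.charAt (i);
                    if (ch < 128) {
                        writer.write (ch);
                    }
                    else {
                        // Always emit exactly four hex digits: a properties
                        // file escape needs all four, and toHexString drops
                        // leading zeros (for example, 0xe9 becomes "e9").
                        String hex = Integer.toHexString (ch);
                        writer.write ("\\u");
                        for (int k = hex.length (); k < 4; k++)
                            writer.write ('0');
                        writer.write (hex);
                    }
                }
                writer.write ("\n");
            }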
        }
        catch (FileNotFoundException ex) {
           System.out.println ("File Not Found Exception:");
           ex.printStackTrace ();
        }
        catch (UnsupportedEncodingException ex) {
           System.out.println ("Unsupported Encoding Exception:");
           ex.printStackTrace ();
        }
        catch (IOException ex) {
           System.out.println ("IO Exception:");
           ex.printStackTrace ();
        }
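        finally {
           // Close each stream separately: either one is still null if
           // opening it failed, and the writer must be closed (flushed)
           // even if closing the reader throws.
           try {
              if (reader != null) reader.close ();
           }
           catch (IOException ex) {
              System.out.println ("Closing the input file failed.");
              ex.printStackTrace ();
           }
           try {
              if (writer != null) writer.close ();
           }
           catch (IOException ex) {
              System.out.println ("Closing the output file failed.");
              ex.printStackTrace ();
           }
        }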
    }

}

//----------------- UnicodeReverseConverter.java -----------------------
import java.io.*;

/** Utility program to convert an ISO-8859-1 encoded file to another
 *  encoding, i.e., convert all \\uxxxx escapes back to the Unicode
 *  characters they stand for.
 *
 * @author  Michael Zhou
 */
public class UnicodeReverseConverter {

    /** Usage */
    public static void printUsage () {
        System.out.println ("\n\nUsage: java UnicodeReverseConverter "
                            + "<encoding_of_outputfile> <inputfile> <outputfile>\n\n");
    }

    /** Program entry
    * @param args the command line arguments
    */
    public static void main (String args[]) {
        if (args != null && args.length < 3) {
            printUsage ();
            System.exit (1);
        }

        String encoding = args[0];
        String inputfilename = args[1];
        String outputfilename = args[2];

        BufferedReader reader = null;
        BufferedWriter writer = null;

        try {
            reader = new BufferedReader (
                         new InputStreamReader (
                             new FileInputStream (inputfilename),
"ISO-8859-1"));
            writer = new BufferedWriter (
                         new OutputStreamWriter (
                             new FileOutputStream (outputfilename),
encoding));

            String line = null;
            while ((line = reader.readLine ()) != null) {
                int pos = 0;    // start of the text not yet written
                int index;
                while ((index = line.indexOf ("\\u", pos)) > -1) {
                    // copy the text before the escape unchanged
                    writer.write (line, pos, index - pos);
                    // an escape needs exactly four hex digits after the 'u'
                    String hex = (index + 6 <= line.length ())
                                 ? line.substring (index + 2, index + 6)
                                 : null;
                    try {
                        // parseInt throws NumberFormatException for null too
                        writer.write ((char) Integer.parseInt (hex, 16));
                        pos = index + 6;
                    }
                    catch (NumberFormatException ex) {
                        // malformed or truncated escape: copy the backslash
                        // and the 'u' literally and keep scanning
                        writer.write (line, index, 2);
                        pos = index + 2;
                    }
                }
                writer.write (line.substring (pos));
                writer.write ("\n");
            }
        }
        catch (FileNotFoundException ex) {
           System.out.println ("File Not Found Exception:");
           ex.printStackTrace ();
        }
        catch (UnsupportedEncodingException ex) {
           System.out.println ("Unsupported Encoding Exception:");
           ex.printStackTrace ();
        }
        catch (IOException ex) {
           System.out.println ("IO Exception:");
           ex.printStackTrace ();
        }
        catch (Exception ex) {
           System.out.println ("General exception: ");
           ex.printStackTrace ();
        }
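        finally {
            // Close each stream separately so the writer is always flushed;
            // either one is still null if opening it failed.
            try {
                if (reader != null) reader.close ();
            }
            catch (IOException ex) {
                ex.printStackTrace ();
            }
            try {
                if (writer != null) writer.close ();
            }
            catch (IOException ex) {
                ex.printStackTrace ();
            }
        }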
    }
}

//----------------- unicodeGenerator.jsp -----------------------
<%@ page contentType='text/html; charset=UTF-8' %>
<%@ page import='java.io.*' %>
<html>
<head>
<title>Convert to Unicode</title>
</head>
<body>
Type text in any language below and press Enter to see its Unicode escapes.<br>
<FORM ACTION=unicodeGenerator.jsp METHOD=GET>
<INPUT TYPE=TEXT NAME=text>
<INPUT TYPE=HIDDEN NAME=charset VALUE=UTF-8>
</form>
<hr>

<%
    out.println("Unicode:<br>");

    String charset = request.getParameter("charset");
    String text = request.getParameter("text");
    if (charset != null && text != null) {
      // Assumes the container decoded the query string as ISO-8859-1 (the
      // default in Tomcat at the time), so each char of "text" holds one
      // raw byte. Recover those bytes and decode them with the charset
      // that was sent as a hidden field. The deprecated
      // StringBufferInputStream performs the same byte truncation, but
      // silently.
      text = new String(text.getBytes("ISO-8859-1"), charset);
      out.println(toUnicodeEscapeString(text));
    }

%>

<%!
  private static char toHex(int nibble) {
    return hexDigit[(nibble & 0xF)];
  }
  private static char[] hexDigit = {
    '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'
  };
  private static String toUnicodeEscapeString(String str) {
    // Modeled after the code in java.util.Properties.save()
    StringBuffer buf = new StringBuffer();
    int len = str.length();
    char ch;
    for (int i = 0; i < len; i++) {
      ch = str.charAt(i);
      switch (ch) {
        case '\\': buf.append("\\\\"); break;
        case '\t': buf.append("\\t"); break;
        case '\n': buf.append("\\n"); break;
        case '\r': buf.append("\\r"); break;

        default:
          // Unlike Properties.save(), escape every other character,
          // including printable ASCII, so that the page shows the
          // Unicode value of each input character.
          buf.append('\\');
          buf.append('u');
          buf.append(toHex(ch >> 12));
          buf.append(toHex(ch >>  8));
          buf.append(toHex(ch >>  4));
          buf.append(toHex(ch));
      }
    }
    return buf.toString();
  }
%>

</body>
</html>

-----Original Message-----
From: Vernon Wu [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 22, 2002 9:06 PM
To: Tag Libraries Users List; Michael Zhou
Subject: Re: RE: Usage of i18n in JSTL


Hi, Michael,

Thanks for sharing information.

If my understanding is correct, the solution you mentioned is for changing
the locale during a session. I have tried to use your code, but couldn't
test it since one more class is needed to get it running.

I use the latest JSTL build as Jan suggested, and the two problems are
resolved.

Have you encountered any situations where messages come from the server? If
so, do you create another set of resource bundles?

BTW, where is a good site to find out the mapping between Chinese characters
and Unicode?

Best regards,

Vernon

