Addendum: I copied the u0161 / x9A character from the browser into the sheet while it was open in Excel 2003 and saved a copy of the file, then did the same with OOo 3.0. I then inspected both results in TextPad as binary files. Interestingly, MS Excel stores the value after all the other "normal" data in a separate block, as if it were a dedicated block for the required encoding, while OOo 3 stores the value "inline". Both, however, converted the value to a 16-bit representation, in contrast to all the other data, which was stored in 8-bit form.
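To make the 8-bit vs. 16-bit point concrete, here is a minimal Java sketch (plain JDK, no POI involved). It only shows how the byte x9A and the character u0161 relate in Cp1252 and ISO-8859-1. The class and method names (CaronEncodingDemo, isEncodable, cp1252Byte) are made up for this illustration and are not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI or the JDK.
final class CaronEncodingDemo {
    // "windows-1252" is the canonical Java name for what this mail calls Cp1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True if the charset can represent the character at all.
    static boolean isEncodable(char c, Charset charset) {
        return charset.newEncoder().canEncode(c);
    }

    // Unsigned value of the single Cp1252 byte for c.
    // Only meaningful if isEncodable(c, CP1252) is true.
    static int cp1252Byte(char c) {
        return String.valueOf(c).getBytes(CP1252)[0] & 0xFF;
    }

    // Decodes one raw byte with the given charset and returns the resulting char.
    static char decodeByte(int b, Charset charset) {
        return new String(new byte[] {(byte) b}, charset).charAt(0);
    }

    public static void main(String[] args) {
        char small = '\u0161';   // s with caron, lower case
        char capital = '\u0160'; // S with caron, upper case

        // As a Java char the value is above 255, so it does not fit in 8 bits as-is.
        System.out.printf("U+%04X > 255: %b%n", (int) small, small > 255);

        // Cp1252 has a single byte for it, ISO-8859-1 has none.
        System.out.printf("Cp1252 byte (small):   0x%02X%n", cp1252Byte(small));
        System.out.printf("Cp1252 byte (capital): 0x%02X%n", cp1252Byte(capital));
        System.out.printf("encodable in ISO-8859-1: %b%n",
                isEncodable(small, StandardCharsets.ISO_8859_1));

        // The same raw byte decodes to different characters depending on the charset.
        System.out.printf("0x9A as ISO-8859-1: U+%04X%n",
                (int) decodeByte(0x9A, StandardCharsets.ISO_8859_1));
        System.out.printf("0x9A as Cp1252:     U+%04X%n",
                (int) decodeByte(0x9A, CP1252));
    }
}
```

If the String handed to POI was built by decoding the DB bytes as ISO-8859-1, it would contain the control character u009A rather than u0161. That is only a guess about our setup, but a check like decodeByte above would show it quickly.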
So the HSSFRichTextString(String) constructor may have a small flaw in how it decides whether to handle a given String as UTF-16: it should not decide solely on the numeric value of each character (> 255 or not), but should rely on Java to check whether the character belongs to a reasonable 8-bit charset. The reason is that different 8-bit charsets have "holes" at positions that carry a "meaning" in other widespread charsets, and vice versa. In the end, the user/client/developer should perhaps be given back the option to decide which encoding to use, at least optionally.

Regards,
Christian Gosch
--
(footer as below)

> -----Original Message-----
> From: Christian Gosch
> Sent: Wednesday, October 22, 2008 6:23 PM
> To: user
> Subject: Handling the "s with caron" (u0160 / xA9)
>
> Hi,
>
> we have an application which is not really "international" but
> contains data with personal names from different countries, including
> the Czech Republic. These names use a character that is not present in
> ISO-8859-1 but is present in Cp1252, which is the encoding used on our
> clients' machines. The character in question is u0161 / x9A "š" and
> its upper-case counterpart, u0160 / x8A "Š".
>
> We have some names with that character in the data. In the browser it
> shows up fine on German Windows platforms, since they use Cp1252. In
> the Oracle WE8ISO8859P1 character set of the DB server it simply sits
> in a "hole" of ISO-8859-1, i.e. it is a meaningless binary code there.
>
> BUT: When building an XLS file with POI (3.1-final and 3.2-final
> tested; 3.1-final with setEncoding(UTF8) -> setCellValue(String),
> 3.2-final with setCellValue(HSSFRichTextString(String))), this
> character shows up as a narrow caret.
>
> What happens to this character when it is put into a cell using
> HSSFCell.setCellValue(new HSSFRichTextString((String) theValue))?
>
> When I look at the binary file using TextPad 4.7.3, it is displayed as
> x9A, and it is also shown as a black narrow caret in the "visible
> characters" column of the hex view.
>
> Are there any other characters or binary codes which POI handles the
> same way and which are therefore not really readable in a
> (POI-)generated XLS file?
>
> Thanks for any answers,
> --
> Dipl.-Inform. Christian Gosch, PMI PMP
> Systems Architecture, Project Management
>
> inovex GmbH
> Pforzheim office
> Karlsruher Strasse 71
> D-75179 Pforzheim
> Tel: +49 (0)7231 3191-85
> Fax: +49 (0)7231 3191-91
> [EMAIL PROTECTED]
> www.inovex.de
>
> Registered office: Pforzheim
> District court (AG) Mannheim, HRB 502126
> Managing director: Stephan Müller
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
