RE: Handling the "s with caron" (u0160 / xA9)

Christian Gosch Wed, 22 Oct 2008 09:42:58 -0700

Addition:

I copied the character u0161 / x9A from the browser into the sheet while
it was open in Excel 2003, saved a copy of the file, and then did the
same with OOo 3.0. Then I reviewed both results in TextPad as binary
files. Interesting: MS Excel stores the value after all the other
"normal" data in a separate block, like a special block for strings that
need the wider encoding. OOo 3 stored the value "inline". But both
converted the value to a 16-bit representation, as opposed to all the
other data, which was stored in 8-bit form.
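
For illustration, the byte patterns one would expect to see for this
character in a hex viewer can be reproduced with plain Java. This is only
a sketch: the class name CaronBytes and the helper hex() are made up for
the example, and UTF-16LE is assumed for the 16-bit form because XLS is a
little-endian format. It does not reproduce the record layout of either
file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CaronBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "61 01". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0161"; // LATIN SMALL LETTER S WITH CARON

        // 8-bit form in Cp1252
        System.out.println(hex(s.getBytes(Charset.forName("windows-1252")))); // 9A
        // 16-bit form, little-endian (assumed byte order)
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE)));       // 61 01
        // ISO-8859-1 has no mapping, so Java substitutes '?'
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1)));     // 3F
    }
}
```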


So the HSSFRichTextString(String) constructor may have a small flaw in
how it decides whether to handle a given String as UTF-16 or not: it
should not decide on the numeric value alone ( > 255 or not) but rely on
Java to check whether the String can be represented in a reasonable
8-bit charset. This is because different 8-bit charsets have "holes" at
code positions that carry a meaning in other widespread charsets, and
vice versa.

In the end, the user/client/developer should get back a way to choose
which encoding to use, at least optionally.

Regards,
Christian Gosch
-- 
(footer as below)

> -----Original Message-----
> From: Christian Gosch
> Sent: Wednesday, October 22, 2008 6:23 PM
> To: user
> Subject: Handling the "s with caron" (u0160 / xA9)
> 
> Hi,
> 
> we have an application which is not really "international" but contains
> data with personal names from different countries, including the Czech
> Republic. These names use a character that is not present in ISO-8859-1
> but is present in Cp1252, the encoding used on our clients' machines.
> The character in question is u0161 / x9A "š" and its uppercase
> counterpart, u0160 / x8A.
> 
> We have some names with that character in the data. In the browser it
> shows up OK on German Windows platforms, since they use Cp1252. In the
> Oracle WE8ISO8859P1 character set of the DB server it simply sits in a
> "hole" of ISO-8859-1 -- a meaningless binary code.
> 
> BUT: When building an XLS file with POI (3.1-final and 3.2-final tested;
> 3.1-final with setEncoding(UTF8) -> setCellValue(String), 3.2-final with
> setCellValue(HSSFRichTextString(String)) ), this character shows up as a
> narrow caret.
> 
> What happens to this character when putting it in a cell using
> HSSFCell.setCellValue(new HSSFRichTextString((String)theValue)) ?
> 
> When I look at the binary file using TextPad 4.7.3, it is displayed as
> x9A, and also shown as a black narrow caret in the "visible characters"
> column of the hex view.
> 
> Are there any other characters or binary codes which are handled the
> same way by POI and thus are not really readable in a (POI-)generated
> XLS file?
> 
> Thanks for answers,
> --
> Dipl.-Inform. Christian Gosch, PMI PMP
> Systems Architecture, Project Management
> 
> inovex GmbH
> Büro Pforzheim
> Karlsruher Strasse 71
> D-75179 Pforzheim
> Tel: +49 (0)7231 3191-85
> Fax: +49 (0)7231 3191-91
> [EMAIL PROTECTED]
> www.inovex.de
> 
> Sitz der Gesellschaft: Pforzheim
> AG Mannheim, HRB 502126
> Geschäftsführer: Stephan Müller
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



