Hi,

I'm not changing the text, only reading it. My problem occurs whenever there is a TextCharsAtom, because the platform I am using doesn't support Unicode, only ISO-8859-1. So I had to change the code, replacing UTF-16LE with ISO-8859-1.
So I think I have no way out but to show the text without styles.

Thanks a lot,
--
Tales Paiva


Nick Burch wrote:
On Tue, 5 Dec 2006, Tales Paiva Nogueira wrote:
When PowerPoint stores text in Unicode an unknown char (byte value = 0) is placed between every "normal" char, making the text 2 times longer than it really is.

TextCharsAtoms, and other Unicode-containing fields in PowerPoint files, are stored as UTF-16. That means two bytes are used to store every character. US-ASCII characters will be stored with the second byte zero, but other characters will need to make some use of the second byte.
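
To illustrate the byte layout described above, here is a minimal sketch using only the Java standard library. The class name Utf16Demo and the sample bytes are made up for the illustration and are not part of HSLF; the little-endian byte order (UTF-16LE) is taken from the charset named earlier in this thread.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // Decode raw UTF-16LE bytes (two bytes per character) into a String.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // "Hi" in UTF-16LE: each ASCII character is followed by a zero byte.
        byte[] ascii = {0x48, 0x00, 0x69, 0x00};
        System.out.println(decode(ascii));          // prints "Hi"

        // U+20AC (euro sign) needs both bytes: 0xAC 0x20.
        byte[] euro = {(byte) 0xAC, 0x20};
        String s = decode(euro);
        System.out.println(s.length());             // prints 1
        System.out.println(Integer.toHexString(s.charAt(0))); // prints 20ac
    }
}
```

The zero bytes are therefore the high half of each character rather than padding, which is why dropping them only works for text that is pure US-ASCII or Latin-1.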

If you call getText() on a TextCharsAtom, it'll convert it to a string for you. You should really be using that, not getting the bytes directly.


Is there any way to keep the style information and get the text as a TextByteAtom, instead of TextCharsAtom?

Why? PowerPoint decided to make it a TextCharsAtom, rather than a TextByteAtom, since your string contained at least one character that couldn't be represented in a TextByteAtom.

HSLF supports upgrading a TextByteAtom to a TextCharsAtom if you try to set text that can't be held in a TextByteAtom. It doesn't do the other way around.


If you really want just the low order bytes, call getText() on the TextCharsAtom, and mangle the string yourself. Not sure why you'd want to though....
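
A rough sketch of what "mangling the string yourself" could look like for an ISO-8859-1-only platform, using only the Java standard library. The class and method names (Latin1Downgrade, toLatin1Safe, toLatin1Bytes) are hypothetical, and the sample string merely stands in for whatever getText() returns. It assumes the style run lengths count characters of the decoded string rather than raw bytes; if that holds, a one-for-one character replacement leaves the style indexes valid.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Downgrade {
    // Replace every char outside ISO-8859-1 (value > 0xFF) with '?'.
    // One char in, one char out, so the string length never changes.
    static String toLatin1Safe(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c <= 0xFF ? c : '?');
        }
        return sb.toString();
    }

    // Encode the downgraded string as single-byte ISO-8859-1.
    static byte[] toLatin1Bytes(String text) {
        return toLatin1Safe(text).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by getText(): "café €5".
        String text = "caf\u00E9 \u20AC5";
        byte[] out = toLatin1Bytes(text);
        // prints "7 chars -> 7 bytes"
        System.out.println(text.length() + " chars -> " + out.length + " bytes");
    }
}
```

Here the euro sign is lost (it becomes '?'), but position 5 in the original string is still position 5 in the output, so a style run covering that range would still line up.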

Nick



Yegor Kozlov wrote:
Hi,

Could you provide a test case?

As I understand it, you did something like this:

 - take a ppt file with a text.
 - programmatically change the text using HSLF API
 - save file
 - style information is wrong after save.

 Is it correct?
Yegor

TPN> Hi List,

TPN> When PowerPoint stores text in Unicode an unknown char (byte value =
TPN> 0) is placed between every "normal" char, making the text 2 times longer
TPN> than it really is. I can ignore these garbage chars, but I lose the text
TPN> style information, as its indexes are based on the original Unicode
TPN> text with all that Unicode trash. :(

TPN> Is there any way to keep the style information and get the text as a
TPN> TextByteAtom, instead of TextCharsAtom?

TPN> Thank you very much.
TPN> --
TPN> Tales Paiva

