I read the specification.

http://msdn.microsoft.com/en-us/library/gg615596.aspx
I guess the field that needs to be set is *Pcd.Fc.fCompressed*:
"
   1. For each *Pcd* structure in *PlcPcd.aPcd*:

      1. Read the value of the *Pcd.Fc.fCompressed* field at bit 46 of the
         current *Pcd* structure. If 0, the *Pcd* structure refers to a
         16-bit Unicode character. If 1, it refers to an 8-bit ANSI
         character.

      2. Read the value of *Pcd.Fc*, which is bytes 2-5 of the current
         *Pcd*, and the corresponding CP value.

         - If Unicode, the text at the character position specified by the
           current CP value starts at an offset equal to the value of
           *Pcd.Fc* in the Word Document stream, and occupies two bytes per
           character.

         - If ANSI, the text at the current CP starts at an offset of half
           the value of *Pcd.Fc*, and occupies one byte per character.

         In either case, the number of characters specified by the current
         CP is equal to the value of the next CP in the array minus that of
         the current CP.
"
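To make the quoted rule concrete, here is a rough sketch of how I read it. The class and method names are made up for illustration and are not POI's API, and the sample values are invented. I am assuming that "bit 46" of the 8-byte *Pcd* corresponds to bit 30 of the 32-bit *Pcd.Fc* value (16 bits of flags come first), with the low 30 bits holding the offset.

```java
// Sketch only: hypothetical names, not part of POI.
final class PcdOffsetSketch {
    // Assumption: fCompressed is bit 30 of the 32-bit Pcd.Fc value,
    // which is bit 46 counted from the start of the whole Pcd.
    static final int F_COMPRESSED = 0x40000000;
    // Assumption: the low 30 bits carry the fc offset itself.
    static final int FC_MASK = 0x3FFFFFFF;

    // True when the piece holds 8-bit ANSI text, false for 16-bit Unicode.
    static boolean isCompressed(int rawFc) {
        return (rawFc & F_COMPRESSED) != 0;
    }

    // Byte offset of the piece's text in the Word Document stream.
    static int textOffset(int rawFc) {
        int fc = rawFc & FC_MASK;
        return isCompressed(rawFc) ? fc / 2 : fc;
    }

    // Number of bytes occupied by the characters between two CP values.
    static int byteLength(int rawFc, int cpStart, int cpEnd) {
        int chars = cpEnd - cpStart;
        return isCompressed(rawFc) ? chars : chars * 2;
    }

    public static void main(String[] args) {
        int unicodePiece = 0x00000800; // invented value, fCompressed = 0
        int ansiPiece = 0x40001000;    // invented value, fCompressed = 1
        System.out.println(textOffset(unicodePiece) + " " + byteLength(unicodePiece, 0, 13));
        System.out.println(textOffset(ansiPiece) + " " + byteLength(ansiPiece, 0, 13));
    }
}
```

If this reading is right, text written as two bytes per character needs a piece with fCompressed = 0; otherwise a reader would apply the "half the value, one byte per character" branch to it.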


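As a small illustration of why the flag matters for the bytes in the message quoted below, this sketch encodes a short string as UTF-16LE and then decodes the same bytes both ways. ISO-8859-1 is only my stand-in for "8-bit ANSI" here, and the class name is hypothetical. It does not claim to reproduce what Word actually displays; it only shows that the same bytes yield a different number of characters under each interpretation.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: hypothetical helper, not part of POI.
final class EncodingSketch {
    // Formats bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "he" followed by U+4F60 and U+597D, written with escapes so the
        // source file encoding does not matter.
        byte[] utf16 = "he\u4f60\u597d".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(hex(utf16));

        // Two bytes per character: four characters come back.
        String asUnicode = new String(utf16, StandardCharsets.UTF_16LE);
        // One byte per character (stand-in for ANSI): eight characters.
        String asSingleByte = new String(utf16, StandardCharsets.ISO_8859_1);
        System.out.println(asUnicode.length());
        System.out.println(asSingleByte.length());
    }
}
```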

2011/8/9 Scott Zhang <[email protected]>

> Hi. Sergey and all.
>
>   I have checked out the code from svn and built it myself. The insert
> function is working now, but Chinese text still does not work.
>   So I checked the document data that POI generated.
>
> Here is what I found. I think we are not far off; it just needs a little
> more effort.
> 1. When I type "hello, world 你好" in a doc and then open the doc in a hex
> editor, I find the text is saved as
> 68 00 65 00 6c 00 6c 00 6f 00 2c 00 77 00
> h       e       l          l        o      ,        w
> 60  4f  7d 59
> 你       好
>
> So the truth is simple: the doc internally uses UTF-16LE to store its
> content. I tried manually entering 60 4f 7d 59 after the text I had typed
> in the doc, then saved it and opened it in Office again. The 60 4f 7d 59 is
> correctly displayed as "你好".
>
> 2. When I use POI to insert text into word
> range.insertAfter("hello,world");
>
> The binary code POI generated is
> 68 65 6c 6c 6f
> h   e   l   l    o
> And if I use range.insertAfter("hello,world你好"), the "你好" is translated
> to a code I can't figure out.
> So I am using
> range.insertAfter(new String("hello,world你好").getBytes("UTF-16LE"));
> The good news is that it is generated correctly in the doc as
> 68 00 65 00 6c 00 6c 00
> The binary is the same as expected, but Office Word displays 'h' as a wide
> 'h' and displays "你好" as garbled text.
>
> So what I am thinking is: since we have generated the correct binary
> representation of the characters, there should be a setting somewhere for
> the default encoding of characters in Word.
>
> Can anyone point it out?
> I think we have nearly solved this now.
>
>
> Regards.
> Scott
>
>
>
>
>
>
> On Tue, Aug 9, 2011 at 2:13 PM, Scott Zhang <[email protected]>wrote:
>
>> hi. Sergey.
>>
>> Checking out svn code now.
>>
>> Thanks.
>> Regards.
>> Scott
>>
>>
>> On Tue, Aug 9, 2011 at 1:23 PM, Sergey Vladimirov <[email protected]>wrote:
>>
>>> Hi, Scott.
>>>
>>> I've just fixed text editing issue in trunk. Please check using latest
>>> code from SVN trunk or wait until tomorrow to test with beta4-20110810
>>> :)
>>>
>>> Best regards,
>>> Sergey
>>>
>>> On Tue, Aug 9, 2011 at 9:08 AM, Scott Zhang <[email protected]>
>>> wrote:
>>> > Hi. Sergey.
>>> >
>>> >  I downloaded the latest jar files,
>>> >
>>> > poi-scratchpad-3.8-beta4-20110808.jar
>>> > poi-3.8-beta4-20110808.jar
>>> > poi-excelant-3.8-beta4-20110808.jar
>>> >
>>> > and replaced my existing jars with them.
>>> >
>>> > Now reading is still correct. But with writing,
>>> > range.insertAfter("helloworld");
>>> > even English text can't be inserted into the doc file; nothing was
>>> > inserted into the document.
>>> >
>>> > Can you check this too?
>>> >
>>> >
>>> > I don't link with
>>> >
>>> > poi-dependencies-<version>-<date>.jar
>>> > because my compilation works fine without it. Could that be the
>>> > issue?
>>> >
>>> >
>>> >
>>> >
>>> > Regards.
>>> > Scott
>>> >
>>> > On Tue, Aug 9, 2011 at 12:32 PM, Scott Zhang <[email protected]>
>>> wrote:
>>> >
>>> >> Sure.
>>> >>
>>> >> Doing
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Aug 9, 2011 at 12:29 PM, Sergey Vladimirov <
>>> [email protected]>wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> Could you try to do it using latest version of POI, available at
>>> >>> http://encore.torchbox.com/poi-cvs-build/ ?
>>> >>>
>>> >>> Best regards,
>>> >>> Sergey
>>> >>>
>>> >>> On Tue, Aug 9, 2011 at 7:41 AM, Scott Zhang <[email protected]>
>>> >>> wrote:
>>> >>> > Hello.
>>> >>> >    I am using the POI library to read/write text from Word 2003 files.
>>> >>> >    I use
>>> >>> >    range = document.getRange();
>>> >>> >    System.out.println(range.text());
>>> >>> >
>>> >>> > and the Chinese characters are output correctly.
>>> >>> >   But when I try to insert the same text back with
>>> >>> >    range.insertAfter(range.text());
>>> >>> >
>>> >>> > I can only see garbled text in the output doc. How can I solve this?
>>> >>> >
>>> >>> >
>>> >>> > Thanks.
>>> >>> > Regards.
>>> >>> > Scott
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Sergey Vladimirov
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe, e-mail: [email protected]
>>> >>> For additional commands, e-mail: [email protected]
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Sergey Vladimirov
>>>
>>>
>>>
>>
>
