On Jan 28, 2:31 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jan 28, 2:53 pm, glacier <[EMAIL PROTECTED]> wrote:
>
>
>
> > Thanks, John.
> > No doubt you have shown that SAX doesn't support the GBK encoding.
> > But can you suggest how to make SAX parse a GBK
> > string?
>
> Yes, t
On Jan 28, 2:53 pm, glacier <[EMAIL PROTECTED]> wrote:
>
> Thanks, John.
> No doubt you have shown that SAX doesn't support the GBK encoding.
> But can you suggest how to make SAX parse a GBK
> string?
Yes, the same suggestion as was given to you by others very early in
this thread,
On Jan 28, 5:50 am, John Machin <[EMAIL PROTECTED]> wrote:
> On Jan 28, 7:47 am, "Mark Tolonen" <[EMAIL PROTECTED]>
> wrote:
>
>
>
>
>
> > >"John Machin" <[EMAIL PROTECTED]> wrote in message
> > >news:[EMAIL PROTECTED]
> > >On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote:
> > >> On Jan 24, 3
On Jan 28, 7:47 am, "Mark Tolonen" <[EMAIL PROTECTED]>
wrote:
> >"John Machin" <[EMAIL PROTECTED]> wrote in message
> >news:[EMAIL PROTECTED]
> >On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote:
> >> On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
> >> wrote:
>
> >*IF* the file is w
>"John Machin" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]
>On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote:
>> On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
>> wrote:
>
>*IF* the file is well-formed GBK, then the codec will not mess up when
>decoding it to Un
>> Is there any way to solve this better?
>> I mean, if I shouldn't convert the GBK string to a unicode string, what
>> should I do to make SAX work?
>
> Decode it and then encode it to utf-8 before feeding it to the parser.
The tricky part is that you also need to change the encoding declaration
in
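
A minimal sketch of the "decode, then re-encode as UTF-8" suggestion, assuming Python 3 and a small document held in memory. The sample XML, the `TextCollector` handler, and the `gbk_xml_to_utf8` helper (including its regular expression for the declaration) are illustrative assumptions, not code from this thread.

```python
import re
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def gbk_xml_to_utf8(raw):
    """Decode GBK bytes, rewrite the declaration, re-encode as UTF-8."""
    text = raw.decode("gbk")
    # Only the encoding named in the leading XML declaration is rewritten,
    # so the declaration agrees with the bytes handed to the parser.
    text = re.sub(
        r"""^(<\?xml[^>]*?)encoding=["'][^"']*["']""",
        r'\1encoding="utf-8"',
        text,
        count=1,
    )
    return text.encode("utf-8")


# Made-up sample document, encoded as GBK to stand in for the real file.
sample = '<?xml version="1.0" encoding="GBK"?><msg>你好吗</msg>'.encode("gbk")

utf8_bytes = gbk_xml_to_utf8(sample)
handler = TextCollector()
xml.sax.parseString(utf8_bytes, handler)
print("".join(handler.chunks))
```

For a 20+ MB file the same idea would more likely be applied while streaming rather than on one in-memory string; that variation is not shown here.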
On Jan 27, 7:20 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote:
>
>
>
>
>
> > On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
>
> > > On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote:
>
> > > > According to
On Jan 27, 7:04 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jan 27, 9:18 pm, glacier <[EMAIL PROTECTED]> wrote:
>
>
>
>
>
> > On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote:
> > > > My second question is: is there
On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote:
> On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
>
>
>
> > On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote:
>
> > > According to your reply, what will happen if I try to decode a long
> > > string sep
On Jan 27, 9:18 pm, glacier <[EMAIL PROTECTED]> wrote:
> On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote:
> > > My second question is: is there anyone who has tested very long mbcs
> > > decode? I tried to decode a long
On Sun, 27 Jan 2008 02:18:48 -0800, glacier wrote:
> Yep. I feed SAX the unicode string since SAX doesn't support my
> encoding (GBK).
If the `decode()` method supports it, IMHO SAX should too.
> Is there any way to solve this better?
> I mean if I shouldn't convert the GBK string to
On Jan 24, 5:51 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jan 24, 2:49 pm, glacier <[EMAIL PROTECTED]> wrote:
>
> > I use Chinese characters as an example here.
>
> > >>>s1='你好吗'
> > >>>repr(s1)
>
> > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> > >>>b1=s1.decode('GBK')
>
> > My first question is :
On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote:
> > My second question is: is there anyone who has tested very long mbcs
> > decode? I tried to decode a long (20+ MB) XML file yesterday, which turned out
> > to be very strange and
On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote:
>
> > According to your reply, what will happen if I try to decode a long
> > string separately.
> > I mean:
> > ##
> > a
On Jan 24, 1:44 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote:
> > My second question is: is there anyone who has tested very long mbcs
> > decode? I tried to decode a long (20+ MB) XML file yesterday, which turned out
> > to be very strange an
On Jan 24, 2:49 pm, glacier <[EMAIL PROTECTED]> wrote:
> I use Chinese characters as an example here.
>
> >>>s1='你好吗'
> >>>repr(s1)
>
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> >>>b1=s1.decode('GBK')
>
> My first question is: what strategy does 'decode' use to tell the way
> to separate the words. I
On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote:
> My second question is: is there anyone who has tested very long mbcs
> decode? I tried to decode a long (20+ MB) XML file yesterday, which turned out
> to be very strange and caused SAX to fail to parse the decoded string.
That's because SAX wants bytes,
On Jan 24, 1:49 pm, [EMAIL PROTECTED] wrote:
> On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote:
>
> > I use Chinese characters as an example here.
>
> > >>>s1='你好吗'
> > >>>repr(s1)
>
> > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> > >>>b1=s1.decode('GBK')
>
> > My first question is : what strategy
On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote:
> According to your reply, what will happen if I try to decode a long
> string separately.
> I mean:
> ##
> a='你好吗'*10
> s1 = u''
> cur = 0
> while cur < len(a):
> d = min(len(a)-i
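
For reference, a hedged sketch of decoding a long GBK byte string piece by piece, assuming Python 3. It uses the standard library's incremental decoder, which holds back an incomplete trailing byte until the next chunk arrives. The chunk size and variable names are arbitrary choices for illustration and are not taken from the truncated loop quoted above.

```python
import codecs

# 30 characters, two bytes each in GBK, so 60 bytes in total.
data = "你好吗".encode("gbk") * 10

decoder = codecs.getincrementaldecoder("gbk")()

pieces = []
chunk_size = 5  # odd on purpose, so some chunks end in the middle of a character
for start in range(0, len(data), chunk_size):
    chunk = data[start:start + chunk_size]
    is_last = start + chunk_size >= len(data)
    # final=True on the last chunk makes the decoder report leftover bytes.
    pieces.append(decoder.decode(chunk, final=is_last))

result = "".join(pieces)
print(result == "你好吗" * 10)
```

Calling `chunk.decode("gbk")` on each slice independently would instead fail on any chunk that ends halfway through a two-byte character.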
On Jan 24, 1:41 pm, Ben Finney <[EMAIL PROTECTED]>
wrote:
> Ben Finney <[EMAIL PROTECTED]> writes:
> > glacier <[EMAIL PROTECTED]> writes:
>
> > > I use Chinese characters as an example here.
>
> > > >>>s1='你好吗'
> > > >>>repr(s1)
> > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> > > >>>b1=s1.decode('GBK')
On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote:
> I use Chinese characters as an example here.
>
> >>>s1='你好吗'
> >>>repr(s1)
>
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> >>>b1=s1.decode('GBK')
>
> My first question is: what strategy does 'decode' use to tell the way
> to separate the words.
Ben Finney <[EMAIL PROTECTED]> writes:
> glacier <[EMAIL PROTECTED]> writes:
>
> > I use Chinese characters as an example here.
> >
> > >>>s1='你好吗'
> > >>>repr(s1)
> > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> > >>>b1=s1.decode('GBK')
> >
> > My first question is: what strategy does 'decode' use to
glacier <[EMAIL PROTECTED]> writes:
> I use Chinese characters as an example here.
>
> >>>s1='你好吗'
> >>>repr(s1)
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> >>>b1=s1.decode('GBK')
>
> My first question is: what strategy does 'decode' use to tell the way
> to separate the words. I mean since s1 is a
I use Chinese characters as an example here.
>>> s1='你好吗'
>>> repr(s1)
"'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>>> b1=s1.decode('GBK')
My first question is: what strategy does 'decode' use to tell the way
to separate the words. I mean since s1 is a multi-byte-character string,
how did it determine to separ
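
As a rough illustration of how the character boundaries can be found, here is a sketch assuming Python 3 and a simplified view of GBK: a byte below 0x80 stands alone, and any other byte is treated as the lead byte of a two-byte character. The `split_gbk` helper is hypothetical and does no validation; the real codec also checks that each lead and trail byte is in its legal range.

```python
def split_gbk(data):
    """Split GBK bytes into per-character slices (simplified, no validation)."""
    chars = []
    i = 0
    while i < len(data):
        # ASCII-range bytes are one character; anything else starts a pair.
        width = 1 if data[i] < 0x80 else 2
        chars.append(data[i:i + width])
        i += width
    return chars


# The same six bytes shown by repr(s1) in the post above.
raw = b"\xc4\xe3\xba\xc3\xc2\xf0"
parts = split_gbk(raw)
print(parts)
print([p.decode("gbk") for p in parts])
```

Because the width of each character is decided by its first byte alone, the decoder never has to guess where words or characters are separated; it only needs to start at a character boundary.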