Re: Validate string as UTF-8?

2005-11-06 Thread Tony Nelson
In article <[EMAIL PROTECTED]>, "Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> Tony Nelson wrote:
>
> > I'd like to have a fast way to validate large amounts of string data as
> > being UTF-8.
>
> define "validate".

All data conforms to the UTF-8 encoding format. I can stand if someone has mad

Re: Validate string as UTF-8?

2005-11-06 Thread Waitman Gobble
I have done this using a system call to the program "recode": recode the file to UTF-8 and diff the original against the recoded file. Not an elegant solution, but it did seem to function properly. Take care, Waitman Gobble -- http://mail.python.org/mailman/listinfo/python-list

Re: Validate string as UTF-8?

2005-11-06 Thread Tony Nelson
In article <[EMAIL PROTECTED]>, david mugnai <[EMAIL PROTECTED]> wrote:
> On Sun, 06 Nov 2005 18:58:50 +, Tony Nelson wrote:
>
> [snip]
>
> > Is there a general way to call GLib functions?
>
> ctypes?
> http://starship.python.net/crew/theller/ctypes/

Umm. Might be easier to write an exte

Re: Validate string as UTF-8?

2005-11-06 Thread Diez B. Roggisch
Tony Nelson wrote:
> I'd like to have a fast way to validate large amounts of string data as
> being UTF-8.
>
> I don't see a fast way to do it in Python, though:
>
>     unicode(s,'utf-8').encode('utf-8')
>
> seems to notice at least some of the time (the unicode() part works but
> the encode(

Re: Validate string as UTF-8?

2005-11-06 Thread Fredrik Lundh
Tony Nelson wrote:
> I'd like to have a fast way to validate large amounts of string data as
> being UTF-8.

define "validate".

> I don't see a fast way to do it in Python, though:
>
>     unicode(s,'utf-8').encode('utf-8')

if "validate" means "make sure the byte stream doesn't use invalid sequen
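If "validate" is taken to mean "contains no invalid byte sequences", one way to check that is to attempt a strict decode and report where it fails. The sketch below assumes Python 3, where the data is a `bytes` object and `unicode()` no longer exists; the helper name `first_invalid_utf8_offset` is made up for illustration and is not from the thread.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper: relies on the strict UTF-8 decoder built into
    Python 3, which rejects overlong forms, stray continuation bytes,
    encoded surrogates, and truncated trailing sequences.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None


if __name__ == "__main__":
    print(first_invalid_utf8_offset("héllo".encode("utf-8")))
    print(first_invalid_utf8_offset(b"abc\xff"))
```

The decode builds a throwaway `str`, so this costs memory proportional to the input; for a pure yes/no answer on small inputs that is usually acceptable.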

Re: Validate string as UTF-8?

2005-11-06 Thread david mugnai
On Sun, 06 Nov 2005 18:58:50 +, Tony Nelson wrote:
[snip]
> Is there a general way to call GLib functions?

ctypes?
http://starship.python.net/crew/theller/ctypes/
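As a rough illustration of the ctypes suggestion, the sketch below calls GLib's `g_utf8_validate` from Python. It assumes Python 3 (where ctypes ships in the standard library) and a GLib 2.x shared library that `ctypes.util.find_library` can locate; the wrapper names `load_glib` and `glib_is_valid_utf8` are invented here. Whether this is the function the original poster had in mind is a guess.

```python
import ctypes
import ctypes.util


def load_glib():
    """Try to load GLib 2.x; return None when it is not available."""
    name = ctypes.util.find_library("glib-2.0")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # C prototype:
    # gboolean g_utf8_validate(const gchar *str, gssize max_len,
    #                          const gchar **end);
    lib.g_utf8_validate.argtypes = [
        ctypes.c_char_p,
        ctypes.c_ssize_t,
        ctypes.c_void_p,  # we pass NULL, so the end pointer is ignored
    ]
    lib.g_utf8_validate.restype = ctypes.c_int
    return lib


def glib_is_valid_utf8(lib, data: bytes) -> bool:
    """Validate data with GLib, passing an explicit length."""
    return bool(lib.g_utf8_validate(data, len(data), None))


if __name__ == "__main__":
    glib = load_glib()
    if glib is None:
        print("GLib not found; nothing to demonstrate")
    else:
        print(glib_is_valid_utf8(glib, b"plain ascii"))
        print(glib_is_valid_utf8(glib, b"\xff\xfe"))
```

One difference from Python's own decoder: with an explicit length, GLib treats embedded NUL bytes as invalid, so the two checks can disagree on data that contains `\x00`.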

Validate string as UTF-8?

2005-11-06 Thread Tony Nelson
I'd like to have a fast way to validate large amounts of string data as being UTF-8. I don't see a fast way to do it in Python, though:

    unicode(s,'utf-8').encode('utf-8')

seems to notice at least some of the time (the unicode() part works but the encode() part bombs). I don't consider a RE
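For large amounts of data, one possible approach is to feed the bytes through an incremental decoder in chunks rather than decoding everything at once. This is only a sketch under assumptions: it targets Python 3 (the post predates it), and the function name `is_valid_utf8_stream` and the default chunk size are arbitrary choices, not anything from the thread.

```python
import codecs
import io


def is_valid_utf8_stream(stream, chunk_size=1 << 16):
    """Return True if the binary stream holds only valid UTF-8.

    Hypothetical helper: reads fixed-size chunks so memory use stays
    bounded. The incremental decoder buffers a multi-byte sequence that
    is split across two chunks instead of rejecting it.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()  # strict by default
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            decoder.decode(chunk)
        # final=True makes a truncated trailing sequence an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    good = io.BytesIO("héllo wörld ".encode("utf-8") * 1000)
    bad = io.BytesIO(b"abc\xff")
    print(is_valid_utf8_stream(good, chunk_size=7))
    print(is_valid_utf8_stream(bad))
```

The same function works on a file opened in binary mode, for example `open(path, "rb")`, since it only relies on `read()`.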