Hi Massimo,

What's the current state of this particular problem? I'm having similar 
issues at the moment.

For me it occurs when Windows users cut and paste text containing
cp1252-encoded characters into web2py forms.

I've mentioned this in a very recent posting.

https://groups.google.com/d/topic/web2py/Ozti4iBiq1w/discussion

At the moment it's causing my app to crash with low-level server errors.

Your existing feedburner contrib class has code to strip out the offending 
characters and replace them.
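
As a rough illustration of that kind of clean-up (a hypothetical sketch, not
the code from the contrib class; the names CP1252_PUNCTUATION and
replace_cp1252_punctuation are made up here), the usual Windows "smart"
punctuation can be mapped to ASCII once the input is a unicode string:

```python
# Hypothetical helper: maps common cp1252 "smart" punctuation to ASCII.
# The table is illustrative and deliberately incomplete.
CP1252_PUNCTUATION = {
    u'\u2018': u"'",    # left single quote
    u'\u2019': u"'",    # right single quote
    u'\u201c': u'"',    # left double quote
    u'\u201d': u'"',    # right double quote
    u'\u2013': u'-',    # en dash
    u'\u2014': u'-',    # em dash
    u'\u2026': u'...',  # ellipsis
}


def replace_cp1252_punctuation(text):
    """Return text (a unicode string) with smart punctuation made ASCII."""
    for fancy, plain in CP1252_PUNCTUATION.items():
        text = text.replace(fancy, plain)
    return text
```

This only touches punctuation; accented letters are left alone.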

If a GAE-specific fix could be rolled into the FORM class, that would be a
fantastic solution for me and for any other users who eventually encounter
this.
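
To make the idea concrete, here is a minimal sketch that needs only the
standard library (no chardet). It assumes the input is either valid UTF-8 or
cp1252, which is a guess about the data and not something web2py guarantees.
The names to_utf8 and TO_UTF8 are hypothetical; the (value, error) return
shape follows the TO_UTF validator quoted below.

```python
def to_utf8(raw):
    """Return form input as UTF-8 encoded bytes.

    Assumption: raw is valid UTF-8 or else cp1252. Bytes that are
    undefined in cp1252 are replaced with U+FFFD instead of raising.
    """
    if not isinstance(raw, bytes):
        # Already decoded text: just encode it.
        return raw.encode('utf-8')
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        text = raw.decode('cp1252', 'replace')
    return text.encode('utf-8')


class TO_UTF8(object):
    """Validator-style wrapper returning a (value, error) pair."""

    def __call__(self, value):
        return (to_utf8(value), None)
```

It could be used like the validators in the thread, for example
requires=[TO_UTF8(), IS_NOT_EMPTY()]. Trying UTF-8 first matters, because
cp1252 accepts almost any byte sequence and would mangle valid UTF-8 input.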

Kind regards,
Matt

On Tuesday, July 1, 2008 4:21:58 PM UTC+12, Massimo Di Pierro wrote:
>
> I do not like this solution. I would like something that goes
> into gluon/main.py so it is transparent.
> I also do not want to use chardet since web2py only relies on basic  
> modules.
> Could you help me understand the problem?
> When you have foreign characters in the input, which lines in the
> function f get executed? Which encoding is detected on GAE? What
> would happen if the function f were to re-encode in UTF-8 and return
> a UTF-8-encoded string?
>
> Massimo
>
>
> On Jun 30, 2008, at 6:10 PM, Abner wrote:
>
> >
> > Massimo,
> >
> > I tested a hack here that solves the problem in a more generic way:
> >
> > in model:
> >
> > # by default we are not running on GAE
> > RUNNINGINGAE = False
> >
> > try:
> >     from gluon.contrib.gql import *
> >     db=GQLDB()
> >     # Ok, we are on GAE
> >     RUNNINGINGAE = True
> > except:
> >     db=SQLDB("sqlite://jomeme1.db")
> > session.connect(request,response,db=db)
> >
> > # Thanks to Ga for this class
> > class TO_UTF:
> >      def __init__(self,f): self.f=f
> >      def __call__(self,value): return (self.f(value),None)
> >
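> > def f(v):
> >     # Decode a byte string to unicode; other values pass through.
> >     if isinstance(v, str):
> >         try:
> >             return v.decode('utf-8')
> >         except UnicodeDecodeError:
> >             import chardet
> >             info = chardet.detect(v)
> >             try:
> >                 return v.decode(info['encoding'])
> >             except UnicodeDecodeError, e:
> >                 # UnicodeDecodeError takes five arguments, so re-raise
> >                 # with the original fields and an extended reason.
> >                 raise UnicodeDecodeError(e.encoding, e.object, e.start,
> >                     e.end, "%s (tried UTF-8, %s)" % (e.reason, info['encoding']))
> >     return v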
> >
> > db.define_table('sites',
> >     SQLField('domain',length=256,required=True,default='localhost',unique=True),
> >     SQLField('logo',length=256,required=True,default='none.gif'),
> >     SQLField('telefone',length=36,required=True,default='0xx 99 9999-9999'),
> >     SQLField('endereco',length=64,required=True,default='Rua XXXXXXXXXXXXXXXXXX'),
> >     SQLField('end_num','integer',required=True,default=9999),
> >     SQLField('end_compl',length=16,default=''),
> >     SQLField('bairro',length=32,required=True,default='Centro'),
> >     SQLField('cidade',length=32,required=True,default='Vitória'),
> >     SQLField('estado',length=2,required=True,default='ES'),
> >     SQLField('cep',length=9,required=True,default='29000-000'),
> >     SQLField('email_contato',length=128,required=True,default='x...@xxx.com.br')
> >     )
> >
> > # Some generic requires, used on GAE or not; note that the second one
> > # does not use []
> > db.sites.domain.requires=[IS_NOT_EMPTY(), IS_NOT_IN_DB(db, 'sites.domain')]
> > db.sites.end_compl.requires=IS_NOT_EMPTY()
> >
> > # Now, if on GAE, we add TO_UTF to every text or string field
> > if RUNNINGINGAE:
> >     for fieldname in db.sites.fields:
> >         field = db.sites[fieldname]
> >         if fieldname != 'id' and field.type in ('string', 'text'):
> >             if not isinstance(field.requires, list):
> >                 if not field.requires:
> >                     # Empty requires - is this the best method to do it?
> >                     field.requires = TO_UTF(f)
> >                 else:
> >                     # requires defined above without []
> >                     field.requires = [field.requires, TO_UTF(f)]
> >             else:
> >                 # requires defined above using []
> >                 field.requires.append(TO_UTF(f))
> >
> >
> >
> > What do you think about this solution?
> >
> > regards,
> >
> > abner
> >
> > On 30 jun, 18:05, Abner <abner.jacob...@gmail.com> wrote:
> >> Massimo,
> >>
> >> Your code doesn't work, but I found this code from GA in the group
> >> posts, and it worked fine:
> >>
> >> class TO_UTF:
> >>      def __init__(self,f): self.f=f
> >>      def __call__(self,value): return (self.f(value),None)
> >>
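> >> def f(v):
> >>     # Decode a byte string to unicode; other values pass through.
> >>     if isinstance(v, str):
> >>         try:
> >>             return v.decode('utf-8')
> >>         except UnicodeDecodeError:
> >>             import chardet
> >>             info = chardet.detect(v)
> >>             try:
> >>                 return v.decode(info['encoding'])
> >>             except UnicodeDecodeError, e:
> >>                 # UnicodeDecodeError takes five arguments, so re-raise
> >>                 # with the original fields and an extended reason.
> >>                 raise UnicodeDecodeError(e.encoding, e.object, e.start,
> >>                     e.end, "%s (tried UTF-8, %s)" % (e.reason, info['encoding']))
> >>     return v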
> >>
> >> .....
> >>
> >> db.sites.cidade.requires=TO_UTF(f)
> >>
> >> The problem is that I need to add a requires to every text or string
> >> field in my model. This UnicodeDecodeError can be very annoying for
> >> anyone using a language other than English. The ideal solution, I
> >> think, is to add some code to gql.py that processes every string or
> >> text field about to be stored in the datastore. How can I do this?
> >>
> >> Thanks
> >>
> >> Abner
> >>
> >> On 30 jun, 17:22, Massimo Di Pierro <mdipie...@cs.depaul.edu> wrote:
> >>
> >>> I do not know why the input data is not unicode. It is supposed to be
> >>> UTF-8.
> >>
> >>> Try this validator for your text fields
> >>
> >>> rus_unicode = [ u'\u0410', u'\u0411', u'\u0412', u'\u0413',  
> >>> u'\u0414',
> >>> u'\u0415', u'\u0416', u'\u0417', u'\u0418', u'\u0419', u'\u041a',
> >>> u'\u041b', u'\u041c', u'\u041d', u'\u041e', u'\u041f', u'\u0420',
> >>> u'\u0421', u'\u0422', u'\u0423', u'\u0424', u'\u0425', u'\u0426',
> >>> u'\u0427', u'\u0428', u'\u0429', u'\u042a', u'\u042b', u'\u042c',
> >>> u'\u042d', u'\u042e', u'\u042f', u'\u0430', u'\u0431', u'\u0432',
> >>> u'\u0433', u'\u0434', u'\u0435', u'\u0436', u'\u0437', u'\u0438',
> >>> u'\u0439', u'\u043a', u'\u043b', u'\u043c', u'\u043d', u'\u043e',
> >>> u'\u043f', u'\u0440', u'\u0441', u'\u0442', u'\u0443', u'\u0444',
> >>> u'\u0445', u'\u0446', u'\u0447', u'\u0448', u'\u0449', u'\u044a',
> >>> u'\u044b', u'\u044c', u'\u044d', u'\u044e', u'\u044f']
> >>
> >>> class GAE_FIX:
> >>>      # Maps cp1251 (Cyrillic) bytes to unicode and returns UTF-8
> >>>      def __call__(self,value):
> >>>          result = ""
> >>>          for i in range(0, len(value)):
> >>>              if ord(value[i])<128:
> >>>                  result = result + unicode(value[i])
> >>>              elif ord(value[i])==184:
> >>>                  result = result + unichr(0x0451)
> >>>              elif ord(value[i])==168:
> >>>                  result = result + unichr(0x0401)
> >>>              elif ord(value[i])>=192:
> >>>                  result = result + rus_unicode[ord(value[i])-192]
> >>>              else:
> >>>                  result = result + unicode(" ")
> >>>          return (result.encode('utf8'),None)
> >>
> >>> Use it with requires=[GAE_FIX(), other validators, ....]
> >>> Perhaps other users have better suggestions.
> >>
> >>> On Jun 30, 2008, at 2:48 PM, Abner wrote:
> >>
> >>>> Hi,
> >>
> >>>> I'm from Brazil, developing a small application with web2py to run
> >>>> on GAE.
> >>
> >>>> When I try to add new data to the GAE datastore, using the local
> >>>> version or the one hosted on Google, I get these errors when a data
> >>>> field has characters like é, á, ç, õ:
> >>
> >>>> ERROR    2008-06-30 19:40:32,922 __init__.py] Traceback (most  
> >>>> recent
> >>>> call last):
> >>>>  File "/home/abner/devel/python/gae/google_appengine/web2py/gluon/
> >>>> restricted.py", line 62, in restricted
> >>>>    exec ccode in environment
> >>>>  File "/home/abner/devel/python/gae/google_appengine/web2py/
> >>>> applications/jomeme1/controllers/panel.py", line 49, in <module>
> >>>>  File "/home/abner/devel/python/gae/google_appengine/web2py/
> >>>> applications/jomeme1/controllers/panel.py", line 41, in blocks
> >>>>    if form.accepts(request.vars,session):
> >>>>  File "/home/abner/devel/python/gae/google_appengine/web2py/gluon/
> >>>> sqlhtml.py", line 223, in accepts
> >>>>    self.vars.id=self.table.insert(**fields)
> >>>>  File "/home/abner/devel/python/gae/google_appengine/web2py/gluon/
> >>>> contrib/gql.py", line 169, in insert
> >>>>    tmp=self._tableobj(**fields)
> >>>>  File "google/appengine/ext/db/__init__.py", line 555, in __init__
> >>>>  File "google/appengine/ext/db/__init__.py", line 372, in __set__
> >>>>  File "google/appengine/ext/db/__init__.py", line 1583, in validate
> >>>>  File "/home/abner/devel/python/gae/google_appengine/google/ 
> >>>> appengine/
> >>>> api/datastore_types.py", line 816, in __new__
> >>>>    return super(Text, cls).__new__(cls, arg, encoding)
> >>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in  
> >>>> position
> >>>> 70: ordinal not in range(128)
> >>
> >>>> My controller is this:
> >>
> >>>> def sites():
> >>>>    message = None
> >>>>    response.view = 'panel/sites.html'
> >>>>    form=SQLFORM(db.sites)
> >>>>    if form.accepts(request.vars,session):
> >>>>        response.flash="form accepted"
> >>>>    elif form.errors:
> >>>>        response.flash="form is invalid"
> >>>>    else:
> >>>>        message = "Por favor, preencha o formulário"
> >>>>    return dict(message=message, form=form,vars=form.vars)
> >>
> >>>> My model:
> >>
> >>>> db.define_table('sites',
> >>>>    SQLField('domain',length=256,required=True,default='localhost',unique=True),
> >>>>    SQLField('logo',length=256,required=True,default='none.gif'),
> >>>>    SQLField('telefone',length=36,required=True,default='0xx 99 9999-9999'),
> >>>>    SQLField('endereco',length=64,required=True,default='Rua XXXXXXXXXXXXXXXXXX'),
> >>>>    SQLField('end_num','integer',required=True,default=9999),
> >>>>    SQLField('end_compl',length=16,default=''),
> >>>>    SQLField('bairro',length=32,required=True,default='Centro'),
> >>>>    SQLField('cidade',length=32,required=True,default='Vitória'),
> >>>>    SQLField('estado',length=2,required=True,default='ES'),
> >>>>    SQLField('cep',length=9,required=True,default='29000-000'),
> >>>>    SQLField('email_contato',length=128,required=True,default='...@xxx.com.br'),
> >>
> >>>> The problem also occurs in another table that uses a field of type
> >>>> 'text'.
> >>
> >>>> How can I solve this problem?
> >>
> >>>> Thanks.
> > >
>
>
