On Fri, Aug 7, 2009 at 2:14 PM, Jonathan Lundell <jlund...@pobox.com> wrote:
> On Aug 7, 2009, at 10:04 AM, Yarko Tymciurak wrote:
>
> Whoever makes up this patch, since this is complicated enough,
> can I ask you follow the commented regex style (re.X)
> which is now used to validate paths;
>
> see example starting on line 74 of main.py:
>
> http://bazaar.launchpad.net/~mdipierro/web2py/devel/annotate/head%3A/gluon/main.py
>
> That's my plan (I'm the one who did the main.py re.X patch).

+1 (great!; thanks!)

> Thanks,
> - Yarko
>
> On Fri, Aug 7, 2009 at 10:56 AM, Carl <carl.ro...@gmail.com> wrote:
>
>> You've convinced me that staying close to RFC is a "best choice" even
>> though we lose the opportunity for users to correct addresses at the
>> point of data entry.
>>
>> nb the suggested regex in my last posting doesn't work well enough!
>> e.g., a...@domain.co.uk isn't matched
>>
>> C
>>
>> On Aug 7, 4:48 pm, Jonathan Lundell <jlund...@pobox.com> wrote:
>> > On Aug 7, 2009, at 8:13 AM, Carl wrote:
>> >
>> > > This is an excellent article on the traps to beware of when regex'ing
>> > > email address formats
>> >
>> > >http://www.regular-expressions.info/email.html
>> >
>> > > This may ignite a debate though :)
>> >
>> > A discussion, maybe. In the abstract, I like the idea of verifying the
>> > RFC verbatim, but we *should* be clear on what we're trying to do.
>> > Guard against typos? Prevent some kind of attack? How much do we care
>> > about false positives?
>> >
>> > The article objects (to RFC-style checking) that j...@aol.com.nospam,
>> > for example, will validate. I'm not too concerned about that, in that
>> > there are lots of ways that a user can enter a wrong but
>> > (syntactically) valid address. We deal with that through active
>> > validation, not a syntax check.
>> >
>> > Might there be a security concern?
>> > The quoted variation of the RFC
>> > checker is very permissive:
>> >
>> > "([^"\r\\]|\\["\r\\])*"
>> >
>> > Could that open the door to some kind of injection attack? Presumably
>> > we sanitize it for display; how about when we actually use it to send
>> > mail? Any consumer that doesn't understand quoted names could end up
>> > very confused.
>> >
>> > I take false positives as a v. bad thing: if a user enters a real and
>> > valid address, I do not want to reject it. So I don't much like the
>> > explicit list of TLDs (below), on the grounds that it's bound to
>> > expand, and at some point it'll break. From the Wikipedia TLD article:
>> >
>> > > During the 32nd International Public ICANN Meeting in Paris in 2008,
>> > > ICANN started a new process of TLD naming policy to take a
>> > > "significant step forward on the introduction of new generic
>> > > top-level domains." This program envisions the availability of many
>> > > new or already proposed domains, as well a new application and
>> > > implementation process. Observers believed that the new rules could
>> > > result in hundreds of new gTLDs to be registered. Proposed TLDs
>> > > include music, berlin and nyc.
>> >
>> > I think I'd favor the RFC-style pattern without the quoted-name
>> > alternation.
>> >
>> > One thing we could do is to give the developer an option:
>> > IS_EMAIL(something or other) that lets them select one of a small
>> > number of regexes. And of course the developer can always use IS_MATCH
>> > if they don't like our choice of email filters.
>> >
>> > If we permitted a choice, I'd suggest:
>> >
>> > 1. default to the RFC regex, but without quoted names
>> > 2. RFC including quoted names
>> > 3. something like the pattern below, including the TLD filter (maybe)
>> >
>> > > I favour this variation...
>> > > [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
>> >
>> > > C
>> >
>> > > On Aug 7, 8:25 am, Jonathan Lundell <jlund...@pobox.com> wrote:
>> > >> On Aug 7, 2009, at 12:22 AM, mdipierro wrote:
>> >
>> > >>> I will take a patch for this.
>> >
>> > >> If nobody else gets to it first, I'll work up a patch over the
>> > >> weekend.
>> >
>> > >>> Massimo
>> >
>> > >>> On Aug 7, 1:33 am, Jonathan Lundell <jlund...@pobox.com> wrote:
>> > >>>> On Aug 6, 2009, at 9:32 PM, DenesL wrote:
>> >
>> > >>>>> IS_EMAIL does not follow the RFC specs for valid email addresses
>> > >>>>> (see http://en.wikipedia.org/wiki/E-mail_address)
>> >
>> > >>>>> even a simple a...@b.com fails
>> >
>> > >>>>> it is kinda late to work on the regex now, maybe tomorrow.
>> >
>> > >>>> The RFC is fairly hard to validate. If that's what we really
>> > >>>> want, I found this one on the web that looks about right:
>> >
>> > >>>> ^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?...@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$
>> >
>> > >>>> It assumes the case-insensitive flag.
>> >
>> > >>>>http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email...
>> >
>> > >>>> Overkill? Or, what the heck?
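
For reference, here is a rough sketch of what option 1 from the thread (RFC-style local part, no quoted names, no TLD list) might look like in the commented re.X style. The names EMAIL_RE and looks_like_email are made up for illustration, and the domain rule is my own simplification. This is not the IS_EMAIL patch, just one way the pattern could be laid out.

```python
import re

# Illustrative sketch only: EMAIL_RE and looks_like_email are hypothetical
# names, not web2py's IS_EMAIL implementation.
EMAIL_RE = re.compile(r"""
    ^
    [-a-z0-9!\#$%&'*+/=?^_`{|}~]+             # first run of RFC "atext" chars ('#' escaped for re.X)
    (?: \. [-a-z0-9!\#$%&'*+/=?^_`{|}~]+ )*   # more runs, each preceded by a single dot
    @
    (?: [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )? \. )+   # one or more labels, each ending in a dot
    [a-z] (?: [a-z0-9-]* [a-z0-9] )?                # final label; assumption: no fixed TLD list
    \Z                                        # end of string (unlike $, rejects a trailing newline)
    """, re.VERBOSE | re.IGNORECASE)


def looks_like_email(address):
    """Return True if address passes the syntax check.

    This checks syntax only; it says nothing about whether the
    address exists or can receive mail.
    """
    return EMAIL_RE.match(address) is not None
```

With this pattern an address such as user@aol.com.nospam still passes, which matches the point made above: a syntax check cannot catch wrong-but-valid addresses, so that is left to active validation. Quoted local parts are rejected because the quoted-name alternation is left out.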