[web2py:11232] Re: Possible bug in IS_URL

Kyle Smith Wed, 05 Nov 2008 12:04:28 -0800

I don't believe this is entirely correct. Spaces and apostrophes
absolutely _should_ be encoded when they appear in a URL; however,
section 2.2 of the spec (RFC 1738) discusses unsafe characters and
their representations:


"Usually a URL has the same interpretation when an octet is
represented by a character and when it encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL."
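
To make the quoted rule concrete, here is a rough sketch of it as a character check. It is only an illustration and not how IS_URL works: the names UNRESERVED, RESERVED and unsafe_chars are made up for this example, the reserved set is the general one from RFC 1738 rather than a per-scheme one, and "%" and "#" are allowed on the assumption that they appear only as the escape introducer and the fragment delimiter.

```python
import string

# Characters RFC 1738 section 2.2 allows unencoded: alphanumerics plus the
# "special" characters listed in the quote above.
UNRESERVED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

# Reserved characters, assumed here to be used for their reserved purposes,
# plus "%" (escape introducer) and "#" (fragment delimiter).
RESERVED = set(";/?:@=&") | {"%", "#"}


def unsafe_chars(url):
    """Return the characters in url that would need percent-encoding,
    in order of first appearance."""
    allowed = UNRESERVED | RESERVED
    seen = []
    for ch in url:
        if ch not in allowed and ch not in seen:
            seen.append(ch)
    return seen
```

With this check, a path containing underscores and an apostrophe needs no encoding at all, while a path containing a space is flagged for that one character.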

Typically, when a character in a URL needs to be encoded, the user
agent does it if possible, since most users have no idea what the
equivalent character encoding is, or they are pasting in a URL from
some other application or document. So should http://www.test.com/my
file.pdf be rejected as an invalid URL, or should it be validated
and changed to http://www.test.com/my%20file.pdf? Automatically
encoding characters like # . _ & or + would break a large number of
perfectly valid URLs, because it could change their meaning.
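
As a minimal sketch of the "validate and change" option, assuming Python 3's urllib.parse (the names SAFE and encode_unsafe are hypothetical, not part of web2py): percent-encode only the characters that are never allowed raw, and leave the special and reserved characters, plus existing %XX escapes, untouched.

```python
from urllib.parse import quote

# Characters to leave alone: the RFC 1738 specials, the reserved
# delimiters, "#" for fragments, and "%" so that escapes already present
# (such as %20) are not encoded a second time.
SAFE = "$-_.+!*'(),;/?:@=&#%"


def encode_unsafe(url):
    """Percent-encode only characters outside the safe set, e.g. spaces."""
    return quote(url, safe=SAFE)


print(encode_unsafe("http://www.test.com/my file.pdf"))
```

A URL that already uses reserved characters for their reserved purposes, such as a query string with & and +, passes through unchanged. Treating every "%" as the start of a valid escape is a simplification; a stricter version would check that two hex digits follow it.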

Kyle

On Wed, Nov 5, 2008 at 3:36 AM, achipa <[EMAIL PROTECTED]> wrote:
>
> Underscores, spaces and apostrophes are NOT valid (regardless of the
> part of the url they're in). As per the RFC, spaces and most other non-
> letter characters should be considered unsafe and must be encoded.
> Note that most modern browsers do some conversions transparently, so
> you can type spaces and similar in the address bar and those will get
> converted in the actual request to %20-s and such - whether you want
> to keep that convenience functionality with web2py is a different
> matter.
>
> On Nov 4, 8:25 pm, "Kyle Smith" <[EMAIL PROTECTED]> wrote:
> > For your unit test there are a few other basic things you should probably
> > be checking.
> >
> > The host portion can contain dashes
> >
> > http://my-site.com
> >
> > The path portion can contain many/most characters ex:
> >
> > http://my-site.com/path_to/my_file_for_'97.pdf
> >
> > In this example there are underscores and an apostrophe which are only valid
> > in the path/file portion of the URL.
> >
> > Kyle
> >
> > On Mon, Nov 3, 2008 at 10:49 PM, Jonathan Benn <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Massimo,
> >
> > > If you would like some help developing a good regex, I have passable
> > > skill in this area. I just need to have a list of conforming URLs vs.
> > > non-conforming (to test against) and I can do the rest.
> >
> > > On Nov 3, 7:15 pm, mdipierro <[EMAIL PROTECTED]> wrote:
> >
> > > > fixed in trunk.
> >
> > > Thank you. Unfortunately, now it seems to be rejecting all valid
> > > cases, e.g.:
> >
> > > http://www.benn.ca
> > > http://benn.ca
> > > http://amazon.com/books/
> > > https://amazon.com/movies
> > > rstp://idontknowthisprotocol
> > > HTTP://allcaps.com
> > > http://localhost
> > > http://localhost/
> > > http://localhost/hello
> > > http://localhost/hello/
> > > http://localhost:8080
> > > http://localhost:8080/
> > > http://localhost:8080/hello
> > > http://localhost:8080/hello/
> > > file:///C:/Documents%20and%20Settings/Jonathan/Desktop/view.py
> >
> > > I wrote a unit test for IS_URL(). Since I can't seem to attach
> > > documents, I will paste it here:
> >
> > > '''
> > >    Unit tests for IS_URL()
> > > '''
> >
> > > import unittest
> > > from gluon.validators import *
> >
> > > ###############################################################################
> > > class TestIsUrl(unittest.TestCase):
> >
> > >    x = IS_URL()
> >
> > >    def testInvalidUrls(self):
> > >        urlsToCheck = ['fff',
> > >                       'htp://invalid.com',
> > >                       'http:hello.com',
> > >                       'hTTp://www.benn.ca']
> >
> > >        failures = []
> >
> > >        for url in urlsToCheck:
> > >            if self.x(url)[1] is None:
> > >                failures.append('Incorrectly accepted: ' + url)
> >
> > >        if len(failures) > 0:
> > >            self.fail(failures)
> >
> > >    def testValidUrls(self):
> >
> > >        urlsToCheck = ['http://www.benn.ca',
> > >                       'http://benn.ca',
> > >                       'http://amazon.com/books/',
> > >                       'https://amazon.com/movies',
> > >                       'rstp://idontknowthisprotocol',
> > >                       'HTTP://allcaps.com',
> > >                       'http://localhost',
> > >                       'http://localhost/',
> > >                       'http://localhost/hello',
> > >                       'http://localhost/hello/',
> > >                       'http://localhost:8080',
> > >                       'http://localhost:8080/',
> > >                       'http://localhost:8080/hello',
> > >                       'http://localhost:8080/hello/',
> > >                       'file:///C:/Documents%20and%20Settings/Jonathan/Desktop/view.py']
> >
> > >        failures = []
> >
> > >        for url in urlsToCheck:
> > >            if self.x(url)[1] is not None:
> > >                failures.append('Incorrectly rejected: ' + url)
> >
> > >        if len(failures) > 0:
> > >            self.fail(failures)
> >
> > > ###############################################################################
> > > if __name__ == "__main__":
> > >    unittest.main()
> >

