[web2py] Re: Change in URL args handling
On Jan 27, 7:04 am, Jonathan Lundell wrote: > This applies to both the old and new URL rewrite paths, regardless of whether > there's any rewriting going on. I had another fight with the old routes.py yesterday. Is there any kind of timeline on the new router making it into stable? :))
[web2py] Re: Change in URL args handling
Correct. I am trying to close the majority of open issues are release 1.92 next week. On Jan 27, 9:21 am, Jonathan Lundell wrote: > On Jan 27, 2011, at 6:16 AM, cjrh wrote: > > > > > On Jan 27, 7:04 am, Jonathan Lundell wrote: > >> This applies to both the old and new URL rewrite paths, regardless of > >> whether there's any rewriting going on. > > > I had another fight with the old routes.py yesterday. Is there any > > kind of timeline on the new router making it into stable? :)) > > It should be in the next stable release. I'm not sure when that will be; > 1.91.6 was January 3. I'm guessing that there are enough changes that Massimo > will bump it to 1.92. > > Please make sure that the new URL router in the trunk is working for you, > though, so any problems get resolved ASAP.
[web2py] Re: Change in URL args handling
Hi Jonathan, On Jan 27, 12:04 am, Jonathan Lundell wrote: > This applies to both the old and new URL rewrite paths, regardless of whether > there's any rewriting going on. > > Previously, a trailing slash after args would cause an extra arg to be added > to the list with a value of '' (empty string). In the old logic, an embedded > empty arg was illegal. > > That is: > > /a/c/f/arg1 > > gave args as ['arg1'] > > /a/c/f/arg1/ > > gave args as ['arg1', ''] > > /a/c/f/arg1//arg2 > > was illegal. > > Now, trailing slashes are stripped, so the first two examples about give > ['arg1'], as does /a/c/f/arg1/ Maybe that should be parsed as ['arg1', '', '', '', ''] > Also, embedded empty args are legal, so the arg2 example above yields ['', > 'arg2'] Did you mean ['arg1', '', 'arg2'] ? I have not checked, but are those series of slashes legal according to an RFC?. Denes. On Jan 27, 12:04 am, Jonathan Lundell wrote: > This applies to both the old and new URL rewrite paths, regardless of whether > there's any rewriting going on. > > Previously, a trailing slash after args would cause an extra arg to be added > to the list with a value of '' (empty string). In the old logic, an embedded > empty arg was illegal. > > That is: > > /a/c/f/arg1 > > gave args as ['arg1'] > > /a/c/f/arg1/ > > gave args as ['arg1', ''] > > /a/c/f/arg1//arg2 > > was illegal. > > Now, trailing slashes are stripped, so the first two examples about give > ['arg1'], as does /a/c/f/arg1/ > > Also, embedded empty args are legal, so the arg2 example above yields ['', > 'arg2']
[web2py] Re: Change in URL args handling
> >> Now, trailing slashes are stripped, so the first two examples about give > >> ['arg1'], as does /a/c/f/arg1/ > > > Maybe that should be parsed as ['arg1', '', '', '', ''] > > Maybe, but it seems to me that it's confusing unless we also recognize a > single trailing slash as an empty arg. I don't have a strong opinion. > Note that the number of empty strings ('') is one less than the number of slashes: /arg1/ /arg1/empty/empty/empty/empty/nothing_here
[web2py] Re: Change in URL args handling
I had another problem with url argument handling that came in was "/app/controller/function/arg1\r". The default regex matching ( regex_args.match(request.raw_args( )) for the arguments would return false and then the application would respond with an "invalid request" error. It is not obvious from the error what the reason for failure is. Obviously this doesn't happen when visiting sites in a browsers, but it can be more of an issue for a json-rpc style application. Should the url be "sanitized" beforehand? On Jan 28, 12:58 am, Jonathan Lundell wrote: > On Jan 27, 2011, at 12:48 PM, DenesL wrote: > > > > Now, trailing slashes are stripped, so the first two examples about give > ['arg1'], as does /a/c/f/arg1/ > > >>> Maybe that should be parsed as ['arg1', '', '', '', ''] > > >> Maybe, but it seems to me that it's confusing unless we also recognize a > >> single trailing slash as an empty arg. I don't have a strong opinion. > > > Note that the number of empty strings ('') is one less than the number > > of slashes: > > > /arg1/ > > /arg1/empty/empty/empty/empty/nothing_here > > Yes. > > Still, it bothers me that arg1 == arg1/ != arg1//; (optional) trailing slash> seems like an odd rule. > > Notice also that '/'.join(['arg1', '']) is 'arg1/', not 'arg1//'. > > I'm open to changing it if you (or someone else) feels strongly about it--I'd > just have to change a call to rstrip to a conditional trim-one-character if > the string endswith('/'), so it's not a big deal. The important thing is for > it to be well defined.
[web2py] Re: Change in URL args handling
On Jan 27, 5:58 pm, Jonathan Lundell wrote: > On Jan 27, 2011, at 12:48 PM, DenesL wrote: > > Still, it bothers me that arg1 == arg1/ != arg1//; (optional) trailing slash> seems like an odd rule. > > Notice also that '/'.join(['arg1', '']) is 'arg1/', not 'arg1//'. True. Moreover URL('f',args=['arg1','']) is /app/ctl/f/arg1/ maybe that should be considered a bug as it should be /app/ctl/f/arg1// since URL('f',args=['arg1','','arg3'] is /app/ctl/f/arg1//arg3 which is correct.
[web2py] Re: Change in URL args handling
Hi Jonathan Stripping out trailing slashes seems like it delivers cleaner, shorter args. If no one has asked for trailing slashes, why introduce a feature which has to be protected forever as backward-compatible? After all, if these extra args exist, we're going to have to iterate through them seeing what they are before deciding whether they can be safely discarded. This leads to more lines of code; bad! On the other hand, if someone in the future comes up with a real use for these 'spurious' args, they can easily be introduced then. My vote therefore goes for #3 - in my view a vote for shorter, cleaner args and less code! -D
[web2py] Re: Change in URL args handling
On Jan 28, 10:59 am, Jonathan Lundell wrote: > > It's no so obviously a bug. Agreed. > URL('f',args=['arg1','arg2']) is /app/ctl/f/arg1/arg2 or > /app/ctl/f/'arg1'/'arg2' (quotes added for clarity) > > so suppose arg2 is ''; the consistent transformation is: > > URL('f',args=['arg1','']) -> /app/ctl/f/'arg1'/'' or /app/ctl/f/arg1/ without > the quotes > > I see three defensible choices. In all cases, .../arg1 -> ['arg1'] > > 1. .../arg1/ -> ['arg1', ''] .../arg1// -> ['arg1', '', ''] > > 2. .../arg1/ -> ['arg1'] .../arg1// -> ['arg1', ''] > > 3. .../arg1/ -> ['arg1'] .../arg1// -> ['arg1'] > > #1 follows Python split/join. Each trailing slash is another empty-string arg. > > #2 adds a trailing slash on output iff the last arg is '', and always deletes > one trailing slash on input if it's present. > > #3 ignores all trailing slashes (that's the current code) > > #1 is the most simple, consistent and elegant, it seems to me. Its drawback > is that we sort of expect that .../arg1 and .../arg1/ are the same URL. But > that's not always the case, and I don't see a big problem with saying that a > visible difference has consequences. Also, we're mostly generating our own > URLs, so we control what they look like; they're not often being typed. #1 also seems to go along with the spirit of the RFC where the slash is a segment separator. The only worry is that I think some browsers add the trailing slash if it is not there (don't quote me on this though), but does it matter?. > #2 follows the expectation that .../arg1 and .../arg1/ are the same, but it > does so at the expense of treating '' differently from all other arg values. > > #3 takes a different approach, flatly declaring that trailing slashes are > never significant. The downside is that trailing empty args ('') cannot be > made explicit in the URL (though in fairness, in current web2py they're > illegal, so...). > > We should get this resolved before Massimo releases the next stable version. > Massimo, do you have a preference? I'm leaning toward #1 at the moment, but > not very strongly. >
[web2py] Re: Change in URL args handling
> The downside is that we lose the capability to have trailing args that are > empty strings. Hi Jonathan, My point is that it's only a *downside* for those that want 'trailing args that are empty strings'. Who is it that wants them?? If we explicitly want to indicate empty args we can insert something explicit (of our own choice) to create them, eg /url~ Otherwise strip them in the interests of cleaner, shorter args. 2 cts :) -D
[web2py] Re: Change in URL args handling
> I don't think that they'll be any cleaner or shorter either way. The only way > you'll get trailing slashes (if we end up supporting them) is by asking for a > URL with empty trailing args. If you don't want trailing slashes, then don't > add empty args. Hi, I was mainly thinking of incoming urls being appended with slashes externally and that we could ignore them, but as it has been pointed out, a trailing slash already gives an empty arg, so I suppose we have to keep that. Anyhow, it sounds like your thought process has covered all the bases! :) Thanks. D
Re: [web2py] Re: Change in URL args handling
On Jan 27, 2011, at 6:16 AM, cjrh wrote: > > On Jan 27, 7:04 am, Jonathan Lundell wrote: >> This applies to both the old and new URL rewrite paths, regardless of >> whether there's any rewriting going on. > > I had another fight with the old routes.py yesterday. Is there any > kind of timeline on the new router making it into stable? :)) It should be in the next stable release. I'm not sure when that will be; 1.91.6 was January 3. I'm guessing that there are enough changes that Massimo will bump it to 1.92. Please make sure that the new URL router in the trunk is working for you, though, so any problems get resolved ASAP.
Re: [web2py] Re: Change in URL args handling
On Jan 27, 2011, at 11:54 AM, DenesL wrote: > Hi Jonathan, > > On Jan 27, 12:04 am, Jonathan Lundell wrote: >> This applies to both the old and new URL rewrite paths, regardless of >> whether there's any rewriting going on. >> >> Previously, a trailing slash after args would cause an extra arg to be added >> to the list with a value of '' (empty string). In the old logic, an embedded >> empty arg was illegal. >> >> That is: >> >> /a/c/f/arg1 >> >> gave args as ['arg1'] >> >> /a/c/f/arg1/ >> >> gave args as ['arg1', ''] >> >> /a/c/f/arg1//arg2 >> >> was illegal. >> >> Now, trailing slashes are stripped, so the first two examples about give >> ['arg1'], as does /a/c/f/arg1/ > > Maybe that should be parsed as ['arg1', '', '', '', ''] Maybe, but it seems to me that it's confusing unless we also recognize a single trailing slash as an empty arg. I don't have a strong opinion. > >> Also, embedded empty args are legal, so the arg2 example above yields ['', >> 'arg2'] > > Did you mean ['arg1', '', 'arg2'] ? Right. > > I have not checked, but are those series of slashes legal according to > an RFC?. Good question. I'll have a look. RFC 3986 (generic URI) allows segments to be empty. However, a relative URL of the form '//a/b' is supposed to interpret a as the authority (domain). That's not an issue for web2py, since we'll always have a function at least before args unless someone does a really strange rewrite, in which case they need to avoid that pattern. > > Denes. > > > On Jan 27, 12:04 am, Jonathan Lundell wrote: >> This applies to both the old and new URL rewrite paths, regardless of >> whether there's any rewriting going on. >> >> Previously, a trailing slash after args would cause an extra arg to be added >> to the list with a value of '' (empty string). In the old logic, an embedded >> empty arg was illegal. >> >> That is: >> >> /a/c/f/arg1 >> >> gave args as ['arg1'] >> >> /a/c/f/arg1/ >> >> gave args as ['arg1', ''] >> >> /a/c/f/arg1//arg2 >> >> was illegal. >> >> Now, trailing slashes are stripped, so the first two examples about give >> ['arg1'], as does /a/c/f/arg1/ >> >> Also, embedded empty args are legal, so the arg2 example above yields ['', >> 'arg2']
Re: [web2py] Re: Change in URL args handling
On Jan 27, 2011, at 12:48 PM, DenesL wrote: > Now, trailing slashes are stripped, so the first two examples about give ['arg1'], as does /a/c/f/arg1/ >> >>> Maybe that should be parsed as ['arg1', '', '', '', ''] >> >> Maybe, but it seems to me that it's confusing unless we also recognize a >> single trailing slash as an empty arg. I don't have a strong opinion. >> > > Note that the number of empty strings ('') is one less than the number > of slashes: > > /arg1/ > /arg1/empty/empty/empty/empty/nothing_here Yes. Still, it bothers me that arg1 == arg1/ != arg1//; seems like an odd rule. Notice also that '/'.join(['arg1', '']) is 'arg1/', not 'arg1//'. I'm open to changing it if you (or someone else) feels strongly about it--I'd just have to change a call to rstrip to a conditional trim-one-character if the string endswith('/'), so it's not a big deal. The important thing is for it to be well defined.
Re: [web2py] Re: Change in URL args handling
On Jan 28, 2011, at 6:05 AM, DenesL wrote: > > On Jan 27, 5:58 pm, Jonathan Lundell wrote: >> On Jan 27, 2011, at 12:48 PM, DenesL wrote: >> >> Still, it bothers me that arg1 == arg1/ != arg1//; > (optional) trailing slash> seems like an odd rule. >> >> Notice also that '/'.join(['arg1', '']) is 'arg1/', not 'arg1//'. > > True. > Moreover URL('f',args=['arg1','']) is > /app/ctl/f/arg1/ > > maybe that should be considered a bug as it should be > /app/ctl/f/arg1// It's no so obviously a bug. URL('f',args=['arg1','arg2']) is /app/ctl/f/arg1/arg2 or /app/ctl/f/'arg1'/'arg2' (quotes added for clarity) so suppose arg2 is ''; the consistent transformation is: URL('f',args=['arg1','']) -> /app/ctl/f/'arg1'/'' or /app/ctl/f/arg1/ without the quotes I see three defensible choices. In all cases, .../arg1 -> ['arg1'] 1. .../arg1/ -> ['arg1', ''].../arg1// -> ['arg1', '', ''] 2. .../arg1/ -> ['arg1'].../arg1// -> ['arg1', ''] 3. .../arg1/ -> ['arg1'].../arg1// -> ['arg1'] #1 follows Python split/join. Each trailing slash is another empty-string arg. #2 adds a trailing slash on output iff the last arg is '', and always deletes one trailing slash on input if it's present. #3 ignores all trailing slashes (that's the current code) #1 is the most simple, consistent and elegant, it seems to me. Its drawback is that we sort of expect that .../arg1 and .../arg1/ are the same URL. But that's not always the case, and I don't see a big problem with saying that a visible difference has consequences. Also, we're mostly generating our own URLs, so we control what they look like; they're not often being typed. #2 follows the expectation that .../arg1 and .../arg1/ are the same, but it does so at the expense of treating '' differently from all other arg values. #3 takes a different approach, flatly declaring that trailing slashes are never significant. The downside is that trailing empty args ('') cannot be made explicit in the URL (though in fairness, in current web2py they're illegal, so...). We should get this resolved before Massimo releases the next stable version. Massimo, do you have a preference? I'm leaning toward #1 at the moment, but not very strongly. > > since URL('f',args=['arg1','','arg3'] is > /app/ctl/f/arg1//arg3 > > which is correct.
Re: [web2py] Re: Change in URL args handling
On Jan 28, 2011, at 4:40 AM, marius.v.niekerk wrote: > > I had another problem with url argument handling that came in was > > "/app/controller/function/arg1\r". > > The default regex matching ( regex_args.match(request.raw_args( )) for > the arguments would return false and then the application would > respond with an "invalid request" error. It is not obvious from the > error what the reason for failure is. > > Obviously this doesn't happen when visiting sites in a browsers, but > it can be more of an issue for a json-rpc style application. > > Should the url be "sanitized" beforehand? If you're suggesting that trailing whitespace (including \r) should be treated as a special case, then maybe so, but it doesn't seem compelling to me: "Don't do that!" That is, yes, the URL should be sanitized, but by the generator, not the receiver. Rejecting the request gives an early warning to the developer that something is wrong with the URL generation. Otherwise I think that rejecting such a URL is the right thing to do; we really shouldn't be editing a URL (aside from standards-based decoding), since we don't know what the intent was. We could make the error message more helpful by identifying what was wrong with the request. I've done a little of that already. But that's not going to be very helpful when the request is automated.
Re: [web2py] Re: Change in URL args handling
On Jan 28, 2011, at 8:30 AM, villas wrote: > > Stripping out trailing slashes seems like it delivers cleaner, shorter > args. If no one has asked for trailing slashes, why introduce a > feature which has to be protected forever as backward-compatible? > > After all, if these extra args exist, we're going to have to iterate > through them seeing what they are before deciding whether they can be > safely discarded. This leads to more lines of code; bad! > > On the other hand, if someone in the future comes up with a real use > for these 'spurious' args, they can easily be introduced then. > > My vote therefore goes for #3 - in my view a vote for shorter, cleaner > args and less code! That was my thinking when I implemented it that way. The downside is that we lose the capability to have trailing args that are empty strings. So if you're generating a URL with request.args = ['arg1', ''], it'll be treated as though the '' arg wasn't there. Obviously that's not critical, since we can't (in general) use '' args at all right now.
Re: [web2py] Re: Change in URL args handling
On Jan 28, 2011, at 9:21 AM, DenesL wrote: > > > > On Jan 28, 10:59 am, Jonathan Lundell wrote: >> >> It's not so obviously a bug. > > Agreed. > >> URL('f',args=['arg1','arg2']) is /app/ctl/f/arg1/arg2 or >> /app/ctl/f/'arg1'/'arg2' (quotes added for clarity) >> >> so suppose arg2 is ''; the consistent transformation is: >> >> URL('f',args=['arg1','']) -> /app/ctl/f/'arg1'/'' or /app/ctl/f/arg1/ >> without the quotes >> >> I see three defensible choices. In all cases, .../arg1 -> ['arg1'] >> >> 1. .../arg1/ -> ['arg1', ''] .../arg1// -> ['arg1', '', ''] >> >> 2. .../arg1/ -> ['arg1'] .../arg1// -> ['arg1', ''] >> >> 3. .../arg1/ -> ['arg1'] .../arg1// -> ['arg1'] >> >> #1 follows Python split/join. Each trailing slash is another empty-string >> arg. >> >> #2 adds a trailing slash on output iff the last arg is '', and always >> deletes one trailing slash on input if it's present. >> >> #3 ignores all trailing slashes (that's the current code) >> >> #1 is the most simple, consistent and elegant, it seems to me. Its drawback >> is that we sort of expect that .../arg1 and .../arg1/ are the same URL. But >> that's not always the case, and I don't see a big problem with saying that a >> visible difference has consequences. Also, we're mostly generating our own >> URLs, so we control what they look like; they're not often being typed. > > #1 also seems to go along with the spirit of the RFC where the slash > is a segment separator. > The only worry is that I think some browsers add the trailing slash if > it is not there (don't quote me on this though), but does it matter?. If a browser did that, we'd see an extra '' arg, I guess, but I wonder if that actually happens. FWIW, we already see an extra '' arg when there's a trailing slash in the current web2py. It's not intentional; the URL regex parser has a little bug, and (again by accident) the logic that normally treats empty args as errors doesn't catch this one. So it must be at least tolerable. Mostly. But that's how this subject came up. Kenneth Lundström wrote: > I´m a little bit surprised that I havn´t noticed this before as I have used > len(request.args) quite much to determine the state of an operation. Though here's a question. Kenneth also wrote: Has it allways been like this or? > http://./application/controller/function/args1 len(request.args) = 1 > http://./application/controller/function/args1/args2 len(request.args) > = 2 > > but > http://./application/controller/function/args1/ len(request.args) = 2 > > I would have guessed that the last case would have returned length as 1 Kenneth: how did you happen to notice this? How did the extra trailing slash get there in the first place? > >> #2 follows the expectation that .../arg1 and .../arg1/ are the same, but it >> does so at the expense of treating '' differently from all other arg values. >> >> #3 takes a different approach, flatly declaring that trailing slashes are >> never significant. The downside is that trailing empty args ('') cannot be >> made explicit in the URL (though in fairness, in current web2py they're >> illegal, so...). >> >> We should get this resolved before Massimo releases the next stable version. >> Massimo, do you have a preference? I'm leaning toward #1 at the moment, but >> not very strongly. >>
Re: [web2py] Re: Change in URL args handling
> Kenneth: how did you happen to notice this? How did the extra trailing slash get there in the first place? I was testing an application and was using http://domain/controller/applications/2 when I decided to start from the begining with http://domain/controller/applications but I only removed the number not the slash. And as my program was checking the length of request.args and determined that length was 1. It took arg1 and tried to use it as an number and gave me Internal ticket. I could not understand what was wrong with my code until I found what I thought was as bug in web2py. I understand both sides but would still consider this a bug. Maybe bug is a bit too strong word for it. Kenneth #2 follows the expectation that .../arg1 and .../arg1/ are the same, but it does so at the expense of treating '' differently from all other arg values. #3 takes a different approach, flatly declaring that trailing slashes are never significant. The downside is that trailing empty args ('') cannot be made explicit in the URL (though in fairness, in current web2py they're illegal, so...). We should get this resolved before Massimo releases the next stable version. Massimo, do you have a preference? I'm leaning toward #1 at the moment, but not very strongly.
Re: [web2py] Re: Change in URL args handling
"If no one has asked for trailing slashes, why introduce a feature which has to be protected forever as backward-compatible?" Except that stripping out the trailing slash is itself a non-backwards compatible change. It may be OK, but let's be clear. I don't think the browser adds the trailing slash. I believe the webserver redirects to the same URL with slash post-pended if it is unable to find something to handle the URL without the trailing slash. Does anyone know how other frameworks handle this (Rails & Django)? We should also make sure we are consistent with any RFCs.
Re: [web2py] Re: Change in URL args handling
On Jan 28, 2011, at 10:40 AM, villas wrote: > >> The downside is that we lose the capability to have trailing args that are >> empty strings. > > Hi Jonathan, > My point is that it's only a *downside* for those that want 'trailing > args that are empty strings'. > Who is it that wants them?? We don't know. The problem with args is that they can be used for pretty much anything, and it's entirely application-dependent. So if they're being used for something where elements can be empty, you'll get empty trailing args. Just because *my* application has no use for them doesn't mean that someone else's might not. > If we explicitly want to indicate empty args we can insert something > explicit (of our own choice) to create them, eg /url~ > Otherwise strip them in the interests of cleaner, shorter args. > 2 cts :) I don't think that they'll be any cleaner or shorter either way. The only way you'll get trailing slashes (if we end up supporting them) is by asking for a URL with empty trailing args. If you don't want trailing slashes, then don't add empty args. That is, args=['arg1'] will generate a URL of .../arg1 regardless of which choice we make here.
Re: [web2py] Re: Change in URL args handling
On Jan 30, 2011, at 10:29 AM, villas wrote: > >> I don't think that they'll be any cleaner or shorter either way. The only >> way you'll get trailing slashes (if we end up supporting them) is by asking >> for a URL with empty trailing args. If you don't want trailing slashes, then >> don't add empty args. > > Hi, I was mainly thinking of incoming urls being appended with slashes > externally and that we could ignore them, but as it has been pointed > out, a trailing slash already gives an empty arg, so I suppose we have > to keep that. Anyhow, it sounds like your thought process has covered > all the bases! :) As it stands right now (in the trunk), we're not keeping the empty arg with a trailing slash, mainly because this appears to be a bug in the current implementation (which generally treats empty args as an error).