Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-23 Thread jc
The fix certainly works for me. And thanks, it's taken me a while to 
understand what Jonathan has been trying to tell me about changing the 
regex in routes.py. In fact it could hardly be simpler. However, I think the 
revised regex is a much better default.

On Wednesday, 21 November 2012 16:23:54 UTC, Jonathan Lundell wrote:
>
> On 21 Nov 2012, at 5:59 AM, Massimo Di Pierro wrote: 
> > I will take a patch to fix this. 
> > 
> > On Tuesday, 20 November 2012 07:00:37 UTC-6, jc wrote: 
> > You are correct of course, but to quote the book: 
> > 
> > "web2py includes two distinct URL rewrite systems: an easy-to-use 
> parameter-based system for most use cases, and a flexible pattern-based 
> system for more complex cases." 
> > 
> > You have to use the pattern based system to avoid the vulnerability, and 
> I bet most people don't. 
> > 
> > Anyway, thanks for your work-around. Prompted by Jonathan I will look 
> into using the pattern based system and remove the temporary fix. 
> > 
> > 
>
> I may have a solution. 
>
> Try replacing this: r'([\w@ -]+[=.]?)*$' 
>
> with this: r'([\w@ -]|(?<=[\w@ -])[.=])*$' 
>
> You can do this by using the args_match override in routes.py. (I notice 
> that the documented default for args_match in router.example.py is wrong; 
> that will need to be corrected as well.) 
>
> file_match probably needs a similar fix.

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-21 Thread Jonathan Lundell
On 21 Nov 2012, at 5:59 AM, Massimo Di Pierro wrote:
> I will take a patch to fix this. 
> 
> On Tuesday, 20 November 2012 07:00:37 UTC-6, jc wrote:
> You are correct of course, but to quote the book:
> 
> "web2py includes two distinct URL rewrite systems: an easy-to-use 
> parameter-based system for most use cases, and a flexible pattern-based 
> system for more complex cases."
> 
> You have to use the pattern based system to avoid the vulnerability, and I 
> bet most people don't.
> 
> Anyway, thanks for your work-around. Prompted by Jonathan I will look into 
> using the pattern based system and remove the temporary fix.
> 
> 

I may have a solution.

Try replacing this: r'([\w@ -]+[=.]?)*$'

with this: r'([\w@ -]|(?<=[\w@ -])[.=])*$'

You can do this by using the args_match override in routes.py. (I notice that 
the documented default for args_match in router.example.py is wrong; that will 
need to be corrected as well.)

file_match probably needs a similar fix.
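
For anyone who wants to try the override, here is a minimal sketch. It assumes the routes.py from jc's reproduction steps (the app name 'myapp' comes from there) and that args_match can be set in the BASE router as shown; router.example.py is the place to confirm where the key belongs. The sample strings in the check are illustrative, apart from the file name that triggered the report.

```python
import re

# Proposed replacement for the default args pattern (from the message above).
ARGS_MATCH = r'([\w@ -]|(?<=[\w@ -])[.=])*$'

# routes.py sketch: args_match is the override mentioned above; 'myapp' is
# the example app name used in the reproduction steps.
routers = dict(
    BASE=dict(
        default_application='myapp',
        args_match=ARGS_MATCH,
    ),
)

# Quick sanity check on the pattern itself (sample strings are illustrative).
check = re.compile(ARGS_MATCH)
samples = ('foo.bar', 'a=b', 'foo..bar', '.foo',
           'A Lccc - Pddd GA Dee (  A).pdf')
accepted = [s for s in samples if check.match(s)]
print(accepted)
```

Each pass through the group consumes exactly one character, so there is only one way to split a given string and a failed match should give up in roughly linear time.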

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-21 Thread Massimo Di Pierro
I will take a patch to fix this. 

On Tuesday, 20 November 2012 07:00:37 UTC-6, jc wrote:
>
> You are correct of course, but to quote the book:
>
> "web2py includes two distinct URL rewrite systems: an easy-to-use 
> parameter-based system for most use cases, and a flexible pattern-based 
> system for more complex cases."
>
> You have to use the pattern based system to avoid the vulnerability, and I 
> bet most people don't.
>
> Anyway, thanks for your work-around. Prompted by Jonathan I will look into 
> using the pattern based system and remove the temporary fix.
>

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-20 Thread jc
You are correct of course, but to quote the book:

"web2py includes two distinct URL rewrite systems: an easy-to-use 
parameter-based system for most use cases, and a flexible pattern-based 
system for more complex cases."

You have to use the pattern based system to avoid the vulnerability, and I 
bet most people don't.

Anyway, thanks for your work-around. Prompted by Jonathan I will look into 
using the pattern based system and remove the temporary fix.

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-19 Thread Jonathan Lundell
On 19 Nov 2012, at 12:33 PM, Niphlod  wrote:
> It only affects those who use the parametric router, not all the 
> web2py installations out there.

You can relax the pattern in routes.py, too.


> 
> On Monday, 19 November 2012 17:17:54 UTC+1, jc wrote:
> I have been thinking a little about this. Niphlod's suggestion solves the 
> problem for me at the moment, but isn't there an enormous problem? It seems 
> that any web2py installation can be taken down accidentally or maliciously 
> just by somebody requesting an invalid argument string in the url of the form 
> 'xxX' where the 'x's are valid characters and there are enough of them, 
> and the 'X' is invalid? There must be a lot of vulnerable sites out there.
> 
> It seems to me there is one easy fix which is to just strip out invalid 
> characters before the regex match. You will get collisions, but since the url 
> is invalid anyway, who cares? Or the string could be urlencoded first so that 
> the invalid characters become % encoded?
> 
> 
> On Tuesday, November 13, 2012 7:33:26 PM UTC, Jonathan Lundell wrote:
> On 13 Nov 2012, at 11:20 AM, Niphlod  wrote:
>> I'm definitely not a regex master, but what's the [=.]? part required for ?
> 
> The idea (not mine, fwiw) is that you can have multiple strings of [\w@ -]+ 
> separated or ended (but not begun) with a single . or = (but not multiple 
> ones). My workaround would allow leading or multiple . or =. I think we 
> probably should anyway, since we shouldn't be assuming that args are 
> necessarily a file path, which seems to be what's going on there.
> 
> It's trying to prevent stuff like foo/../../../bar.
> 
>> 
>> On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
>> On 13 Nov 2012, at 9:04 AM, Niphlod  wrote:
>>> seems to be a problem with the default regex checking for args. Let's wait 
>>> for Jonathan
>>> 
>>> >>> import re
>>> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>>> >>> mymatch.match('a')
>>> <_sre.SRE_Match object at 0x02A61020>
>>> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
>>> 
>>> endless loop of regex backtracking
>> 
>> I don't have a quick fix. The easy solutions involve re elements not 
>> available in Python re (or at least not until 3.1).
>> 
>> A workaround would be to make the pattern a little more lenient: [\w@ -=.]+
>> 
>> If we really want to exclude successive dots or equals, we could make a 
>> separate check for that.
>> 
>> 
> 
> 
> 


-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-19 Thread Niphlod
It only affects those who use the parametric router, not all the 
web2py installations out there.

On Monday, 19 November 2012 17:17:54 UTC+1, jc wrote:
>
> I have been thinking a little about this. Niphlod's suggestion solves the 
> problem for me at the moment, but isn't there an enormous problem? It seems 
> that any web2py installation can be taken down accidentally or maliciously 
> just by somebody requesting an invalid argument string in the url of the 
> form 'xxX' where the 'x's are valid characters and there are enough of 
> them, and the 'X' is invalid? There must be a lot of vulnerable sites out 
> there.
>
> It seems to me there is one easy fix which is to just strip out invalid 
> characters before the regex match. You will get collisions, but since the 
> url is invalid anyway, who cares? Or the string could be urlencoded first 
> so that the invalid characters become % encoded?
>
>
> On Tuesday, November 13, 2012 7:33:26 PM UTC, Jonathan Lundell wrote:
>>
>> On 13 Nov 2012, at 11:20 AM, Niphlod  wrote:
>>
>> I'm definitely not a regex master, but what's the [=.]? part required for?
>>
>>
>> The idea (not mine, fwiw) is that you can have multiple strings of 
>> [\w@ -]+ separated or ended (but not begun) with a single . or = (but not 
>> multiple ones). My workaround would allow leading or multiple . or =. I 
>> think we probably should anyway, since we shouldn't be assuming that args 
>> are necessarily a file path, which seems to be what's going on there.
>>
>> It's trying to prevent stuff like foo/../../../bar.
>>
>>
>> On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
>>>
>>> On 13 Nov 2012, at 9:04 AM, Niphlod  wrote:
>>>
>>> seems to be a problem with the default regex checking for args. Let's wait 
>>> for Jonathan
>>>
>>> >>> import re
>>> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>>> >>> mymatch.match('a')
>>> <_sre.SRE_Match object at 0x02A61020>
>>> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
>>>
>>> endless loop of regex backtracking
>>>
>>>
>>> I don't have a quick fix. The easy solutions involve re elements not 
>>> available in Python re (or at least not until 3.1).
>>>
>>> A workaround would be to make the pattern a little more lenient: 
>>> [\w@ -=.]+
>>>
>>> If we really want to exclude successive dots or equals, we could make a 
>>> separate check for that.
>>>
>>
>>
>>
>>
>>

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-19 Thread jc
I have been thinking a little about this. Niphlod's suggestion solves the 
problem for me at the moment, but isn't there an enormous problem? It seems 
that any web2py installation can be taken down accidentally or maliciously 
just by somebody requesting an invalid argument string in the url of the 
form 'xxX' where the 'x's are valid characters and there are enough of 
them, and the 'X' is invalid? There must be a lot of vulnerable sites out 
there.

It seems to me there is one easy fix which is to just strip out invalid 
characters before the regex match. You will get collisions, but since the 
url is invalid anyway, who cares? Or the string could be urlencoded first 
so that the invalid characters become % encoded?
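
To make the "strip first" idea concrete, here is a rough sketch. The helper name sanitize is made up for illustration and nothing like it exists in web2py as far as this thread shows. It also collapses repeated . or = and drops leading ones, because the default pattern rejects those too, and a rejection after a long run of valid characters is what sets off the backtracking. Percent-encoding alone would not seem to help, since '%' is itself outside the allowed set.

```python
import re

ARGS_MATCH = re.compile(r'([\w@ -]+[=.]?)*$')  # default pattern under discussion
INVALID = re.compile(r'[^\w@ =.-]')            # characters the pattern never accepts
SEPARATORS = re.compile(r'[.=]{2,}')           # runs of . or = that it rejects

def sanitize(arg):
    """Hypothetical helper: clean an arg so the default pattern accepts it."""
    arg = INVALID.sub('', arg)
    arg = SEPARATORS.sub(lambda m: m.group(0)[0], arg)  # keep first of each run
    return arg.lstrip('.=')                             # no leading separator

cleaned = sanitize('A Lccc - Pddd GA Dee (  A).pdf')
print(repr(cleaned), bool(ARGS_MATCH.match(cleaned)))
```

After cleaning, the string is made only of valid runs with single separators between them, so the first attempt at matching succeeds and no backtracking is needed. As noted above, different invalid URLs can collapse to the same cleaned string.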


On Tuesday, November 13, 2012 7:33:26 PM UTC, Jonathan Lundell wrote:
>
> On 13 Nov 2012, at 11:20 AM, Niphlod wrote:
>
> I'm definitely not a regex master, but what's the [=.]? part required for?
>
>
> The idea (not mine, fwiw) is that you can have multiple strings of 
> [\w@ -]+ separated or ended (but not begun) with a single . or = (but not 
> multiple ones). My workaround would allow leading or multiple . or =. I 
> think we probably should anyway, since we shouldn't be assuming that args 
> are necessarily a file path, which seems to be what's going on there.
>
> It's trying to prevent stuff like foo/../../../bar.
>
>
> On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
>>
>> On 13 Nov 2012, at 9:04 AM, Niphlod  wrote:
>>
>> seems to be a problem with the default regex checking for args. Let's wait 
>> for Jonathan
>>
>> >>> import re
>> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>> >>> mymatch.match('a')
>> <_sre.SRE_Match object at 0x02A61020>
>> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
>>
>> endless loop of regex backtracking
>>
>>
>> I don't have a quick fix. The easy solutions involve re elements not 
>> available in Python re (or at least not until 3.1).
>>
>> A workaround would be to make the pattern a little more lenient: 
>> [\w@ -=.]+
>>
>> If we really want to exclude successive dots or equals, we could make a 
>> separate check for that.
>>
>
>
>
>
>

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread jc
Thanks for the suggestion. I have made Niphlod's change to the regex 
without really understanding the implications. It seems to have fixed my 
immediate problem, so at least I am back up. I will try to understand what 
it is all about later on.

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread Niphlod
Got that. The problem, though, is that '(' and ')' are not included in 
[\w@ -], so the regex goes into backtracking mode.

([\w@ -\(\)]+[=.]?)*$ seems to solve the particular issue jc is facing.

are we missing any other fancy characters ?
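
One detail to watch when adding characters to the class: Python's re reads ' -\(' inside the brackets as a range from space to '(', so the literal hyphen is no longer matched, while characters such as '#' are. The sketch below shows this with short strings. REORDERED is only an illustrative variant with the hyphen moved to the end, not a proposed default.

```python
import re

PATCHED = re.compile(r'([\w@ -\(\)]+[=.]?)*$')  # as posted above
REORDERED = re.compile(r'([\w@ ()-]+[=.]?)*$')  # hyphen moved last (illustrative)

name = 'A Lccc - Pddd GA Dee (  A).pdf'

# Short probes: is '-' still accepted, and did the range let '#' in?
print(bool(PATCHED.match('a-b')), bool(PATCHED.match('a#b')))

# The file name from the report, against both variants.
print(bool(PATCHED.match(name)), bool(REORDERED.match(name)))
```

Either way, any character still outside the class that follows a long run of valid ones would bring the backtracking back, so extending the class only helps with the characters that were added.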


On Tuesday, November 13, 2012 8:33:26 PM UTC+1, Jonathan Lundell wrote:
>
> On 13 Nov 2012, at 11:20 AM, Niphlod wrote:
>
> I'm definitely not a regex master, but what's the [=.]? part required for?
>
>
> The idea (not mine, fwiw) is that you can have multiple strings of 
> [\w@ -]+ separated or ended (but not begun) with a single . or = (but not 
> multiple ones). My workaround would allow leading or multiple . or =. I 
> think we probably should anyway, since we shouldn't be assuming that args 
> are necessarily a file path, which seems to be what's going on there.
>
> It's trying to prevent stuff like foo/../../../bar.
>
>
> On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
>>
>> On 13 Nov 2012, at 9:04 AM, Niphlod  wrote:
>>
>> seems to be a problem with the default regex checking for args. Let's wait 
>> for Jonathan
>>
>> >>> import re
>> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>> >>> mymatch.match('a')
>> <_sre.SRE_Match object at 0x02A61020>
>> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
>>
>> endless loop of regex backtracking
>>
>>
>> I don't have a quick fix. The easy solutions involve re elements not 
>> available in Python re (or at least not until 3.1).
>>
>> A workaround would be to make the pattern a little more lenient: 
>> [\w@ -=.]+
>>
>> If we really want to exclude successive dots or equals, we could make a 
>> separate check for that.
>>
>
>
>
>
>

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread Jonathan Lundell
On 13 Nov 2012, at 11:20 AM, Niphlod  wrote:
> I'm definitely not a regex master, but what's the [=.]? part required for ?

The idea (not mine, fwiw) is that you can have multiple strings of [\w@ -]+ 
separated or ended (but not begun) with a single . or = (but not multiple 
ones). My workaround would allow leading or multiple . or =. I think we 
probably should anyway, since we shouldn't be assuming that args are 
necessarily a file path, which seems to be what's going on there.

It's trying to prevent stuff like foo/../../../bar.
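
A small sketch of what that rule accepts and rejects, using the default pattern quoted in this thread. The sample strings are made up and kept short, so backtracking is not an issue here.

```python
import re

ARGS_MATCH = re.compile(r'([\w@ -]+[=.]?)*$')  # default pattern quoted in the thread

# Separated or ended by a single . or = is fine; leading or repeated is not.
samples = ['foo.bar', 'foo.', 'key=value', '.foo', 'foo..bar', '..']
results = [bool(ARGS_MATCH.match(s)) for s in samples]

for s, ok in zip(samples, results):
    print(repr(s), ok)
```

The last three samples are the ones the pattern is meant to turn away, which covers the '..' components in a path like foo/../../../bar.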

> 
> On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
> On 13 Nov 2012, at 9:04 AM, Niphlod  wrote:
>> seems to be a problem with the default regex checking for args. Let's wait 
>> for Jonathan
>> 
>> >>> import re
>> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>> >>> mymatch.match('a')
>> <_sre.SRE_Match object at 0x02A61020>
>> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
>> 
>> endless loop of regex backtracking
> 
> I don't have a quick fix. The easy solutions involve re elements not 
> available in Python re (or at least not until 3.1).
> 
> A workaround would be to make the pattern a little more lenient: [\w@ -=.]+
> 
> If we really want to exclude successive dots or equals, we could make a 
> separate check for that.
> 
> 


-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread Niphlod
I'm definitely not a regex master, but what's the [=.]? part required for?

On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
>
> On 13 Nov 2012, at 9:04 AM, Niphlod wrote:
>
> seems to be a problem with the default regex checking for args. Let's wait 
> for Jonathan
>
> >>> import re
> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
> >>> mymatch.match('a')
> <_sre.SRE_Match object at 0x02A61020>
> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
>
> endless loop of regex backtracking
>
>
> I don't have a quick fix. The easy solutions involve re elements not 
> available in Python re (or at least not until 3.1).
>
> A workaround would be to make the pattern a little more lenient: [\w@ -=.]+
>
> If we really want to exclude successive dots or equals, we could make a 
> separate check for that.
>

-- 





Re: [web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread Jonathan Lundell
On 13 Nov 2012, at 9:04 AM, Niphlod  wrote:
> seems to be a problem with the default regex checking for args. Let's wait 
> for Jonathan
> 
> >>> import re
> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
> >>> mymatch.match('a')
> <_sre.SRE_Match object at 0x02A61020>
> >>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')
> 
> endless loop of regex backtracking

I don't have a quick fix. The easy solutions involve re elements not available 
in Python re (or at least not until 3.1).

A workaround would be to make the pattern a little more lenient: [\w@ -=.]+

If we really want to exclude successive dots or equals, we could make a 
separate check for that.
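
A minimal sketch of that two-step approach, under a couple of assumptions. The helper name arg_ok is hypothetical. The hyphen is also moved to the end of the class, because written as ' -=' Python's re treats it as a range from space to '=', which lets in characters such as '/'.

```python
import re

AS_POSTED = re.compile(r'[\w@ -=.]+$')  # ' -=' is read as a range here
LENIENT = re.compile(r'[\w@ =.-]+$')    # hyphen last, so it is a literal
REPEATED = re.compile(r'[.=]{2,}')      # successive dots or equals

def arg_ok(arg):
    """Hypothetical helper: lenient class plus a separate repeat check."""
    return bool(LENIENT.match(arg)) and not REPEATED.search(arg)

for s in ('foo.bar', 'foo..bar', 'A Lccc - Pddd GA Dee (  A).pdf'):
    print(repr(s), arg_ok(s))

# The range in the pattern as posted accepts '/', the reordered class does not.
print(bool(AS_POSTED.match('a/b')), bool(LENIENT.match('a/b')))
```

With a single character class and no nested repetition, a failed match only has to give back one character at a time, so the invalid file name is rejected quickly instead of looping.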

-- 





[web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread Niphlod
seems to be a problem with the default regex checking for args. Let's wait for 
Jonathan

>>> import re
>>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>>> mymatch.match('a')
<_sre.SRE_Match object at 0x02A61020>
>>> mymatch.match('A Lccc - Pddd GA Dee (  A).pdf')

endless loop of regex backtracking
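
To see the growth without hanging the interpreter, the session above can be scaled down. This is a rough sketch in Python 3 (the thread predates it). The helper name and the input sizes are my own choices, and the timings will differ from machine to machine.

```python
import re
import time

ARGS_MATCH = re.compile(r'([\w@ -]+[=.]?)*$')  # pattern from the session above

def time_failed_match(n):
    """Time a match against n valid characters followed by one invalid '('."""
    start = time.perf_counter()
    result = ARGS_MATCH.match('a' * n + '(')
    return result, time.perf_counter() - start

# Sizes kept small on purpose: the run of valid characters can be split
# between group iterations in many ways, and every split is tried.
for n in (10, 12, 14, 16):
    result, elapsed = time_failed_match(n)
    print(n, result, '%.4f s' % elapsed)
```

Every call returns None, but the elapsed time should roughly double with each extra valid character. The file name in the report has 21 valid characters before the first '('.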

-- 





[web2py] Re: Bug? Invalid url puts python into a tight loop - 100% CPU

2012-11-13 Thread jc
I can confirm this is a bug. Steps to recreate:

1) Unzip a copy of web2py_src_2.2.1.zip into a fresh directory

2) Start web2py e.g. cd web2py; python web2py.py -a test

3) Use the appadmin interface to create a new simple app e.g. myapp

4) Create a routes.py file containing

routers = dict(
    BASE = dict(
        default_application = 'myapp',
    ),
)

5) In a browser (I used Chrome) visit the invalid url 
"http://localhost:8000/myapp/default/aa/A%20Lccc%20-%20Pddd%20GA%20Dee%20(%20%20A).pdf"

Result: python is now stuck in a tight CPU loop.
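
For reference, this is the string that ends up being checked once the URL in step 5 is percent-decoded. The sketch below is not web2py's routing code. It only splits the path along the usual application/controller/function/args layout using the standard library, on the assumption that the router sees the decoded form.

```python
from urllib.parse import unquote, urlsplit

url = ('http://localhost:8000/myapp/default/aa/'
       'A%20Lccc%20-%20Pddd%20GA%20Dee%20(%20%20A).pdf')

# Split the path into application / controller / function / args.
parts = urlsplit(url).path.strip('/').split('/')
app, controller, function = parts[:3]
args = [unquote(p) for p in parts[3:]]  # percent-decoding happens here

print(app, controller, function, args)
```

The single decoded arg contains '(' and ')', which are outside the characters allowed by the default args pattern.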

--