On 6/6/08, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>
>
>  On 6/6/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> Mohit Anchlia wrote:
>>
>>> On 6/6/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>>
>>>> Mohit Anchlia wrote:
>>>>
>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>
>>>>> Mohit Anchlia wrote:
>>>>>>
>>>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>>
>>>>>>> Mohit Anchlia wrote:
>>>>>>>>
>>>>>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>> Mohit Anchlia wrote:
>>>>>>>>>
>>>>>>>>>> On 6/4/08, Dragon <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>
>>>>>>>>>> André Warnier wrote:
>>>>>>>>>>
>>>>>>>>>>> Mohit Anchlia wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 2. Another question I had was sometimes we don't get real
>>>>>>>>>>>> physical IP of the machine but the IP of something that's in
>>>>>>>>>>>> between like "router", is there a way to get the real IP so
>>>>>>>>>>>> that we don't end up blocking people coming from that "router"
>>>>>>>>>>>> or "proxy"
>>>>>>>>>>>>>>
>>>>>>>>>>>>> In my opinion, you cannot.  The whole point of such routers and
>>>>>>>>>>>>> proxies is to make the requests look like they are coming from
>>>>>>>>>>>>> the router/proxy, so that is the sender IP address you are
>>>>>>>>>>>>> seeing at your server level, and that's it.  Your server never
>>>>>>>>>>>>> receives the original requester IP address.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------- End original message. ---------------------
>>>>>>>>>>>>>
>>>>>>>>>>>> There are legitimate reasons for this to be done as well,
>>>>>>>>>>>> indiscriminately blocking such access is a bad idea as it will
>>>>>>>>>>>> affect legitimate users. NAT and IP address sharing are among
>>>>>>>>>>>> the reasons. This allows an organization to have a router with
>>>>>>>>>>>> one public IP address to serve a larger internal network with
>>>>>>>>>>>> private IP addresses. Without this, we would have run out of
>>>>>>>>>>>> IPv4 addresses a long time ago.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dragon
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> If there is no way to get the real IP address then how would the
>>>>>>>>>>> router know which machine to direct the response to. It has got
>>>>>>>>>>> to have some information in the packet. For example: if A sends
>>>>>>>>>>> to router B and the router sends to C, then when C responds how
>>>>>>>>>>> would B know that the response is for A.
>>>>>>>>>>>
>>>>>>>>>>> You are perfectly right : the router knows the real IP address.
>>>>>>>>>>> But it will not tell you, haha.
>>>>>>>>>>>
>>>>>>>>>> Seriously, this is how it works :
>>>>>>>>>> the original system sends out an "open session" packet, through
>>>>>>>>>> the router, to the final destination.
>>>>>>>>>> The router sees this packet, and analyses it.  It extracts the IP
>>>>>>>>>> address and port of the original sender, and keeps them in a table.
>>>>>>>>>> Then it replaces the IP address with its own, substitutes a port
>>>>>>>>>> number of its own choosing, and also memorises this new port number
>>>>>>>>>> in the same table entry.
>>>>>>>>>> Then it sends the modified packet to the external server (yours).
>>>>>>>>>> It knows that the server on the other side is going to respond to
>>>>>>>>>> this same IP address and port (the ones of the router).
>>>>>>>>>> When the return packet from the server comes back, the router looks
>>>>>>>>>> at the port in it, finds the corresponding entry in its table, and
>>>>>>>>>> now it knows to whom it should send the packet internally.
>>>>>>>>>> And so on.
>>>>>>>>>> So :
>>>>>>>>>> - the router knows everything
>>>>>>>>>> - the internal system thinks it is talking directly to the external
>>>>>>>>>> server
>>>>>>>>>> - the external server (yours) only sees the router IP and port, so
>>>>>>>>>> it thinks that is where the packet comes from.
>>>>>>>>>>
>>>>>>>>>> That's NAT for you, in a nutshell.
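
The table lookup described above can be sketched in a few lines. This is only a toy model, written in Python for illustration: the class name `NatTable`, its methods, the starting port 40000 and the example addresses are all made up here, and a real NAT device also tracks protocol, destination and timeouts.

```python
import itertools


class NatTable:
    """Toy port-translation table (hypothetical, for illustration only)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next unused router port
        self._by_router_port = {}  # router port -> (internal ip, internal port)
        self._by_internal = {}     # (internal ip, internal port) -> router port

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet.

        Returns the (ip, port) pair the external server will see.
        """
        key = (src_ip, src_port)
        if key not in self._by_internal:
            port = next(self._ports)
            self._by_internal[key] = port
            self._by_router_port[port] = key
        return self.public_ip, self._by_internal[key]

    def inbound(self, dst_port):
        """Find the internal sender a returning packet belongs to.

        Returns None when the router has no entry for that port.
        """
        return self._by_router_port.get(dst_port)


nat = NatTable("203.0.113.1")
seen_by_server = nat.outbound("192.168.0.10", 51000)
reply_goes_to = nat.inbound(seen_by_server[1])
```

The external server only ever receives `seen_by_server`; the mapping back to the internal address lives in the router's table and nowhere else, which is why the server cannot recover it.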
>>>>>>>>>>
>>>>>>>>>> Yes ?
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Thanks for the great explanation. But I wonder how people design
>>>>>>>>> apps against Denial of Service attacks. Say Computer A uses a
>>>>>>>>> Cox/Time Warner (cable) Internet connection and starts attacking B,
>>>>>>>>> then how would a system be configured in a way that not all the
>>>>>>>>> users using Time Warner/Cox are affected. Should it be granular
>>>>>>>>> enough to give IP and source Port in IP blocking rules ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I think that is quite a different case.  Not all users of an ISP
>>>>>>>> (like the one you mention I suppose) are "behind" a NAT router that
>>>>>>>> hides their IP address.  Instead, these ISPs have a large pool of
>>>>>>>> public IP addresses which they "own", and they assign them
>>>>>>>> dynamically to users when they connect (and put the address back in
>>>>>>>> the pool when the user disconnects).
>>>>>>>>
>>>>>>>> If a DOS attack came from a router with a fixed IP address, and
>>>>>>>> everyone knew that this IP address belongs to company xyz, I'm sure
>>>>>>>> that it would not be long before company xyz would be facing a big
>>>>>>>> lawsuit.
>>>>>>>>
>>>>>>>> But in the case of an ISP, with tens of thousands of customers, each
>>>>>>>> one of which gets a different IP address each time he turns on his
>>>>>>>> computer (and anyway once per 24 hours in general), finding out who
>>>>>>>> exactly was "a234d-45hjk-dialin-atlanta.cox-t-warner.net" between
>>>>>>>> 17:45 and 17:53 yesterday is a bit more time-consuming.
>>>>>>>>
>>>>>>>> But in that case anyway, you do have a real individual sender IP
>>>>>>>> address when the packet reaches your server, so you can decide to
>>>>>>>> block it.
>>>>>>>> And keep blocking all packets from this address for the next 24
>>>>>>>> hours.
>>>>>>>> And that's exactly what many servers do.
>>>>>>>> And that is also why sometimes you may turn on your PC at home
>>>>>>>> (getting a brand-new IP address) and find out that you cannot connect
>>>>>>>> to some server because it is rejecting your IP address.  Chances are
>>>>>>>> that you are unlucky enough to have received today the IP address
>>>>>>>> that was used yesterday by someone else who used it to send out 1M
>>>>>>>> emails.
>>>>>>>>
>>>>>>>> But isn't this getting a bit off-topic ?
>>>>>>>> If you want to know more about this, I suggest you Google a bit on
>>>>>>>> "blacklists", "greylists" and "whitelists" for example.
>>>>>>>> or start here : http://en.wikipedia.org/wiki/DNSBL
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks. It did go off-track a little bit, but it helps me understand
>>>>>>> what I should expect when doing such blocking. Thanks for your
>>>>>>> explanation.
>>>>>>>
>>>>>>> Now coming back on track, out of below 2 approaches which one is
>>>>>>> better:
>>>>>>>
>>>>>>> 1. Use "deny from IP" in <LocationMatch>
>>>>>>> 2. Use RewriteCond and call a perl script dynamically. This helps me
>>>>>>> configure IPs dynamically without having to stop and start servers
>>>>>>> every time I change httpd.conf
>>>>>>>
>>>>>>> Is there any performance impact of using 2 over 1 or any other
>>>>>>> issues.
>>>>>>>
>>>>>>>
>>>>>> There will be a very big difference : in case (1), the IP addresses or
>>>>>> ranges are pre-processed by Apache at startup time, and the comparison
>>>>>> will be made by an internal (and fast) Apache module, on the basis of
>>>>>> information in memory.  In case (2), not only are you using a rewrite
>>>>>> of the URI, but in addition you will be executing a script, which
>>>>>> itself is going to read an external file.  That is going to be several
>>>>>> hundred times slower, at least.  Thousands of times slower if you
>>>>>> recompile and execute the script with perl each time (if not under
>>>>>> mod_perl).
>>>>>> Now whether it matters or not in your case depends on the load of your
>>>>>> server. If it is doing nothing anyway 90% of the time, it doesn't
>>>>>> matter.  An Apache restart may or may not be such a big problem either,
>>>>>> it all depends on your circumstances.
>>>>>>
>>>>>> But rather than using a perl script, I would definitely in that case
>>>>>> use a mod_perl add-on module written as a PerlAccessHandler.  But
>>>>>> that's another story, and one more for the mod_perl list.
>>>>>> I would bet that there exists already such a mod_perl module by the
>>>>>> way.
>>>>>> Have a look here :
>>>>>> http://cpan.uwinnipeg.ca/search?query=apache2&mode=dist
>>>>>> or, there is probably an example in the Mod_perl Cookbook
>>>>>>
>>>>>>
>>>>> As per your suggestion I looked at PerlAccessHandler. How would this
>>>>> approach be in terms of performance as compared to having "deny from
>>>>> IP"? Is it still going to be really bad?
>>>>>  <Location /URL>
>>>>>   PerlAccessHandler Example::AccessHandler
>>>>>  </Location>
>>>>> I will try running some test also.
>>>>>
>>>>>
>>>> Well again, it all depends on your circumstances, what you want to
>>>> achieve, how many accesses you expect, why exactly you want to block or
>>>> allow some IPs, how many different IPs or IP ranges you would want to
>>>> allow/block, how often they change, what makes them change, whether it
>>>> is a big problem or not for you to do an Apache restart, how loaded your
>>>> system is expected to be, etc..
>>>> Even if one solution looks like it is 200 times slower than another, but
>>>> your server is only loaded at 10% (happens more frequently than you
>>>> would think), and it really makes your life easier for the next 3 years,
>>>> it's worth looking at.
>>>> And even if one solution is 200 times slower than another, that can
>>>> still mean 0.1 millisecond, so is it important for you ?
>>>>
>>>> A simple tip :
>>>> in the Apache configuration file, you can use an "include" directive, I
>>>> believe just about anywhere, to insert at that point another bit of
>>>> configuration file.
>>>> You could have a simple text file containing all your
>>>> Deny from 1.2.3.4
>>>> Deny from 2.3.4.5
>>>> ...
>>>> lines, and include it wherever you want.
>>>> Then a simple Apache restart would re-read it.
>>>> And this file could be written and re-written by some external script
>>>> which decides which IPs are allowed or not. Or edited with vi manually,
>>>> if that is how often changes happen.
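
As a rough idea of what such an external script could look like, here is a minimal Python sketch that turns a list of addresses into "Deny from" lines and writes the include file in one step. The function names `render_deny_lines` and `write_include_file` are invented for this example, and the assumption that the list arrives as plain strings (single IPs or CIDR ranges) is mine, not something stated in the thread.

```python
import ipaddress
import os
import tempfile


def render_deny_lines(addresses):
    """Return one 'Deny from ...' line per valid, de-duplicated entry.

    Entries may be single IPs or CIDR ranges; anything that does not
    parse raises ValueError instead of ending up in the Apache config.
    """
    seen = []
    for raw in addresses:
        net = ipaddress.ip_network(raw.strip(), strict=False)
        # Print single hosts without a /32 (or /128) suffix.
        text = str(net.network_address) if net.num_addresses == 1 else str(net)
        if text not in seen:
            seen.append(text)
    return "".join(f"Deny from {entry}\n" for entry in seen)


def write_include_file(path, addresses):
    """Write the include file via a temp file and an atomic rename,

    so a restarting Apache never reads a half-written list.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_deny_lines(addresses))
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
```

Apache would still need a restart to re-read the file; the sketch only covers producing it safely.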
>>>>
>>>> If you have a PerlAccessHandler under mod_perl :
>>>> - perl itself is part of the server, so it does not have to be reloaded
>>>> each time
>>>> - the handler gets compiled once the first time it is run, and the
>>>> compiled
>>>> code is re-used afterward
>>>> - it can be smart, and only re-read the IP address list, and rebuild its
>>>> internal table when the file changes
>>>> - and in the meantime, it uses the table in memory
>>>> So in that case you would not have to restart Apache, and any changes
>>>> would take effect immediately.
>>>>
>>>> Also, something else :
>>>> So far, you have been talking about blocking HTTP accesses at the Apache
>>>> level. But maybe you want to block more than port 80 from those IP
>>>> addresses, and maybe you should do this outside of Apache, before it
>>>> even gets to Apache ?
>>>>
>>>> There are many solutions, but you are the one to decide which one you
>>>> implement.
>>>>
>>>
>>>
>>> Thanks. You are right we should not even let these people get to Apache.
>>> We have that process in place, but it often takes time to get that
>>> request approved and processed by Network team. Meanwhile we want
>>> something that we can block on ASAP. I am not sure how often this list
>>> will change. To begin with this list is going to be empty. Only when we
>>> experience DOS then we will update the IP.
>>>
>>> We expect to get 1000s of requests per second. Since it's going to be a
>>> highly loaded server I started to think about something that would
>>> change dynamically. You mentioned the code is compiled when apache
>>> restarts, which means that if I keep the list of IPs as an array inside
>>> the perl script, a change is not going to take effect until the next
>>> restart.
>>>
>>
>> The following is a bit academic, because I believe that with this kind of
>> volume you will be better off with a solution outside of Apache anyway, but
>> for the sake of argument :
>>
>> That is not exactly what I meant.  The list of IP's to block is in an
>> external file, which can change from time to time.
>> With mod_perl,
>> - the perl interpreter is "embedded" in Apache from the start.  To say it
>> another way, you have an Apache with a built-in perl compiler and run-time.
>> That means that later, to run compiled perl code, Apache does not have to
>> start an instance of the perl run-time anymore, it is already loaded and
>> ready-to-run.
>> - the perl add-on modules (the code), are also compiled (by perl) when
>> Apache starts, and the "compiled" version is in memory, ready to run. Just
>> like one of the standard C-based Apache modules like mod_mime, mod_rewrite
>> etc..
>> - however, the list of IP addresses is outside, in a file, and the perl
>> module, at start, has an empty table.
>> - the first time the module is called, it checks the table and sees that
>> it is empty.  Then it reads the file, fills the table, and notes the
>> timestamp of the file.  Then it handles the current request, to see if the
>> IP matches or not, and rejects/approves the request.
>> - the next time the module is called, it checks the table, and it is not
>> empty. It then checks the timestamp of the file.  If it has changed, it
>> reloads the table from the file, otherwise not.  Then it processes the
>> current request. (If you want to not check the file at each request, but
>> only every 30 seconds or every 10,000 requests, you can do that too.)
>> You can do this kind of thing with mod_perl in this case, because you only
>> read from the table (except when you totally reload it), and because it does
>> not matter if several Apache "children" each have their own copy of the
>> table.
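
The reload-on-change logic described above is not specific to perl. Below is a minimal sketch of the same idea in Python, chosen only to show the control flow; an actual PerlAccessHandler would look different. The class name `BlockList`, the one-IP-per-line file format and the 30-second default are assumptions made for the example.

```python
import os
import time


class BlockList:
    """In-memory set of blocked IPs, reloaded only when the file changes."""

    def __init__(self, path, check_interval=30.0, clock=time.monotonic):
        self.path = path
        self.check_interval = check_interval  # seconds between stat() calls
        self._clock = clock                   # injectable, to ease testing
        self._blocked = set()
        self._mtime = None                    # None means "nothing loaded yet"
        self._next_check = float("-inf")      # forces a check on first use

    def _refresh(self):
        now = self._clock()
        if now < self._next_check:
            return  # checked recently: keep using the table in memory
        self._next_check = now + self.check_interval
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            # A missing file is treated as an empty list.
            self._blocked, self._mtime = set(), None
            return
        if mtime != self._mtime:
            # Timestamp changed (or first load): rebuild the whole table.
            with open(self.path) as handle:
                self._blocked = {line.strip() for line in handle if line.strip()}
            self._mtime = mtime

    def is_blocked(self, ip):
        """Return True if the ip is in the (possibly just reloaded) table."""
        self._refresh()
        return ip in self._blocked
```

Each server process would hold its own `BlockList` instance, which matches the point above that separate copies per child do no harm because the table is only ever read or replaced wholesale.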
>>
>> (In the above, I put "compile" between quotes, because perl compiles a
>> script into "byte-code", which is later interpreted by the run-time portion
>> of perl. But it is very fast, though usually not quite as fast as compiled C code.
>>  And it is very much easier, and more fun, to write an Apache add-on module
>> in perl, than in C. At least for me.)
>>
>>> Only option I think then is to read the list from a flat file. I just
>>> have one basic question about mod_perl. Does the apache web server
>>> execute one process of perl per request ? Reason I am asking is because
>>> you mentioned I could read the list from memory, and I am not sure how
>>> it would read from memory when this script will be executed every time
>>> it tries to process the request. Because if I try to read from file then
>>> every request will try to open the file and read from it. It looks like
>>> it is stateless.
>>>
>>> Thanks for the detailed explanation. It does clear up a lot of things
>>> and is also giving me different viewpoints. The Include directive was a
>>> great tip that I wasn't aware of.
>>>
>> But it will not work in your case, because you would need to restart
>> Apache, which will take a few seconds, during which there will be a huge
>> number of unsatisfied HTTP requests piling up.
>>
>>
>> Now, if you are really going to have 1,000's of requests/s on this server,
>> I would be very interested in writing such a mod_perl module for you, and
>> have you try it out on your server.  Just for the sake of seeing if it would
>> work.  And if it does, I'll put it in my CV.
>>
>> André
>
>
> Thanks. It doesn't look like you need to put it on your CV, people probably
> know you by your name :).
> Were you really serious ? Did you mean that the mod_perl module that you
> are proposing will read the file or provide mechanism of reading the file
> only once. Thanks a lot!!
>
>
>
I think I get it. I'll just have to make my perl script smart enough to not
read the file every time. It was a great discussion.
