On 6/6/08, André Warnier <[EMAIL PROTECTED]> wrote:
>
> Mohit Anchlia wrote:
>
>> On 6/6/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>
>>> Mohit Anchlia wrote:
>>>
>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Mohit Anchlia wrote:
>>>>>
>>>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Mohit Anchlia wrote:
>>>>>>>
>>>>>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> Mohit Anchlia wrote:
>>>>>>>>>
>>>>>>>>>> On 6/4/08, Dragon <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>
>>>>>>>>>>> André Warnier wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Mohit Anchlia wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Another question I had: sometimes we don't get the real
>>>>>>>>>>>>> physical IP of the machine, but the IP of something in between,
>>>>>>>>>>>>> like a "router". Is there a way to get the real IP, so that we
>>>>>>>>>>>>> don't end up blocking people coming from that "router" or
>>>>>>>>>>>>> "proxy"?
>>>>>>>>>>>>
>>>>>>>>>>>> In my opinion, you cannot. The whole point of such routers and
>>>>>>>>>>>> proxies is to make the requests look like they are coming from
>>>>>>>>>>>> the router/proxy, so that is the sender IP address you see at
>>>>>>>>>>>> your server level, and that's it. Your server never receives the
>>>>>>>>>>>> original requester's IP address.
>>>>>>>>>>>
>>>>>>>>>>> ---------------- End original message. ---------------------
>>>>>>>>>>>
>>>>>>>>>>> There are legitimate reasons for this to be done as well, so
>>>>>>>>>>> indiscriminately blocking such access is a bad idea, as it will
>>>>>>>>>>> affect legitimate users. NAT and IP address sharing are among
>>>>>>>>>>> the reasons. They allow an organization to have a router with one
>>>>>>>>>>> public IP address serving a larger internal network with private
>>>>>>>>>>> IP addresses. Without this, we would have run out of IPv4
>>>>>>>>>>> addresses a long time ago.
>>>>>>>>>>>
>>>>>>>>>>> Dragon
>>>>>>>>>>
>>>>>>>>>> If there is no way to get the real IP address, then how would the
>>>>>>>>>> router know which machine to direct the response to? It has to
>>>>>>>>>> have some information in the packet. For example: if A sends to
>>>>>>>>>> router B and router B sends to C, then when C responds, how would
>>>>>>>>>> B know that the response is for A?
>>>>>>>>>
>>>>>>>>> You are perfectly right: the router knows the real IP address.
>>>>>>>>> But it will not tell you, haha.
>>>>>>>>>
>>>>>>>>> Seriously, this is how it works:
>>>>>>>>> the original system sends out an "open session" packet, through the
>>>>>>>>> router, to the final destination.
>>>>>>>>> The router sees this packet and analyses it. It extracts the IP
>>>>>>>>> address and port of the original sender, and keeps them in a table.
>>>>>>>>> Then it replaces the IP address with its own, substitutes a port
>>>>>>>>> number of its own choosing, and also memorises this new port number
>>>>>>>>> in the same table entry.
>>>>>>>>> Then it sends the modified packet to the external server (yours).
>>>>>>>>> It knows that the server on the other side is going to respond to
>>>>>>>>> this same IP address and port (the ones of the router).
>>>>>>>>> When the return packet from the server comes back, the router looks
>>>>>>>>> at the port in it, finds the corresponding entry in its table, and
>>>>>>>>> now it knows to whom it should send the packet internally.
>>>>>>>>> And so on.
>>>>>>>>> So:
>>>>>>>>> - the router knows everything
>>>>>>>>> - the internal system thinks it is talking directly to the external
>>>>>>>>>   server
>>>>>>>>> - the external server (yours) only sees the router IP and port, so
>>>>>>>>>   it thinks that is where the packet comes from.
>>>>>>>>>
>>>>>>>>> That's NAT for you, in a nutshell.
>>>>>>>>>
>>>>>>>>> Yes?
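The translation table described above can be modelled in a few lines. This is a toy sketch in Python, not how any real router is implemented: the `NatRouter` class, the public address 203.0.113.1 (a documentation address) and the starting port 40000 are illustrative assumptions, and real NAT tables also track the protocol and the remote endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class NatRouter:
    """Hypothetical toy router doing port address translation."""

    public_ip: str
    next_port: int = 40000  # illustrative starting point for public ports
    # public port -> (internal ip, internal port) of the original sender
    table: dict = field(default_factory=dict)

    def open_session(self, src_ip: str, src_port: int) -> tuple:
        """Outgoing packet: swap the sender's address for the router's own."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (src_ip, src_port)
        # This pair is all the external server ever sees as the sender.
        return (self.public_ip, public_port)

    def reply_to(self, public_port: int) -> tuple:
        """Returning packet: look up which internal host it belongs to."""
        return self.table[public_port]


router = NatRouter(public_ip="203.0.113.1")
a = router.open_session("192.168.1.10", 51000)
b = router.open_session("192.168.1.11", 51000)
print("server sees:", a, b)
print("reply to", b, "goes to", router.reply_to(b[1]))
```

Both internal machines reach the server as 203.0.113.1 and differ only by port, which is why blocking that IP at the server blocks everyone behind the router.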
>>>>>>>>
>>>>>>>> Thanks for the great explanation. But I wonder how people design
>>>>>>>> apps against Denial of Service attacks. Say computer A uses a
>>>>>>>> Cox/Time Warner (cable) Internet connection and starts attacking B:
>>>>>>>> how would a system be configured so that not all Time Warner/Cox
>>>>>>>> users are affected? Should the IP blocking rules be granular enough
>>>>>>>> to include both the IP and the source port?
>>>>>>>
>>>>>>> I think that is quite a different case. Not all users of an ISP
>>>>>>> (like the one you mention, I suppose) are "behind" a NAT router that
>>>>>>> hides their IP address. Instead, these ISPs have a large pool of
>>>>>>> public IP addresses which they "own", and they assign them
>>>>>>> dynamically to users when they connect (and put the address back in
>>>>>>> the pool when the user disconnects).
>>>>>>>
>>>>>>> If a DoS attack came from a router with a fixed IP address, and
>>>>>>> everyone knew that this IP address belongs to company xyz, I'm sure
>>>>>>> it would not be long before company xyz would be facing a big
>>>>>>> lawsuit.
>>>>>>>
>>>>>>> But in the case of an ISP with tens of thousands of customers, each
>>>>>>> of whom gets a different IP address each time he turns on his
>>>>>>> computer (and anyway, in general, once every 24 hours), finding out
>>>>>>> who exactly was "a234d-45hjk-dialin-atlanta.cox-t-warner.net"
>>>>>>> between 17:45 and 17:53 yesterday is a bit more time-consuming.
>>>>>>>
>>>>>>> But in that case anyway, you do have a real individual sender IP
>>>>>>> address when the packet reaches your server, so you can decide to
>>>>>>> block it, and keep blocking all packets from this address for the
>>>>>>> next 24 hours. That's exactly what many servers do.
>>>>>>> That is also why you may sometimes turn on your PC at home (getting
>>>>>>> a brand-new IP address) and find out that you cannot connect to some
>>>>>>> server because it is rejecting your IP address. Chances are that you
>>>>>>> were unlucky enough to receive today the IP address that someone
>>>>>>> else used yesterday to send out 1M emails.
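A minimal sketch of the "block this address for the next 24 hours" policy described above. The `TemporaryBlocklist` class is hypothetical, keeps its state in memory in a single process, and takes an injectable clock only so that the expiry can be tested; a real server would need to share this state between processes or push it into a firewall.

```python
import time


class TemporaryBlocklist:
    """Hypothetical in-memory blocklist whose entries expire after a TTL."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires = {}  # ip -> time at which the block lifts

    def block(self, ip):
        self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip):
        expires = self._expires.get(ip)
        if expires is None:
            return False
        if self.clock() >= expires:
            # Dynamic addresses get reassigned, so stop punishing this one.
            del self._expires[ip]
            return False
        return True
```

The expiry is the point of the design: because ISPs hand the same address to a different customer later, a permanent block would eventually hit an innocent user, exactly as in the "1M emails" scenario above.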
>>>>>>>
>>>>>>> But isn't this getting a bit off-topic?
>>>>>>> If you want to know more about this, I suggest you Google a bit for
>>>>>>> "blacklists", "greylists" and "whitelists", for example,
>>>>>>> or start here: http://en.wikipedia.org/wiki/DNSBL
>>>>>>
>>>>>> Thanks. It did go off-track a little, but it helps me understand what
>>>>>> I should expect when doing such blocking. Thanks for your
>>>>>> explanation.
>>>>>>
>>>>>> Now, coming back on track: of the two approaches below, which one is
>>>>>> better?
>>>>>>
>>>>>> 1. Use "deny from IP" in <LocationMatch>
>>>>>> 2. Use RewriteCond and call a perl script dynamically. This lets me
>>>>>>    configure IPs dynamically without having to stop and start the
>>>>>>    servers every time I change httpd.conf.
>>>>>>
>>>>>> Is there any performance impact of using 2 over 1, or any other
>>>>>> issues?
>>>>>
>>>>> There will be a very big difference: in case (1), the IP addresses or
>>>>> ranges are pre-processed by Apache at startup time, and the comparison
>>>>> is made by an internal (and fast) Apache module, based on information
>>>>> in memory. In case (2), not only are you using a rewrite of the URI,
>>>>> but in addition you will be executing a script, which itself is going
>>>>> to read an external file. That is going to be several hundred times
>>>>> slower, at least; thousands of times slower if perl has to recompile
>>>>> and execute the script each time (if not under mod_perl).
>>>>> Now, whether it matters or not in your case depends on the load of
>>>>> your server. If it is doing nothing anyway 90% of the time, it doesn't
>>>>> matter. An Apache restart may or may not be such a big problem either;
>>>>> it all depends on your circumstances.
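To make the difference between the two approaches concrete, here is a rough Python analogue; it is not Apache's code. `parse_blocklist` does its parsing once, as Apache does with its access rules at startup, while `is_blocked_rereading` repeats the file read and parse for every request, as a per-request script would. The one-entry-per-line file format and all function names are assumptions for illustration.

```python
import ipaddress


def parse_blocklist(lines):
    """Parse addresses/CIDR ranges once, e.g. at startup (approach 1)."""
    nets = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            nets.append(ipaddress.ip_network(line, strict=False))
    return nets


def is_blocked_preparsed(nets, client_ip):
    """Per request: compare against data already in memory."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in nets)


def is_blocked_rereading(path, client_ip):
    """Per request: open, read and parse the file again (approach 2)."""
    with open(path, encoding="ascii") as f:
        return is_blocked_preparsed(parse_blocklist(f), client_ip)
```

Both functions give the same answer; the second simply pays for file I/O and parsing on every call, on top of the process start-up cost a non-mod_perl script would add.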
>>>>>
>>>>> But rather than using a perl script, in that case I would definitely
>>>>> use a mod_perl add-on module written as a PerlAccessHandler. But
>>>>> that's another story, and one more for the mod_perl list.
>>>>> I would bet that such a mod_perl module already exists, by the way.
>>>>> Have a look here:
>>>>> http://cpan.uwinnipeg.ca/search?query=apache2&mode=dist
>>>>> or there is probably an example in the mod_perl Cookbook.
>>>>
>>>> As per your suggestion I looked at PerlAccessHandler. How would this
>>>> approach compare in terms of performance with "deny from IP"? Is it
>>>> still going to be really bad?
>>>>
>>>>  <Location /URL>
>>>>   PerlAccessHandler Example::AccessHandler
>>>>  </Location>
>>>>
>>>> I will try running some tests also.
>>>
>>> Well again, it all depends on your circumstances: what you want to
>>> achieve, how many accesses you expect, why exactly you want to block or
>>> allow some IPs, how many different IPs or IP ranges you would want to
>>> allow/block, how often they change and based on what, whether an
>>> Apache restart is a big problem for you or not, how loaded your system
>>> is expected to be, etc.
>>> Even if one solution looks like it is 200 times slower than another,
>>> if your server is only loaded at 10% (which happens more frequently
>>> than you would think) and it really makes your life easier for the
>>> next 3 years, it's worth looking at.
>>> And even if one solution is 200 times slower than another, that can
>>> still mean 0.1 millisecond, so is it important for you?
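A quick back-of-the-envelope check of that argument, combining the 0.1 millisecond per-request figure above with the "thousands of requests per second" load mentioned in the 6/6 reply further up the thread. Both inputs are illustrative, and `cpu_fraction` is a hypothetical helper name.

```python
def cpu_fraction(check_ms, requests_per_second):
    """Fraction of one CPU core spent on a per-request check each second."""
    busy_ms_per_second = check_ms * requests_per_second
    return busy_ms_per_second / 1000.0


slow = cpu_fraction(0.1, 1000)          # the "200 times slower" check
fast = cpu_fraction(0.1 / 200, 1000)    # a check 200 times cheaper
print(f"slow check: {slow:.1%} of one core")
print(f"fast check: {fast:.2%} of one core")
```

At 1,000 requests per second, 0.1 ms per check costs about 10% of one core, against roughly 0.05% for the cheaper check; whether that gap matters depends on how loaded the server already is.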
>>>
>>> A simple tip:
>>> in the Apache configuration file you can use an "Include" directive,
>>> I believe just about anywhere, to insert another bit of configuration
>>> file at that point.
>>> You could have a simple text file containing all your
>>>   Deny from 1.2.3.4
>>>   Deny from 2.3.4.5
>>>   ...
>>> lines, and include it wherever you want.
>>> Then a simple Apache restart would re-read it.
>>> And this file could be written and rewritten by some external script
>>> which decides which IPs are allowed or not. Or edited with vi manually,
>>> if that is how often changes happen.
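The "external script" that maintains the included file could look something like this hypothetical Python helper. It validates each entry with the standard `ipaddress` module and writes the file atomically (temporary file plus `os.replace`), so a restart never picks up a half-written list. The function name and any path you pass are illustrative; the `Deny from` lines follow the Apache 2.2 syntax used above, and only literal addresses or CIDR ranges are accepted here.

```python
import ipaddress
import os
import tempfile


def write_deny_include(path, addresses):
    """Hypothetical helper: (re)write an Apache include file of Deny lines."""
    lines = []
    for a in sorted(set(addresses)):
        # Validate each entry so a typo cannot break the Apache config.
        ipaddress.ip_network(a, strict=False)
        lines.append(f"Deny from {a}\n")
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace is an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".deny-", text=True)
    try:
        with os.fdopen(fd, "w") as f:
            f.writelines(lines)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

The resulting file would be pulled into httpd.conf with an Include line and picked up at the next restart, as described above.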
>>>
>>> If you have a PerlAccessHandler under mod_perl:
>>> - perl itself is part of the server, so it does not have to be reloaded
>>>   each time
>>> - the handler gets compiled once, the first time it is run, and the
>>>   compiled code is re-used afterwards
>>> - it can be smart, and only re-read the IP address list and rebuild its
>>>   internal table when the file changes
>>> - and in the meantime, it uses the table in memory.
>>> So in that case you would not have to restart Apache, and any changes
>>> would take effect immediately.
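The "smart" behaviour in the list above (keep the table in memory, re-read the file only when it changes) is language-independent. A minimal Python sketch, assuming one IP address per line; `ReloadingBlocklist` is a hypothetical name standing in for the state a mod_perl handler would typically keep between requests, one copy per Apache child.

```python
import os


class ReloadingBlocklist:
    """Hypothetical per-process IP table, rebuilt only when the file changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None      # file modification time at the last load
        self._blocked = set()   # in-memory table, starts empty

    def _refresh(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path, encoding="ascii") as f:
                self._blocked = {line.strip() for line in f if line.strip()}
            self._mtime = mtime

    def allows(self, client_ip):
        # One stat() per request; the file is only read when it has changed.
        self._refresh()
        return client_ip not in self._blocked
```

Comparing modification times is cheap, so the per-request cost is one `stat` call plus a set lookup; the file contents are parsed again only after someone edits it.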
>>>
>>> Also, something else:
>>> so far, you have been talking about blocking HTTP accesses at the
>>> Apache level. But maybe you want to block more than port 80 from those
>>> IP addresses, and maybe you should do this outside of Apache, before the
>>> traffic even gets to Apache?
>>>
>>> There are many solutions, but you are the one to decide which one to
>>> implement.
>>
>> Thanks. You are right, we should not even let these people get to
>> Apache. We have that process in place, but it often takes time to get
>> the request approved and processed by the Network team. Meanwhile we
>> want something we can use to block ASAP. I am not sure how often this
>> list will change. To begin with, the list is going to be empty; only
>> when we experience a DoS attack will we add IPs to it.
>>
>> We expect to get thousands of requests per second. Since it's going to
>> be a highly loaded server, I started to think about something that
>> would change dynamically. You mentioned the code is compiled when
>> Apache restarts, which means that if I keep the list of IPs as an array
>> inside the perl script, changes are not going to take effect until the
>> next restart.
>
> The following is a bit academic, because I believe that with this kind
> of volume you will be better off with a solution outside of Apache
> anyway, but for the sake of argument:
>
> That is not exactly what I meant. The list of IPs to block is in an
> external file, which can change from time to time.
> With mod_perl,
> - the perl interpreter is "embedded" in Apache from the start. To say it
>   another way, you have an Apache with a built-in perl compiler and
>   run-time. That means that later, to run compiled perl code, Apache does
>   not have to start an instance of the perl run-time anymore; it is
>   already loaded and ready to run.
> - the perl add-on modules (the code) are also compiled (by perl) when
>   Apache starts, and the "compiled" version is in memory, ready to run,
>   just like one of the standard C-based Apache modules such as mod_mime,
>   mod_rewrite etc.
> - however, the list of IP addresses is outside, in a file, and the perl
>   module, at start, has an empty table.
> - the first time the module is called, it checks the table and sees that
>   it is empty. Then it reads the file, fills the table, and notes the
>   timestamp of the file. Then it handles the current request, checks
>   whether the IP matches or not, and rejects/approves the request.
> - the next time the module is called, it checks the table, and it is not
>   empty. It then checks the timestamp of the file. If it has changed, it
>   reloads the table from the file, otherwise not. Then it processes the
>   current request. (If you don't want to check the file on every request,
>   but only every 30 seconds or every 10,000 requests, you can do that
>   too.)
> You can do this kind of thing with mod_perl in this case because you only
> read from the table (except when you totally reload it), and because it
> does not matter if several Apache "children" each have their own copy of
> the table.
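The refinement mentioned in parentheses above, checking the file's timestamp only every 30 seconds or every 10,000 requests, can be sketched the same way. Again this is Python rather than Perl, with a hypothetical class name, a one-IP-per-line file and an injectable clock for testing. Each Apache child would hold its own instance, which is harmless here because the table is only ever read or replaced as a whole.

```python
import os
import time


class ThrottledBlocklist:
    """Hypothetical table that re-checks the file at most every N s / N hits."""

    def __init__(self, path, max_age=30.0, max_requests=10_000,
                 clock=time.monotonic):
        self.path = path
        self.max_age = max_age
        self.max_requests = max_requests
        self.clock = clock
        self._blocked = None      # empty until the first request
        self._mtime = None
        self._checked_at = 0.0
        self._since_check = 0

    def _load(self, mtime):
        with open(self.path, encoding="ascii") as f:
            self._blocked = {ln.strip() for ln in f if ln.strip()}
        self._mtime = mtime

    def allows(self, client_ip):
        now = self.clock()
        self._since_check += 1
        due = (self._blocked is None
               or self._since_check >= self.max_requests
               or now - self._checked_at >= self.max_age)
        if due:
            # Only now do we pay for a stat(); a full read only on change.
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self._mtime:
                self._load(mtime)
            self._checked_at = now
            self._since_check = 0
        return client_ip not in self._blocked
```

The trade-off is freshness for fewer system calls: an edit to the file can take up to 30 seconds (or 10,000 requests) to be noticed by a given child.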
>
> (In the above, I put "compiled" between quotes because perl compiles a
> script into "byte-code", which is later interpreted by the run-time
> portion of perl. But it is very fast, sometimes even faster than compiled
> C code. And it is very much easier, and more fun, to write an Apache
> add-on module in perl than in C. At least for me.)
>
>> The only option then, I think, is to read the list from a flat file. I
>> just have one basic question about mod_perl: does the Apache web server
>> execute one perl process per request? The reason I am asking is that
>> you mentioned I could read the list from memory, and I am not sure how
>> it would read from memory when this script is executed every time a
>> request is processed. If I read from the file, then every request will
>> open the file and read from it. It looks stateless.
>>
>> Thanks for the detailed explanation. It does clear up a lot of things
>> and also gives me different viewpoints. The Include directive was a
>> great tip that I wasn't aware of.
>
> But it will not work in your case, because you would need to restart
> Apache, which will take a few seconds, during which a huge number of
> unsatisfied HTTP requests will pile up.
>
> Now, if you are really going to have thousands of requests/s on this
> server, I would be very interested in writing such a mod_perl module for
> you and having you try it out on your server, just to see if it would
> work. And if it does, I'll put it in my CV.
>
> André


Thanks. It doesn't look like you need to put it on your CV; people probably
know you by your name :).
Were you really serious? Did you mean that the mod_perl module you are
proposing will read the file, or provide a mechanism for reading the file
only once? Thanks a lot!!
