On 6/6/08, André Warnier <[EMAIL PROTECTED]> wrote:
>
> Mohit Anchlia wrote:
>> On 6/6/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>
>>> Mohit Anchlia wrote:
>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Mohit Anchlia wrote:
>>>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>> Mohit Anchlia wrote:
>>>>>>>> On 6/5/08, André Warnier <[EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>> Mohit Anchlia wrote:
>>>>>>>>>> On 6/4/08, Dragon <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> André Warnier wrote:
>>>>>>>>>>>> Mohit Anchlia wrote:
>>>>>>>>>>>>> 2. Another question I had: sometimes we don't get the real physical IP of the machine but the IP of something in between, like a "router". Is there a way to get the real IP so that we don't end up blocking everyone coming from that "router" or "proxy"?
>>>>>>>>>>>>
>>>>>>>>>>>> In my opinion, you cannot. The whole point of such routers and proxies is to make the requests look like they are coming from the router/proxy, so that is the sender IP address you see at your server, and that's it. Your server never receives the original requester's IP address.
>>>>>>>>>>>
>>>>>>>>>>> ---------------- End original message. ---------------------
>>>>>>>>>>>
>>>>>>>>>>> There are legitimate reasons for doing this as well, so indiscriminately blocking such access is a bad idea: it will affect legitimate users. NAT and IP address sharing are among the reasons.
>>>>>>>>>>> This allows an organization to have a router with one public IP address serving a larger internal network with private IP addresses. Without this, we would have run out of IPv4 addresses a long time ago.
>>>>>>>>>>>
>>>>>>>>>>> Dragon
>>>>>>>>>>
>>>>>>>>>> If there is no way to get the real IP address, then how would the router know which machine to direct the response to? It's got to have some information in the packet. For example: if A sends to router B and the router sends to C, then when C responds, how would B know that the response is for A?
>>>>>>>>>
>>>>>>>>> You are perfectly right: the router knows the real IP address. But it will not tell you, haha.
>>>>>>>>>
>>>>>>>>> Seriously, this is how it works:
>>>>>>>>> the original system sends out an "open session" packet, through the router, to the final destination.
>>>>>>>>> The router sees this packet and analyses it. It extracts the IP address and port of the original sender, and keeps them in a table.
>>>>>>>>> Then it replaces the sender's IP address with its own, substitutes a port number of its own choosing, and memorises this new port number in the same table entry.
>>>>>>>>> Then it sends the modified packet to the external server (yours). It knows that the server on the other side is going to respond to this same IP address and port (the router's).
>>>>>>>>> When the return packet from the server comes back, the router looks at the port in it, finds the corresponding entry in its table, and now knows to whom it should send the packet internally.
>>>>>>>>> And so on.
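The translation table in the step-by-step description above can be sketched as a toy port-address translation (NAPT) table. This is a minimal Python illustration under assumptions, not real router code: the class name NatTable, the public address 203.0.113.1 and the starting port 40000 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NatTable:
    """Toy NAPT table: illustrates the bookkeeping only, not packet handling."""
    public_ip: str
    next_port: int = 40000  # hypothetical start of the router's own port range
    # public port -> (internal ip, internal port), used for replies
    inbound: dict = field(default_factory=dict)
    # (internal ip, internal port) -> public port, used for outgoing packets
    outbound: dict = field(default_factory=dict)

    def translate_outgoing(self, src_ip, src_port):
        """Swap an internal sender's address for the router's IP and a router port."""
        key = (src_ip, src_port)
        if key not in self.outbound:
            port = self.next_port
            self.next_port += 1
            self.outbound[key] = port
            self.inbound[port] = key
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, dst_port):
        """Find which internal machine a reply sent to dst_port belongs to."""
        return self.inbound.get(dst_port)


nat = NatTable(public_ip="203.0.113.1")
# Two internal hosts open sessions; the external server sees only the router.
print(nat.translate_outgoing("192.168.0.10", 51000))  # ('203.0.113.1', 40000)
print(nat.translate_outgoing("192.168.0.11", 51000))  # ('203.0.113.1', 40001)
# A reply arriving on port 40000 is forwarded back to the first host.
print(nat.translate_incoming(40000))  # ('192.168.0.10', 51000)
```

The internal address exists only in the router's table, which is why the external server has nothing but 203.0.113.1 and a port to work with.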
>>>>>>>>>
>>>>>>>>> So:
>>>>>>>>> - the router knows everything
>>>>>>>>> - the internal system thinks it is talking directly to the external server
>>>>>>>>> - the external server (yours) only sees the router's IP and port, so it thinks that is where the packet comes from.
>>>>>>>>>
>>>>>>>>> That's NAT for you, in a nutshell.
>>>>>>>>>
>>>>>>>>> Yes?
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>
>>>>>>>> Thanks for the great explanation. But I wonder how people design apps against Denial of Service attacks. Say computer A uses a Cox/Time Warner (cable) Internet connection and starts attacking B: how would a system be configured so that not all Time Warner/Cox users are affected? Should IP blocking rules be granular enough to include both the IP and the source port?
>>>>>>>
>>>>>>> I think that is quite a different case. Not all users of an ISP (like the one you mention, I suppose) are "behind" a NAT router that hides their IP address. Instead, these ISPs have a large pool of public IP addresses which they "own", and they assign them dynamically to users when they connect (and put the address back in the pool when the user disconnects).
>>>>>>>
>>>>>>> If a DoS attack came from a router with a fixed IP address, and everyone knew that this IP address belongs to company xyz, I'm sure it would not be long before company xyz was facing a big lawsuit.
>>>>>>>
>>>>>>> But in the case of an ISP with tens of thousands of customers, each of whom gets a different IP address each time he turns on his computer (and in general at least once every 24 hours anyway), finding out who exactly was "a234d-45hjk-dialin-atlanta.cox-t-warner.net" between 17:45 and 17:53 yesterday is a bit more time-consuming.
>>>>>>>
>>>>>>> But in that case, you do have a real individual sender IP address when the packet reaches your server, so you can decide to block it, and keep blocking all packets from this address for the next 24 hours. That's exactly what many servers do.
>>>>>>> And that is also why you may sometimes turn on your PC at home (getting a brand-new IP address) and find that you cannot connect to some server because it is rejecting your IP address. Chances are that you were unlucky enough to receive today the IP address that someone else used yesterday to send out 1M emails.
>>>>>>>
>>>>>>> But isn't this getting a bit off-topic?
>>>>>>> If you want to know more about this, I suggest you Google a bit on "blacklists", "greylists" and "whitelists", for example, or start here: http://en.wikipedia.org/wiki/DNSBL
>>>>>>
>>>>>> Thanks... it did go off-track a little bit, but it helps me understand what to expect when doing such blocking. Thanks for your explanation.
>>>>>>
>>>>>> Now, coming back on track: of the two approaches below, which one is better?
>>>>>>
>>>>>> 1. Use "deny from IP" in <LocationMatch>
>>>>>> 2. Use RewriteCond and call a perl script dynamically.
>>>>>> This would let me configure IPs dynamically without having to stop and start the servers every time I change httpd.conf.
>>>>>>
>>>>>> Is there any performance impact of using 2 over 1, or any other issues?
>>>>>
>>>>> There will be a very big difference. In case (1), the IP addresses or ranges are pre-processed by Apache at startup time, and the comparison is made by an internal (and fast) Apache module, on the basis of information in memory. In case (2), not only are you rewriting the URI, but you are also executing a script, which itself is going to read an external file. That is going to be several hundred times slower, at least, and thousands of times slower if perl recompiles and executes the script each time (if not under mod_perl).
>>>>> Now whether it matters in your case depends on the load of your server. If it is doing nothing 90% of the time anyway, it doesn't matter. An Apache restart may or may not be such a big problem either; it all depends on your circumstances.
>>>>>
>>>>> But rather than using a perl script, in that case I would definitely use a mod_perl add-on module written as a PerlAccessHandler. But that's another story, and one more for the mod_perl list.
>>>>> I would bet that such a mod_perl module already exists, by the way. Have a look here:
>>>>> http://cpan.uwinnipeg.ca/search?query=apache2&mode=dist
>>>>> or there is probably an example in the mod_perl Cookbook.
>>>>
>>>> As per your suggestion I looked at PerlAccessHandler. How would this approach compare in performance with "deny from IP"? Is it still going to be really bad?
>>>>
>>>> <Location /URL>
>>>>     PerlAccessHandler Example::AccessHandler
>>>> </Location>
>>>>
>>>> I will also try running some tests.
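Most of the gap between case (1) and case (2) comes from when the list is parsed. The Python sketch below is not Apache's implementation; it only contrasts parsing once at startup with re-reading the file on every request. The function and class names are hypothetical, and only full addresses and CIDR ranges are handled, not Apache's partial-IP or hostname forms.

```python
import ipaddress


def parse_rules(lines):
    """Parse blocklist entries into network objects.

    Accepts bare addresses/CIDR ranges or 'Deny from <addr>' lines (assumed
    format); Apache's partial-IP and hostname forms are not handled here.
    """
    networks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        token = line.split()[-1]  # last word: the address or range
        networks.append(ipaddress.ip_network(token, strict=False))
    return networks


class PreparsedBlocklist:
    """Approach (1): parse once at startup; each check is an in-memory comparison."""

    def __init__(self, lines):
        self.networks = parse_rules(lines)

    def is_blocked(self, client_ip):
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in self.networks)


def is_blocked_per_request(path, client_ip):
    """Approach (2): open, read and parse the file again for every request."""
    with open(path) as f:
        networks = parse_rules(f)
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


rules = ["Deny from 1.2.3.4", "Deny from 10.0.0.0/8"]
blocklist = PreparsedBlocklist(rules)
print(blocklist.is_blocked("10.20.30.40"))  # True
print(blocklist.is_blocked("5.6.7.8"))      # False
```

Both paths give the same answers; the second one simply pays for opening, reading and parsing the file on every call, on top of starting the script itself.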
>>>
>>> Well again, it all depends on your circumstances: what you want to achieve, how many accesses you expect, why exactly you want to block or allow some IPs, how many different IPs or IP ranges you want to allow/block, how often they change and what drives the changes, whether an Apache restart is a big problem for you, how loaded your system is expected to be, etc.
>>> Even if one solution looks 200 times slower than another, if your server is only 10% loaded (which happens more often than you would think) and the solution really makes your life easier for the next 3 years, it's worth looking at.
>>> And even if one solution is 200 times slower than another, that can still mean 0.1 millisecond, so is it important for you?
>>>
>>> A simple tip:
>>> in the Apache configuration file, you can use an "Include" directive, I believe just about anywhere, to insert another bit of configuration file at that point.
>>> You could have a simple text file containing all your
>>>     Deny from 1.2.3.4
>>>     Deny from 2.3.4.5
>>>     ...
>>> lines, and include it wherever you want.
>>> Then a simple Apache restart would re-read it.
>>> And this file could be written and re-written by some external script which decides which IPs are allowed or not. Or edited manually with vi, if that is how often changes happen.
>>>
>>> If you have a PerlAccessHandler under mod_perl:
>>> - perl itself is part of the server, so it does not have to be reloaded each time
>>> - the handler gets compiled once, the first time it is run, and the compiled code is re-used afterwards
>>> - it can be smart, and only re-read the IP address list and rebuild its internal table when the file changes
>>> - and in the meantime, it uses the table in memory
>>> So in that case you would not have to restart Apache, and any changes would take effect immediately.
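The "external script" that maintains the included file could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the blocklist is a dict mapping each IP to the time it was first blocked, entries older than 24 hours (the window mentioned earlier in the thread) are dropped, and the file would be pulled in with a line such as "Include conf/blocked-ips.conf" (a hypothetical path). Apache still has to be restarted to see the new file.

```python
import os
import tempfile
import time

BLOCK_SECONDS = 24 * 60 * 60  # assumed 24-hour block window


def render_deny_lines(blocked, now=None):
    """Return sorted 'Deny from' lines for IPs blocked less than BLOCK_SECONDS ago."""
    now = time.time() if now is None else now
    active = sorted(ip for ip, since in blocked.items() if now - since < BLOCK_SECONDS)
    return [f"Deny from {ip}" for ip in active]


def write_include_file(path, blocked, now=None):
    """Write to a temp file, then rename it over the include file.

    The rename means a reader never sees a half-written file (atomic when
    both paths are on the same POSIX filesystem).
    """
    content = "\n".join(render_deny_lines(blocked, now)) + "\n"
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.replace(tmp_path, path)


blocked = {"1.2.3.4": 1000.0, "2.3.4.5": 1000.0 - 2 * BLOCK_SECONDS}
print(render_deny_lines(blocked, now=1000.0))  # ['Deny from 1.2.3.4']
```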
>>>
>>> Also, something else:
>>> So far, you have been talking about blocking HTTP accesses at the Apache level. But maybe you want to block more than port 80 from those IP addresses, and maybe you should do this outside of Apache, before the traffic even gets to Apache?
>>>
>>> There are many solutions, but you are the one to decide which one to implement.
>>
>> Thanks. You are right, we should not even let these people get to Apache. We have that process in place, but it often takes time to get the request approved and processed by the Network team. Meanwhile, we want something we can use to block ASAP. I am not sure how often this list will change. To begin with, the list is going to be empty; we will only add IPs when we experience a DoS.
>>
>> We expect to get 1000s of requests per second. Since it's going to be a highly loaded server, I started to think about something that would change dynamically. You mentioned the code is compiled when Apache restarts, which means that if I keep the list of IPs as an array inside the perl script, changes are not going to take effect until the next restart.
>
> The following is a bit academic, because I believe that with this kind of volume you will be better off with a solution outside of Apache anyway, but for the sake of argument:
>
> That is not exactly what I meant. The list of IPs to block is in an external file, which can change from time to time.
> With mod_perl,
> - the perl interpreter is "embedded" in Apache from the start. To say it another way, you have an Apache with a built-in perl compiler and run-time. That means that later, to run compiled perl code, Apache does not have to start an instance of the perl run-time anymore; it is already loaded and ready to run.
> - the perl add-on modules (the code) are also compiled (by perl) when Apache starts, and the "compiled" version is in memory, ready to run.
> Just like one of the standard C-based Apache modules such as mod_mime, mod_rewrite, etc.
> - however, the list of IP addresses is outside, in a file, and the perl module starts with an empty table.
> - the first time the module is called, it checks the table and sees that it is empty. Then it reads the file, fills the table, and notes the timestamp of the file. Then it handles the current request, checks whether the IP matches, and rejects or approves the request.
> - the next time the module is called, it checks the table, and it is not empty. It then checks the timestamp of the file. If it has changed, it reloads the table from the file; otherwise it doesn't. Then it processes the current request. (If you don't want to check the file at each request, but only every 30 seconds or every 10,000 requests, you can do that too.)
> You can do this kind of thing with mod_perl in this case, because you only read from the table (except when you reload it entirely), and because it does not matter if several Apache "children" each have their own copy of the table.
>
> (In the above, I put "compile" between quotes, because perl compiles a script into "byte-code", which is later interpreted by the run-time portion of perl. But it is very fast, sometimes even faster than compiled C code. And it is much easier, and more fun, to write an Apache add-on module in perl than in C. At least for me.)
>
>> The only option, I think, is then to read the list from a flat file. I just have one basic question about mod_perl: does the Apache web server execute one perl process per request? I am asking because you mentioned I could read the list from memory, and I am not sure how it could read from memory when the script is executed every time a request is processed. If I read from a file, then every request will open the file and read from it. It looks stateless.
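The "stateless" worry goes away once the table lives in the long-running process rather than in a per-request script. The Python class below sketches the reload steps André lists (empty table, load on first use, remember the file timestamp, re-read only when it changes). It is a language-neutral illustration, not mod_perl code; the class name, the one-address-or-CIDR-per-line file format and the 30-second check interval (echoing André's example) are assumptions, and his "every 10,000 requests" variant is not shown.

```python
import ipaddress
import os
import time


class ReloadingBlocklist:
    """Per-process IP table that is reloaded only when its source file changes.

    Starts empty, loads the file on first use, remembers its modification
    time, and re-reads it only when that timestamp changes. check_interval
    limits how often the file is stat()ed at all (assumed: 30 seconds).
    """

    def __init__(self, path, check_interval=30.0):
        self.path = path
        self.check_interval = check_interval
        self.networks = None  # empty table until the first request
        self.mtime_ns = None
        self.last_check = 0.0

    def _load(self):
        # Assumed file format: one address or CIDR range per line, '#' comments.
        networks = []
        with open(self.path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    networks.append(ipaddress.ip_network(line, strict=False))
        self.networks = networks

    def _maybe_reload(self, now):
        if self.networks is not None and now - self.last_check < self.check_interval:
            return  # loaded and checked recently: keep using the in-memory table
        self.last_check = now
        mtime_ns = os.stat(self.path).st_mtime_ns
        if self.networks is None or mtime_ns != self.mtime_ns:
            self._load()
            self.mtime_ns = mtime_ns

    def is_blocked(self, client_ip, now=None):
        """Return True if client_ip falls in any listed network."""
        self._maybe_reload(time.monotonic() if now is None else now)
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in self.networks)
```

Each Apache child would hold its own instance of such a table; as André notes, that is harmless here because the table is only read between full reloads.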
>>
>> Thanks for the detailed explanation. It clears up a lot of things and also gives me different viewpoints. The Include directive was a great tip that I wasn't aware of.
>
> But it will not work in your case, because you would need to restart Apache, which takes a few seconds, during which a huge number of unsatisfied HTTP requests will pile up.
>
> Now, if you are really going to have 1,000's of requests/s on this server, I would be very interested in writing such a mod_perl module for you, and having you try it out on your server. Just for the sake of seeing if it would work. And if it does, I'll put it in my CV.
>
> André
Thanks. It doesn't look like you need to put it on your CV; people probably know you by your name :). Were you really serious? Did you mean that the mod_perl module you are proposing would read the file only once, or provide a mechanism for reading it only once?

Thanks a lot!!