mod_substitute memory problem
Hello, There seems to be a memory problem when substituing something on very long lines (several hundreds KB). this problem is different from bug 44948 (I applied this patch). When using something like Substitute s/string1/string2/ on a 300 KB line with 30 times string1 on the line, there are 2 problems: 1. It uses a huge amount of memory, as - I think - it reallocates a new buffer (300 KB) before each substitute without freeing the previous one 2. The memory is not freeed at the end of the HTTP request. Maybe is it due to Keep-alive? If using the q switch to not flatten the buckets, it uses almost no memory. Btw, it correctly handles more 60 substitutions on the same line (some shorter, some longer) without the q flag !?! When exactly is this flag needed? In macro SEDSCAT, we have s2 = apr_pstrmemdup(pool, buff, blen); s1 = apr_pstrcat(pool, s1, s2, NULL); Shouldn't we free the previous s1 buffer? Anyway, I do not understand why the memory is not released, as the pool is supposed to be destroyed. Regards, Nick
Re: error: 'OPT_INCNOEXEC' undeclared
On Tue, Jun 9, 2009 at 10:55 AM, Oden Eriksson oden.eriks...@envitory.sewrote: Hello. The CVE-2009-1195 fix broke the mod_perl build: modperl_config.c:525: error: 'OPT_INCNOEXEC' undeclared (first use in this function) I saw http://svn.apache.org/viewvc?view=revrevision=779472 addresses this problem. Will this be the final official fix? 2.2.x: I hope another tiny patch gets included too ( http://people.apache.org/~trawick/mod_perl_more_compat.txt), but certainly r779472 will be in the next 2.2.x release to resolve that build failure. trunk: OPT_INCNOEXEC no longer exists. Code which needs to know whether or not SSI exec is enabled will have to be modified (new and old: some flavor of SSI is enabled if OPT_INCLUDES is on; new: exec is enabled if OPT_INC_WITH_EXEC is on; old: exec is enabled unless OPT_INCNOEXEC is on). OK. Thanks Jeff. I will then have to ship a bugfix update for the Mandriva products. I guess this also applies to Redhat and others as well. This email has been processed by SmoothZap - www.smoothwall.net
Re: mod_substitute memory problem
Nick Gearls nickgea...@gmail.com writes: There seems to be a memory problem when substituing something on very long lines (several hundreds KB). this problem is different from bug 44948 (I applied this patch). When using something like Substitute s/string1/string2/ on a 300 KB line with 30 times string1 on the line, there are 2 problems: 1. It uses a huge amount of memory, as - I think - it reallocates a new buffer (300 KB) before each substitute without freeing the previous one I think you're right - most of the working memory is not released until the request is over. This is kind of a worst-case for mod_substitute's temporary memory usage. Is this a problem in real cases? Or just in artificial testing? If these are real requests, maybe if you tell a little more about them, we can find a better solution. 2. The memory is not freeed at the end of the HTTP request. Maybe is it due to Keep-alive? If by the memory is not freed, you mean the memory usage for the http process as reported by the OS doesn't go down, that's right. But at the end of the request, the memory should be released back to Apache to re-use. You can test this by repeating your test several times - after the first time, the process's memory usage should not keep going up the way it does the first time. If that's a problem in your case, you can set MaxMemFree to ask Apache to release memory by calling free() above a certain threshold. Even then, whether the process releases the memory back to the OS depends on the OS, so it might not gain you anything. But if this is going to happen very often, you might as well let Apache hold on to the memory; it's just going to grab it back again the next time this happens. If using the q switch to not flatten the buckets, it uses almost no memory. Btw, it correctly handles more 60 substitutions on the same line (some shorter, some longer) without the q flag !?! When exactly is this flag needed? Don't know about this one. In macro SEDSCAT, we have s2 = apr_pstrmemdup(pool, buff, blen); s1 = apr_pstrcat(pool, s1, s2, NULL); Shouldn't we free the previous s1 buffer? Pools don't provide a way of freeing individual allocations; you have to destroy the whole pool. Anyway, I do not understand why the memory is not released, as the pool is supposed to be destroyed. See above. -- Dan Poirier poir...@pobox.com
Re: rotatelogs - Adding timeout for reading from stdin
William A. Rowe, Jr. wrote: Mladen Turk wrote: I'd leave rotatelogs alone, please. Its design is not flawed. But I think Jim was working on something which would release the file just as soon as its time is up, and that is useful across all of the architectures, not windows specific. Attached is the patch that uses 1 second timeout when reading from stdin (hard-coded because we have 1 second rotation resolution). Unfortunately I had to make WIN32 specific peace of code, cause we don't have APR_FILES_AS_SOCKETS on that platform. Code was testes both on windows and linux and it works like a charm (rotation is done at exact time regardless of log events). Regards -- ^(TM) --- rotatelogs.c.org 2008-11-06 07:49:07.0 +0100 +++ rotatelogs.c 2009-06-10 14:15:39.0 +0200 @@ -44,6 +44,8 @@ #include apr_file_io.h #include apr_file_info.h #include apr_general.h +#include apr_poll.h +#include apr_portable.h #include apr_time.h #include apr_getopt.h @@ -64,6 +66,53 @@ #define MAX_PATH1024 #endif +#if APR_FILES_AS_SOCKETS +static apr_pollfd_t wait_pollfd; +static apr_status_t wait_for_io_or_timeout(apr_file_t *f, + apr_interval_time_t timeout, + apr_pool_t *pool) +{ +apr_int32_t nd; + +if (!wait_pollfd.p) { +wait_pollfd.p = pool; +} +wait_pollfd.desc_type = APR_POLL_FILE; +wait_pollfd.reqevents = APR_POLLIN; +wait_pollfd.desc.f= f; + +return apr_poll(wait_pollfd, 1, nd, timeout); +} +#elif defined(WIN32) +static apr_status_t wait_for_io_or_timeout(apr_file_t *f, + apr_interval_time_t timeout, + apr_pool_t *pool) +{ +HANDLE h; +char c; +DWORD r; +ints = 0; + +apr_os_file_get(h, f); +while (PeekNamedPipe(h, c, 1, r, NULL, NULL)) { +if (r == 1) +break; +if (s = apr_time_msec(timeout)) +return APR_TIMEUP; +Sleep(100); +s += 100; +} +return APR_SUCCESS; +} +#else +static apr_status_t wait_for_io_or_timeout(apr_file_t *f, + apr_interval_time_t timeout, + apr_pool_t *pool) +{ +return APR_SUCCESS; +} +#endif /* APR_FILES_AS_SOCKETS */ + static void usage(const char *argv0, const char *reason) { if (reason) { @@ -198,7 +247,18 @@ * since we reset bypass_io after the 1st loop */ if (!bypass_io) { -if (apr_file_read(f_stdin, buf, nRead) != APR_SUCCESS) { +apr_status_t rv; + +rv = wait_for_io_or_timeout(f_stdin, apr_time_from_sec(1), pool); +if (rv == APR_SUCCESS) { +if (apr_file_read(f_stdin, buf, nRead) != APR_SUCCESS) { +exit(3); +} +} +else if (APR_STATUS_IS_TIMEUP(rv)) { +nRead = 0; +} +else { exit(3); } }
Re: rotatelogs - Adding timeout for reading from stdin
Would wait_for_io_or_timeout() be a good candidate for apr? -- Dan Poirier poir...@pobox.com
Re: mod_substitute memory problem
On Wed, 10 Jun 2009 13:11:16 +0200 Nick Gearls nickgea...@gmail.com wrote: 2. The memory is not freeed at the end of the HTTP request. Maybe is it due to Keep-alive? It's recycled within the server. If using the q switch to not flatten the buckets, it uses almost no memory. Btw, it correctly handles more 60 substitutions on the same line (some shorter, some longer) without the q flag !?! When exactly is this flag needed? Rarely, IMO. When I wrote mod_line_edit - which does essentially the same as mod_substitute - I didn't provide that option, and instead documented it as not working with overlapping substitutions. Another slow by default flag in mod_substitute is that it uses regexps by default. Turn them off unless you definitely need them! Anyway, I do not understand why the memory is not released, as the pool is supposed to be destroyed. It's recycled within the server. mod_sed is now the state-of-the-art. Perhaps a visit to http://httpd.apache.org/docs/2.3/mod/mod_sed.html would be in order. -- Nick Kew Application Development with Apache - the Apache Modules Book http://www.apachetutor.org/
Re: rotatelogs - Adding timeout for reading from stdin
Any particular reason why not using apr_wait_for_io_or_timeout at least on unix? Regards Rüdiger -Ursprüngliche Nachricht- Von: Mladen Turk Gesendet: Mittwoch, 10. Juni 2009 14:24 An: dev@httpd.apache.org Betreff: Re: rotatelogs - Adding timeout for reading from stdin William A. Rowe, Jr. wrote: Mladen Turk wrote: I'd leave rotatelogs alone, please. Its design is not flawed. But I think Jim was working on something which would release the file just as soon as its time is up, and that is useful across all of the architectures, not windows specific. Attached is the patch that uses 1 second timeout when reading from stdin (hard-coded because we have 1 second rotation resolution). Unfortunately I had to make WIN32 specific peace of code, cause we don't have APR_FILES_AS_SOCKETS on that platform. Code was testes both on windows and linux and it works like a charm (rotation is done at exact time regardless of log events). Regards -- ^(TM)
Re: Some ramblings on httpd config
On Jun 9, 2009, at 08:49, Akins, Brian wrote: Some pseudo request processing code to do same thing: if listening_port == 80 then if r.hostname == 'www.foo.com' then elseif r.hostname =~ /www\d.bar.[org|net]/ end end As long as we're talking about exposing nifty syntax in the config ... I'd be content (at least briefly - I'm sure I could find something else to be discontent about) if virtual hosts were less baffling, or, at least, harder to get wrong. Something like: Interface 10.0.0.1:80 Websites example.com www.example.com foo[123].example.org DocumentRoot /x/y Websites Websites __dynamic__ DocumentRoot /var/vhosts/HOSTNAME CustomLog /var/log/HOSTNAME.log combined /Websites /Interface And perhaps something fancier regexey to encompass all of the magic that mod_vhost_alias does. Or something like that that's sufficiently like what we have now to be easy to learn, but sufficiently different to be harder to get wrong, and a whole heck of a lot easier to do dynamic vhosts. One of the other big complaints that comes up, in various ways, daily on IRC, is that there's no standard way to put variables in configuration directives. If we had a standard set of variables (like, say, HOSTNAME or whatever) that could be interpolated into config directives, as well as a standard way to do variable assignment and backreferences, that would solve a whole class of problems that are really pretty difficult right now. Of course, I say all of that being utterly ignorant of how this all works internally, so feel free to ignore me. Like I said before, I answer the how do I questions every day, and frequently find myself wondering why things that *sound* so easy are actually so very difficult. -- A poet more than thirty years old is simply an overgrown child. H. L. Mencken
Re: rotatelogs - Adding timeout for reading from stdin
Dan Poirier wrote: Would wait_for_io_or_timeout() be a good candidate for apr? There is apr_wait_for_io_or_timeout but it uses the apr_file_t-timeout which can be set only for pipes and sockets. Also the Win32 code will work only for stdhandles because they are pipes. In essence the answer is no. However the full async file API is different thing... Regards -- ^(TM)
Re: rotatelogs - Adding timeout for reading from stdin
Plüm, Rüdiger, VF-Group wrote: Any particular reason why not using apr_wait_for_io_or_timeout at least on unix? See the answer I gave to Dan. It uses file internal timeout which we cannot set, so no it cannot be used. Regards -- ^(TM)
Re: Some ramblings on httpd config
On 6/10/09 8:52 AM, Rich Bowen rbo...@rcbowen.com wrote: Or something like that that's sufficiently like what we have now to be easy to learn, but sufficiently different to be harder to get wrong, and a whole heck of a lot easier to do dynamic vhosts. I think Lua (or whatever) could include enough user friendly glue to make something like that possible. If we had a standard set of variables (like, say, HOSTNAME or whatever) that could be interpolated into config directives, as well as a standard way to do variable assignment and backreferences, that would solve a whole class of problems that are really pretty difficult right now. That sounds a whole lot like using the config as a runtime. Hmmm, where have I heard that before ;) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Some ramblings on httpd config
Akins, Brian brian.ak...@turner.com writes: On 6/10/09 8:52 AM, Rich Bowen rbo...@rcbowen.com wrote: If we had a standard set of variables (like, say, HOSTNAME or whatever) that could be interpolated into config directives, as well as a standard way to do variable assignment and backreferences, that would solve a whole class of problems that are really pretty difficult right now. That sounds a whole lot like using the config as a runtime. I'm not sure I follow that. I do like that the config would still look a lot like it does now, only more flexible. -- Dan Poirier poir...@pobox.com
Re: Some ramblings on httpd config
On 6/10/09 10:21 AM, Dan Poirier poir...@pobox.com wrote: That sounds a whole lot like using the config as a runtime. I'm not sure I follow that. I do like that the config would still look a lot like it does now, only more flexible. Even if we kept the same general syntax as now, if we introduce variables and if/else, then the config has to be compiled and ran at request time. (Glossing over a lot, I know.) Also, module would have to have some interface to accept the variables and conditionals. Also, we would have to basically write our own domain-specific language. My thought is rather than incrementally attempting to do this, we can just adopt an existing language that fits our needs - like lua. This adoption can be done incrementally, as well. Some examples of that, using a more complete version of mod_lua, have been mentioned. -- Brian Akins
Re: mod_substitute memory problem
The problem is real. I have an application which generates one line of about 300 KB :-( In this case, Apache eats up all the memory (about 1 GB), and never gives it back to the OS - game over - reboot ! How could this be solved? Thanks, Nick Dan Poirier wrote: Nick Gearls nickgea...@gmail.com writes: There seems to be a memory problem when substituing something on very long lines (several hundreds KB). this problem is different from bug 44948 (I applied this patch). When using something like Substitute s/string1/string2/ on a 300 KB line with 30 times string1 on the line, there are 2 problems: 1. It uses a huge amount of memory, as - I think - it reallocates a new buffer (300 KB) before each substitute without freeing the previous one I think you're right - most of the working memory is not released until the request is over. This is kind of a worst-case for mod_substitute's temporary memory usage. Is this a problem in real cases? Or just in artificial testing? If these are real requests, maybe if you tell a little more about them, we can find a better solution. 2. The memory is not freeed at the end of the HTTP request. Maybe is it due to Keep-alive? If by the memory is not freed, you mean the memory usage for the http process as reported by the OS doesn't go down, that's right. But at the end of the request, the memory should be released back to Apache to re-use. You can test this by repeating your test several times - after the first time, the process's memory usage should not keep going up the way it does the first time. If that's a problem in your case, you can set MaxMemFree to ask Apache to release memory by calling free() above a certain threshold. Even then, whether the process releases the memory back to the OS depends on the OS, so it might not gain you anything. But if this is going to happen very often, you might as well let Apache hold on to the memory; it's just going to grab it back again the next time this happens. If using the q switch to not flatten the buckets, it uses almost no memory. Btw, it correctly handles more 60 substitutions on the same line (some shorter, some longer) without the q flag !?! When exactly is this flag needed? Don't know about this one. In macro SEDSCAT, we have s2 = apr_pstrmemdup(pool, buff, blen); s1 = apr_pstrcat(pool, s1, s2, NULL); Shouldn't we free the previous s1 buffer? Pools don't provide a way of freeing individual allocations; you have to destroy the whole pool. Anyway, I do not understand why the memory is not released, as the pool is supposed to be destroyed. See above.
Re: mod_substitute memory problem
Hi Nick, Do you mean that mod_sed should be used instead of mod_substitute? Because it is more complete, or more mature? I only need the substitution. About the flattening: is the q flag only needed if you want to use two subst on the same line with overlapping between them? Thanks, Nick Nick Kew wrote: On Wed, 10 Jun 2009 13:11:16 +0200 Nick Gearls nickgea...@gmail.com wrote: 2. The memory is not freeed at the end of the HTTP request. Maybe is it due to Keep-alive? It's recycled within the server. If using the q switch to not flatten the buckets, it uses almost no memory. Btw, it correctly handles more 60 substitutions on the same line (some shorter, some longer) without the q flag !?! When exactly is this flag needed? Rarely, IMO. When I wrote mod_line_edit - which does essentially the same as mod_substitute - I didn't provide that option, and instead documented it as not working with overlapping substitutions. Another slow by default flag in mod_substitute is that it uses regexps by default. Turn them off unless you definitely need them! Anyway, I do not understand why the memory is not released, as the pool is supposed to be destroyed. It's recycled within the server. mod_sed is now the state-of-the-art. Perhaps a visit to http://httpd.apache.org/docs/2.3/mod/mod_sed.html would be in order.
Re: mod_substitute memory problem
Nick Gearls wrote: If using the q switch to not flatten the buckets, it uses almost no memory. Btw, it correctly handles more 60 substitutions on the same line (some shorter, some longer) without the q flag !?! When exactly is this flag needed? The q switch causes the pattern to fail every time the match pattern spans two filter input buffers. So it's very hit-or-miss and you have no assurance that it can process this all. Since the filter is built for text, and text is generally arranged with lines short than 300KB, I wouldn't spend a bunch of time on this. Take a look at httpd trunk's mod_sed and see if the real thing is kinder with respect to memory utilization; it truly is a stream editor.
Re: rotatelogs - Adding timeout for reading from stdin
Mladen Turk wrote: Code was testes both on windows and linux and it works like a charm (rotation is done at exact time regardless of log events). Foolish question, but rather than waking up every second, isn't it more rational to compute the expiry of the current file and timeout for that period of time?
Re: rotatelogs - Adding timeout for reading from stdin
Mladen Turk wrote: Dan Poirier wrote: Would wait_for_io_or_timeout() be a good candidate for apr? There is apr_wait_for_io_or_timeout but it uses the apr_file_t-timeout which can be set only for pipes and sockets. Also the Win32 code will work only for stdhandles because they are pipes. To clarify, it's no longer possible to pipe 'hugefile' through rotatelogs with 1MB granularity. This should at least be noted in the docs if the patch is accepted. Jim is traveling, it would be good to get his feedback Monday on your proposed patch, because I'm near certain it's a puzzle he had looked at. FYI your patch didn't need platform specifics, if you used the apr_poll API.
Re: mod_substitute memory problem
On 06/10/2009 06:18 PM, Nick Gearls wrote: Hi Nick, Do you mean that mod_sed should be used instead of mod_substitute? Because it is more complete, or more mature? I only need the substitution. About the flattening: is the q flag only needed if you want to use two subst on the same line with overlapping between them? It's the other way round: If you have two subst with overlappings using the q flag might result in missing substitutions. As this does not seem to be the case for your problem it is safe (and good) to use it as it solves your memory problem. Regards Rüdiger
Re: rotatelogs - Adding timeout for reading from stdin
William A. Rowe, Jr. wrote: Mladen Turk wrote: Code was testes both on windows and linux and it works like a charm (rotation is done at exact time regardless of log events). Foolish question, but rather than waking up every second, isn't it more rational to compute the expiry of the current file and timeout for that period of time? Might be a nice add-on. However it would require reorganizing the rotate logic to calculate early instead late. This would complicate the patch and it would be hard to follow the difference. I've choose a straight forward approach. Regards -- ^(TM)