mod_substitute memory problem

2009-06-10 Thread Nick Gearls

Hello,

There seems to be a memory problem when substituing something on very 
long lines (several hundreds KB). this problem is different from bug 
44948 (I applied this patch).


When using something like Substitute s/string1/string2/ on a 300 KB 
line with 30 times string1 on the line, there are 2 problems:


1. It uses a huge amount of memory, as - I think - it reallocates a new 
buffer (300 KB) before each substitute without freeing the previous one


2. The memory is not freeed at the end of the HTTP request.
Maybe is it due to Keep-alive?

If using the q switch to not flatten the buckets, it uses almost no 
memory. Btw, it correctly handles more 60 substitutions on the same line 
(some shorter, some longer) without the q flag !?! When exactly is this 
flag needed?



In macro SEDSCAT, we have
s2 = apr_pstrmemdup(pool, buff, blen);
s1 = apr_pstrcat(pool, s1, s2, NULL);
Shouldn't we free the previous s1 buffer?

Anyway, I do not understand why the memory is not released, as the pool 
is supposed to be destroyed.


Regards,

Nick


Re: error: 'OPT_INCNOEXEC' undeclared

2009-06-10 Thread Oden Eriksson
 On Tue, Jun 9, 2009 at 10:55 AM, Oden Eriksson
 oden.eriks...@envitory.sewrote:

 Hello.

 The CVE-2009-1195 fix broke the mod_perl build:

 modperl_config.c:525: error: 'OPT_INCNOEXEC' undeclared (first use in
 this
 function)

 I saw http://svn.apache.org/viewvc?view=revrevision=779472 addresses
 this
 problem. Will this be the final official fix?


 2.2.x:
 I hope another tiny patch gets included too (
 http://people.apache.org/~trawick/mod_perl_more_compat.txt), but certainly
 r779472  will be in the next 2.2.x release to resolve that build failure.

 trunk:
 OPT_INCNOEXEC no longer exists.  Code which needs to know whether or not
 SSI
 exec is enabled will have to be modified (new and old: some flavor of SSI
 is
 enabled if OPT_INCLUDES is on; new: exec is enabled if OPT_INC_WITH_EXEC
 is
 on; old: exec is enabled unless OPT_INCNOEXEC is on).



OK. Thanks Jeff.

I will then have to ship a bugfix update for the Mandriva products. I
guess this also applies to Redhat and others as well.




This email has been processed by SmoothZap - www.smoothwall.net



Re: mod_substitute memory problem

2009-06-10 Thread Dan Poirier
Nick Gearls nickgea...@gmail.com writes:

 There seems to be a memory problem when substituing something on very
 long lines (several hundreds KB). this problem is different from bug
 44948 (I applied this patch).

 When using something like Substitute s/string1/string2/ on a 300 KB
 line with 30 times string1 on the line, there are 2 problems:

 1. It uses a huge amount of memory, as - I think - it reallocates a
 new buffer (300 KB) before each substitute without freeing the
 previous one

I think you're right - most of the working memory is not released until
the request is over.  This is kind of a worst-case for mod_substitute's
temporary memory usage.

Is this a problem in real cases?  Or just in artificial testing?
If these are real requests, maybe if you tell a little more about them,
we can find a better solution.

 2. The memory is not freeed at the end of the HTTP request.
 Maybe is it due to Keep-alive?

If by the memory is not freed, you mean the memory usage for the
http process as reported by the OS doesn't go down, that's right.
But at the end of the request, the memory should be released back
to Apache to re-use.  You can test this by repeating your test several
times - after the first time, the process's memory usage should not keep
going up the way it does the first time.

If that's a problem in your case, you can set MaxMemFree to ask Apache
to release memory by calling free() above a certain threshold.  Even
then, whether the process releases the memory back to the OS depends on
the OS, so it might not gain you anything.

But if this is going to happen very often, you might as well let Apache
hold on to the memory; it's just going to grab it back again the next
time this happens.

 If using the q switch to not flatten the buckets, it uses almost no
 memory. Btw, it correctly handles more 60 substitutions on the same
 line (some shorter, some longer) without the q flag !?! When exactly
 is this flag needed?

Don't know about this one.

 In macro SEDSCAT, we have
   s2 = apr_pstrmemdup(pool, buff, blen);
 s1 = apr_pstrcat(pool, s1, s2, NULL);
 Shouldn't we free the previous s1 buffer?

Pools don't provide a way of freeing individual allocations; you have
to destroy the whole pool.

 Anyway, I do not understand why the memory is not released, as the
 pool is supposed to be destroyed.

See above.

-- 
Dan Poirier poir...@pobox.com



Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread Mladen Turk

William A. Rowe, Jr. wrote:

Mladen Turk wrote:

I'd leave rotatelogs alone, please.  Its design is not flawed.  But
I think Jim was working on something which would release the file
just as soon as its time is up, and that is useful across all of
the architectures, not windows specific.



Attached is the patch that uses 1 second timeout when reading
from stdin (hard-coded because we have 1 second rotation resolution).

Unfortunately I had to make WIN32 specific peace of code,
cause we don't have APR_FILES_AS_SOCKETS on that platform.

Code was testes both on windows and linux and it works like
a charm (rotation is done at exact time regardless of log events).


Regards
--
^(TM)
--- rotatelogs.c.org	2008-11-06 07:49:07.0 +0100
+++ rotatelogs.c	2009-06-10 14:15:39.0 +0200
@@ -44,6 +44,8 @@
 #include apr_file_io.h
 #include apr_file_info.h
 #include apr_general.h
+#include apr_poll.h
+#include apr_portable.h
 #include apr_time.h
 #include apr_getopt.h
 
@@ -64,6 +66,53 @@
 #define MAX_PATH1024
 #endif
 
+#if APR_FILES_AS_SOCKETS
+static apr_pollfd_t wait_pollfd;
+static apr_status_t wait_for_io_or_timeout(apr_file_t *f,
+   apr_interval_time_t timeout,
+   apr_pool_t *pool)
+{
+apr_int32_t  nd;
+
+if (!wait_pollfd.p) {
+wait_pollfd.p = pool;
+}
+wait_pollfd.desc_type = APR_POLL_FILE;
+wait_pollfd.reqevents = APR_POLLIN;
+wait_pollfd.desc.f= f;
+
+return apr_poll(wait_pollfd, 1, nd, timeout);
+}
+#elif defined(WIN32)
+static apr_status_t wait_for_io_or_timeout(apr_file_t *f,
+   apr_interval_time_t timeout,
+   apr_pool_t *pool)
+{
+HANDLE h;
+char   c;
+DWORD  r;
+ints = 0;
+
+apr_os_file_get(h, f);
+while (PeekNamedPipe(h, c, 1, r, NULL, NULL)) {
+if (r == 1)
+break;
+if (s = apr_time_msec(timeout))
+return APR_TIMEUP;
+Sleep(100);
+s += 100;
+}
+return APR_SUCCESS;
+}
+#else
+static apr_status_t wait_for_io_or_timeout(apr_file_t *f,
+   apr_interval_time_t timeout,
+   apr_pool_t *pool)
+{
+return APR_SUCCESS;
+}
+#endif /* APR_FILES_AS_SOCKETS */
+
 static void usage(const char *argv0, const char *reason)
 {
 if (reason) {
@@ -198,7 +247,18 @@
  * since we reset bypass_io after the 1st loop
  */
 if (!bypass_io) {
-if (apr_file_read(f_stdin, buf, nRead) != APR_SUCCESS) {
+apr_status_t rv;
+
+rv = wait_for_io_or_timeout(f_stdin, apr_time_from_sec(1), pool);
+if (rv == APR_SUCCESS) {
+if (apr_file_read(f_stdin, buf, nRead) != APR_SUCCESS) {
+exit(3);
+}
+}
+else if (APR_STATUS_IS_TIMEUP(rv)) {
+nRead = 0;
+}
+else {
 exit(3);
 }
 }


Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread Dan Poirier
Would wait_for_io_or_timeout() be a good candidate for apr?

-- 
Dan Poirier poir...@pobox.com


Re: mod_substitute memory problem

2009-06-10 Thread Nick Kew
On Wed, 10 Jun 2009 13:11:16 +0200
Nick Gearls nickgea...@gmail.com wrote:

 2. The memory is not freeed at the end of the HTTP request.
 Maybe is it due to Keep-alive?

It's recycled within the server.

 If using the q switch to not flatten the buckets, it uses almost no 
 memory. Btw, it correctly handles more 60 substitutions on the same
 line (some shorter, some longer) without the q flag !?! When exactly
 is this flag needed?

Rarely, IMO.  When I wrote mod_line_edit - which does essentially the
same as mod_substitute - I didn't provide that option, and instead
documented it as not working with overlapping substitutions.

Another slow by default flag in mod_substitute is that it
uses regexps by default.  Turn them off unless you definitely
need them!

 Anyway, I do not understand why the memory is not released, as the
 pool is supposed to be destroyed.

It's recycled within the server.

mod_sed is now the state-of-the-art.  Perhaps a visit to
http://httpd.apache.org/docs/2.3/mod/mod_sed.html
would be in order.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/


Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread Plüm, Rüdiger, VF-Group
Any particular reason why not using apr_wait_for_io_or_timeout
at least on unix?

Regards

Rüdiger 

 -Ursprüngliche Nachricht-
 Von: Mladen Turk 
 Gesendet: Mittwoch, 10. Juni 2009 14:24
 An: dev@httpd.apache.org
 Betreff: Re: rotatelogs - Adding timeout for reading from stdin
 
 William A. Rowe, Jr. wrote:
  Mladen Turk wrote:
  
  I'd leave rotatelogs alone, please.  Its design is not flawed.  But
  I think Jim was working on something which would release the file
  just as soon as its time is up, and that is useful across all of
  the architectures, not windows specific.
  
 
 Attached is the patch that uses 1 second timeout when reading
 from stdin (hard-coded because we have 1 second rotation resolution).
 
 Unfortunately I had to make WIN32 specific peace of code,
 cause we don't have APR_FILES_AS_SOCKETS on that platform.
 
 Code was testes both on windows and linux and it works like
 a charm (rotation is done at exact time regardless of log events).
 
 
 Regards
 -- 
 ^(TM)
 


Re: Some ramblings on httpd config

2009-06-10 Thread Rich Bowen


On Jun 9, 2009, at 08:49, Akins, Brian wrote:


Some pseudo request processing code to do same thing:
if listening_port == 80 then
if r.hostname == 'www.foo.com' then

elseif r.hostname =~ /www\d.bar.[org|net]/
end
end



As long as we're talking about exposing nifty syntax in the config ...

I'd be content (at least briefly - I'm sure I could find something  
else to be discontent about) if virtual hosts were less baffling, or,  
at least, harder to get wrong. Something like:


Interface 10.0.0.1:80
  Websites example.com www.example.com foo[123].example.org
  DocumentRoot /x/y
  Websites

  Websites __dynamic__ 
  DocumentRoot /var/vhosts/HOSTNAME
  CustomLog /var/log/HOSTNAME.log combined
  /Websites
/Interface

And perhaps something fancier regexey to encompass all of the magic  
that mod_vhost_alias does.


Or something like that that's sufficiently like what we have now to be  
easy to learn, but sufficiently different to be harder to get wrong,  
and a whole heck of a lot easier to do dynamic vhosts.


One of the other big complaints that comes up, in various ways, daily  
on IRC, is that there's no standard way to put variables in  
configuration directives. If we had a standard set of variables (like,  
say, HOSTNAME or whatever) that could be interpolated into config  
directives, as well as a standard way to do variable assignment and  
backreferences, that would solve a whole class of problems that are  
really pretty difficult right now.


Of course, I say all of that being utterly ignorant of how this all  
works internally, so feel free to ignore me. Like I said before, I  
answer the how do I questions every day, and frequently find myself  
wondering why things that *sound* so easy are actually so very  
difficult.



--
A poet more than thirty years old is simply an overgrown child.
H. L. Mencken



Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread Mladen Turk

Dan Poirier wrote:

Would wait_for_io_or_timeout() be a good candidate for apr?



There is apr_wait_for_io_or_timeout but it uses the
apr_file_t-timeout which can be set only for pipes
and sockets.

Also the Win32 code will work only for stdhandles because
they are pipes.

In essence the answer is no.
However the full async file API is different thing...

Regards
--
^(TM)


Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread Mladen Turk

Plüm, Rüdiger, VF-Group wrote:

Any particular reason why not using apr_wait_for_io_or_timeout
at least on unix?



See the answer I gave to Dan.
It uses file internal timeout which we cannot set, so no
it cannot be used.


Regards
--
^(TM)


Re: Some ramblings on httpd config

2009-06-10 Thread Akins, Brian
On 6/10/09 8:52 AM, Rich Bowen rbo...@rcbowen.com wrote:

 Or something like that that's sufficiently like what we have now to be easy to
 learn, but sufficiently different to be harder to get wrong, and a whole heck
 of a lot easier to do dynamic vhosts.

I think Lua (or whatever) could include enough user friendly glue to make
something like that possible.


 If we had a standard set of variables (like, say, HOSTNAME or whatever) that
 could be interpolated into config directives, as well as a standard way to do
 variable assignment and backreferences, that would solve a whole class of
 problems that are really pretty difficult right now.

That sounds a whole lot like using the config as a runtime.  Hmmm, where
have I heard that before ;)



-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Some ramblings on httpd config

2009-06-10 Thread Dan Poirier
Akins, Brian brian.ak...@turner.com writes:

 On 6/10/09 8:52 AM, Rich Bowen rbo...@rcbowen.com wrote:

 If we had a standard set of variables (like, say, HOSTNAME or whatever) that
 could be interpolated into config directives, as well as a standard way to do
 variable assignment and backreferences, that would solve a whole class of
 problems that are really pretty difficult right now.

 That sounds a whole lot like using the config as a runtime. 

I'm not sure I follow that.  I do like that the config would still look
a lot like it does now, only more flexible.

-- 
Dan Poirier poir...@pobox.com



Re: Some ramblings on httpd config

2009-06-10 Thread Akins, Brian
On 6/10/09 10:21 AM, Dan Poirier poir...@pobox.com wrote:

 That sounds a whole lot like using the config as a runtime.
 
 I'm not sure I follow that.  I do like that the config would still look
 a lot like it does now, only more flexible.


Even if we kept the same general syntax as now, if we introduce variables
and if/else, then the config has to be compiled and ran at request
time. (Glossing over a lot, I know.)  Also, module would have to have some
interface to accept the variables and conditionals. Also, we would have to
basically write our own domain-specific language.  My thought is rather than
incrementally attempting to do this, we can just adopt an existing language
that fits our needs - like lua.  This adoption can be done incrementally, as
well.  Some examples of that, using a more complete version of mod_lua, have
been mentioned.

-- 
Brian Akins




Re: mod_substitute memory problem

2009-06-10 Thread Nick Gearls

The problem is real.
I have an application which generates one line of about 300 KB :-(
In this case, Apache eats up all the memory (about 1 GB), and never 
gives it back to the OS - game over - reboot !


How could this be solved?

Thanks,

Nick


Dan Poirier wrote:

Nick Gearls nickgea...@gmail.com writes:


There seems to be a memory problem when substituing something on very
long lines (several hundreds KB). this problem is different from bug
44948 (I applied this patch).

When using something like Substitute s/string1/string2/ on a 300 KB
line with 30 times string1 on the line, there are 2 problems:

1. It uses a huge amount of memory, as - I think - it reallocates a
new buffer (300 KB) before each substitute without freeing the
previous one


I think you're right - most of the working memory is not released until
the request is over.  This is kind of a worst-case for mod_substitute's
temporary memory usage.

Is this a problem in real cases?  Or just in artificial testing?
If these are real requests, maybe if you tell a little more about them,
we can find a better solution.


2. The memory is not freeed at the end of the HTTP request.
Maybe is it due to Keep-alive?


If by the memory is not freed, you mean the memory usage for the
http process as reported by the OS doesn't go down, that's right.
But at the end of the request, the memory should be released back
to Apache to re-use.  You can test this by repeating your test several
times - after the first time, the process's memory usage should not keep
going up the way it does the first time.

If that's a problem in your case, you can set MaxMemFree to ask Apache
to release memory by calling free() above a certain threshold.  Even
then, whether the process releases the memory back to the OS depends on
the OS, so it might not gain you anything.

But if this is going to happen very often, you might as well let Apache
hold on to the memory; it's just going to grab it back again the next
time this happens.


If using the q switch to not flatten the buckets, it uses almost no
memory. Btw, it correctly handles more 60 substitutions on the same
line (some shorter, some longer) without the q flag !?! When exactly
is this flag needed?


Don't know about this one.


In macro SEDSCAT, we have
s2 = apr_pstrmemdup(pool, buff, blen);
s1 = apr_pstrcat(pool, s1, s2, NULL);
Shouldn't we free the previous s1 buffer?


Pools don't provide a way of freeing individual allocations; you have
to destroy the whole pool.


Anyway, I do not understand why the memory is not released, as the
pool is supposed to be destroyed.


See above.



Re: mod_substitute memory problem

2009-06-10 Thread Nick Gearls

Hi Nick,

Do you mean that mod_sed should be used instead of mod_substitute?
Because it is more complete, or more mature? I only need the substitution.

About the flattening: is the q flag only needed if you want to use two 
subst on the same line with overlapping between them?


Thanks,

Nick


Nick Kew wrote:

On Wed, 10 Jun 2009 13:11:16 +0200
Nick Gearls nickgea...@gmail.com wrote:


2. The memory is not freeed at the end of the HTTP request.
Maybe is it due to Keep-alive?


It's recycled within the server.

If using the q switch to not flatten the buckets, it uses almost no 
memory. Btw, it correctly handles more 60 substitutions on the same

line (some shorter, some longer) without the q flag !?! When exactly
is this flag needed?


Rarely, IMO.  When I wrote mod_line_edit - which does essentially the
same as mod_substitute - I didn't provide that option, and instead
documented it as not working with overlapping substitutions.

Another slow by default flag in mod_substitute is that it
uses regexps by default.  Turn them off unless you definitely
need them!


Anyway, I do not understand why the memory is not released, as the
pool is supposed to be destroyed.


It's recycled within the server.

mod_sed is now the state-of-the-art.  Perhaps a visit to
http://httpd.apache.org/docs/2.3/mod/mod_sed.html
would be in order.



Re: mod_substitute memory problem

2009-06-10 Thread William A. Rowe, Jr.
Nick Gearls wrote:
 
 If using the q switch to not flatten the buckets, it uses almost no
 memory. Btw, it correctly handles more 60 substitutions on the same line
 (some shorter, some longer) without the q flag !?! When exactly is this
 flag needed?

The q switch causes the pattern to fail every time the match pattern
spans two filter input buffers.  So it's very hit-or-miss and you have
no assurance that it can process this all.

Since the filter is built for text, and text is generally arranged with
lines short than 300KB, I wouldn't spend a bunch of time on this.  Take
a look at httpd trunk's mod_sed and see if the real thing is kinder with
respect to memory utilization; it truly is a stream editor.


Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread William A. Rowe, Jr.
Mladen Turk wrote:
 
 Code was testes both on windows and linux and it works like
 a charm (rotation is done at exact time regardless of log events).

Foolish question, but rather than waking up every second, isn't it more
rational to compute the expiry of the current file and timeout for that
period of time?


Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread William A. Rowe, Jr.
Mladen Turk wrote:
 Dan Poirier wrote:
 Would wait_for_io_or_timeout() be a good candidate for apr?

 
 There is apr_wait_for_io_or_timeout but it uses the
 apr_file_t-timeout which can be set only for pipes
 and sockets.
 
 Also the Win32 code will work only for stdhandles because
 they are pipes.

To clarify, it's no longer possible to pipe 'hugefile' through
rotatelogs with 1MB granularity.  This should at least be noted
in the docs if the patch is accepted.

Jim is traveling, it would be good to get his feedback Monday
on your proposed patch, because I'm near certain it's a puzzle
he had looked at.

FYI your patch didn't need platform specifics, if you used the
apr_poll API.




Re: mod_substitute memory problem

2009-06-10 Thread Ruediger Pluem


On 06/10/2009 06:18 PM, Nick Gearls wrote:
 Hi Nick,
 
 Do you mean that mod_sed should be used instead of mod_substitute?
 Because it is more complete, or more mature? I only need the substitution.
 
 About the flattening: is the q flag only needed if you want to use two
 subst on the same line with overlapping between them?

It's the other way round: If you have two subst with overlappings using the
q flag might result in missing substitutions. As this does not seem to be the
case for your problem it is safe (and good) to use it as it solves your memory
problem.

Regards

Rüdiger



Re: rotatelogs - Adding timeout for reading from stdin

2009-06-10 Thread Mladen Turk

William A. Rowe, Jr. wrote:

Mladen Turk wrote:

Code was testes both on windows and linux and it works like
a charm (rotation is done at exact time regardless of log events).


Foolish question, but rather than waking up every second, isn't it more
rational to compute the expiry of the current file and timeout for that
period of time?



Might be a nice add-on.
However it would require reorganizing the rotate logic to calculate
early instead late. This would complicate the patch and it
would be hard to follow the difference.
I've choose a straight forward approach.

Regards
--
^(TM)