Just rolled out a build against today's CVS (as a result of Matt's note) and there are no apparent EAGAIN issues on FreeBSD 4.11: the full XML stream is being returned, ~97K of data.
Thanks,
Gil

--- Ian Cunningham <[EMAIL PROTECTED]> wrote:

> Martin,
>
> Now that I have played around with delay code, this version has an
> exponential back off. The mode for the number of loops was 1 loop, which
> implies your assumption is correct. The next highest frequency of
> occurrence for number of loops was 22, so waiting a very short bit
> either works, or doesn't and you have to wait a lot longer.
>
> One idea we have been throwing around is to not call send() for
> each metric, but instead get all the data for one host and then send
> it as one huge chunk. We may run benchmarks on that later and see what's
> better in terms of wall clock time.
>
> Ian
>
> /* this function wraps calls to apr_socket_send to handle EAGAIN */
> apr_status_t socket_send_full(apr_socket_t *sock, const char *buf,
>                               apr_size_t *len)
> {
>     apr_status_t rv;
>     int loop = 0;
>     apr_size_t start_len;
>     apr_interval_time_t t;
>
>     start_len = (*len);
>     (*len) = start_len;
>     rv = apr_socket_send( sock, buf, len);
>
>     while (loop++ < 33 && APR_STATUS_IS_EAGAIN(rv))
>     {
>         /* delay grows as loop * loop * 100 microseconds */
>         t = loop * loop * 100;
>         apr_sleep(t);
>         (*len) = start_len;
>         rv = apr_socket_send( sock, buf, len);
>     }
>     return rv;
> }
>
> Martin Knoblauch wrote:
>
> >Hi Ian,
> >
> > thanks for updating the patch.
> >
> > Puuhhh. That behaviour you describe is bad indeed. Seems either Cygwin
> >or M$ are doing something stupid.
> >
> > One thought - you are calling apr_socket_send() at a high frequency in
> >that loop. Have you played with inserting some delay code in the loop?
> >Maybe waiting a ms or so would increase the chance of success?
> >
> >Cheers
> >Martin
> >
> >--- Ian Cunningham <[EMAIL PROTECTED]> wrote:
> >
> >>Martin,
> >>
> >>Non-scientific numbers here for you. Connecting to the tcp port 600
> >>times, print_host_metric() called apr_socket_send() at least 90,624
> >>times. Of those 90,624 times, we got stuck in an EAGAIN while loop
> >>1,190 times.
> >>On average that while loop looped 29,116.66 times, with a maximum
> >>of 525,705 loops.
> >>
> >>Pretty bad in my opinion. But the workaround... works :/
> >>
> >>I have refactored all of the apr_socket_sends to use the workaround. I
> >>have it error out if it loops more than 750,000 times *shakes head*.
> >>I've posted a patch to the bug that seems to work; it only bombed out
> >>once in 600 tries.
> >>
> >>http://bugzilla.ganglia.info/cgi-bin/bugzilla/attachment.cgi?id=27&action=view
> >>
> >>Ian
> >>
> >>Martin Knoblauch wrote:
> >>
> >>>Hi Richard,
> >>>
> >>>correct. I was waiting for a comment from Ian on my concerns about
> >>>possible endless loops before committing the patch.
> >>>
> >>>Ian: what do you think? Do you have any data on how often you iterate
> >>>those EAGAIN loops?
> >>>
> >>>Cheers
> >>>Martin
> >>>
> >>>--- [EMAIL PROTECTED] wrote:
> >>>
> >>>>Gee,
> >>>>
> >>>>I thought that was fixed with this patch:
> >>>>http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=50
> >>>>
> >>>>Actually, looking at 3.0.3 gmond.c, it looks like the patch did not
> >>>>make it into the release - that's a shame.
> >>>>
> >>>>Even looking at the patch, it looks as if it is a partial fix,
> >>>>because while the patched metric printing is protected like this
> >>>>(gmond.c, process_tcp_accept_channel):
> >>>><snip>
> >>>>          rv = print_host_metric(client, metric, now);
> >>>>          while(rv == EAGAIN)
> >>>>            {
> >>>>              rv = print_host_metric(client, metric, now);
> >>>>            }
> >>>>          if(rv != APR_SUCCESS)
> >>>>            {
> >>>>              goto close_accept_socket;
> >>>>            }
> >>>>        }
> >>>></snip>
> >>>>
> >>>>the gmetric printing in the same function is not protected:
> >>>><snip>
> >>>>
> >>>>      /* Send the gmetric info for this particular host */
> >>>>      for(metric_hi = apr_hash_first(client_context, ((Ganglia_host *)val)->gmetrics);
> >>>>          metric_hi;
> >>>>          metric_hi = apr_hash_next(metric_hi))
> >>>>        {
> >>>>          void *metric;
> >>>>          apr_hash_this(metric_hi, NULL, NULL, &metric);
> >>>>
> >>>>          /* Print each of the metrics from gmetric for this host... */
> >>>>          if(print_host_gmetric(client, metric, now) != APR_SUCCESS)
> >>>>            {
> >>>>              goto close_accept_socket;
> >>>>            }
> >>>>        }
> >>>>
> >>>>It may be best to talk to the original owner of the patch.
> >>>>I'm not confident enough to submit a patch myself, although I will try
> >>>>to submit a bugzilla entry.
> >>>>
> >>>>kind regards,
> >>>>Richard
> >>>>
> >>>>-----Original Message-----
> >>>>From: [EMAIL PROTECTED]
> >>>>[mailto:[EMAIL PROTECTED] On Behalf Of Gilad Raphaelli
> >>>>Sent: 14 March 2006 18:35
> >>>>To: ganglia-developers@lists.sourceforge.net
> >>>>Subject: [Ganglia-developers] RE: First prerelease of ganglia-3.0.3
> >>>>ready for testing
> >>>>
> >>>>I have tried the new release 3.0.3.200602231926
> >>>>without success on FreeBSD 4.11 - the xml is still
> >>>>truncated when attempting to access the data from a
> >>>>remote host.
> >>>>Interestingly, this is not the case when
> >>>>trying from the host running gmond. Based on the
> >>>>strace, my colleague commented:
> >>>>
> >>>>  Default socket buffer is 64K. It appears that the
> >>>>socket is non-blocking. That last write is failing
> >>>>(EAGAIN) because the socket buffer is full. The
> >>>>application is ignoring that fact and shutting down
> >>>>the socket. Looks to me like an application bug that
> >>>>just accidentally works on rhel.
> >>>>
> >>>>  Please let me know if you need any more information.
> >>>>
> >>>>Thank you,
> >>>>
> >>>>Gil
> >>>>-----------------------------------------------------
> >>>>
> >>>>Running an strace on gmond (on the target host) while
> >>>>trying to retrieve the data shows:
> >>>>71160 write(10, "<METRIC NAME=\"swap_free\" VAL=\"41"..., 124) = 124
> >>>>71160 write(10, "<METRIC NAME=\"bytes_in\" VAL=\"608"..., 129) = -1 EAGAIN (Resource temporarily unavailable)
> >>>>71160 shutdown(10, 0 /* receive */) = 0
> >>>>
> >>>>What this looks like from the requester (not the
> >>>>exact same transaction):
> >>>>
> >>>><METRIC NAME="mem_buffers" VAL="204096" TYPE="uint32" UNITS="KB"
> >>>>TN="119" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
> >>>><METRIC NAME="swap_free" VAL="4194136" TYPE="uint32" UNITS="KB"
> >>>>TN="119" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
> >>>>Connection closed by foreign host.
> >>>>
> >>>>A normal transaction closes with a closing tag: </GANGLIA_XML>
> >>>>
> >>>>_______________________________________________
> >>>>Ganglia-developers mailing list
> >>>>Ganglia-developers@lists.sourceforge.net
> >>>>https://lists.sourceforge.net/lists/listinfo/ganglia-developers
> >>>
> >>>------------------------------------------------------
> >>>Martin Knoblauch
> >>>email: k n o b i AT knobisoft DOT de
> >>>www:   http://www.knobisoft.de
> >
> >------------------------------------------------------
> >Martin Knoblauch
> >email: k n o b i AT knobisoft DOT de
> >www:   http://www.knobisoft.de
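
For reference, the failure mode discussed in this thread (a write to a non-blocking socket returning EAGAIN once the socket buffer fills) can also be handled by waiting for the socket to become writable, rather than retrying in a tight loop or sleeping for a guessed interval. The following is only a minimal sketch under stated assumptions, not gmond's actual fix: it uses plain POSIX send() and poll() in place of the APR calls quoted above, and send_all() and its timeout_ms parameter are hypothetical names introduced for illustration. It also resumes after a partial write, which the quoted socket_send_full() does not attempt.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of gmond or APR): write all `len` bytes to a
 * possibly non-blocking socket. On EAGAIN it waits in poll() until the socket
 * is writable instead of spinning. Returns 0 on success, -1 on error; if the
 * socket stays unwritable for timeout_ms milliseconds, errno is ETIMEDOUT. */
static int send_all(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t sent = 0;

    assert(buf != NULL);
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n >= 0) {
            sent += (size_t)n;      /* partial writes are normal; continue */
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal; retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* a real error; give up */

        /* Socket buffer is full: block in the kernel until it drains. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;      /* peer did not read in time */
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

A caller would invoke send_all() once per buffer and close the connection only when it returns -1. Combined with the idea of collecting all the data for one host before sending, that would mean one call per host instead of one per metric; whether that is faster in wall clock time is exactly the benchmark Ian mentions and is not claimed here.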