[Bug 43449] Monitor effectiveness of HTCP purging

2013-08-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #23 from MZMcBride b...@mzmcbride.com ---
(In reply to comment #22)
> Change 77975 merged by BBlack:
> Add ganglia monitoring for vhtcpd.
>
> https://gerrit.wikimedia.org/r/77975

With this changeset now merged, I'm a little unclear what's still needed to
mark this bug as resolved/fixed. Brian W.: can you clarify?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 43449] Monitor effectiveness of HTCP purging

2013-08-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Bawolff (Brian Wolff) bawolff...@gmail.com changed:

   What|Removed |Added

 Status|PATCH_TO_REVIEW |RESOLVED
 Resolution|--- |FIXED



[Bug 43449] Monitor effectiveness of HTCP purging

2013-08-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #24 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
(In reply to comment #23)
> (In reply to comment #22)
> > Change 77975 merged by BBlack:
> > Add ganglia monitoring for vhtcpd.
> >
> > https://gerrit.wikimedia.org/r/77975
>
> With this changeset now merged, I'm a little unclear what's still needed to
> mark this bug as resolved/fixed. Brian W.: can you clarify?

I'm going to call this bug closed for now. There are pretty graphs at
http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=vhtcpd_inpkts_dequeued&s=by+name&c=Upload+caches+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4

One thing we were talking about earlier was doing actual tests, where a
script either looks at recent re-uploads and checks that the purge succeeded, or
purges specific things and checks that the purge worked. We could do that
later if this monitoring turns out not to be enough.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-08-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #22 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 77975 merged by BBlack:
Add ganglia monitoring for vhtcpd.

https://gerrit.wikimedia.org/r/77975



[Bug 43449] Monitor effectiveness of HTCP purging

2013-08-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #20 from Bryan Davis bda...@wikimedia.org ---
There is a patch to set up ganglia monitoring for vhtcpd in gerrit 77975.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #13 from Rob Lanphier ro...@wikimedia.org ---
Brandon, are you actually building proper monitoring into this daemon, or do we
need to start separate work?  I remember Mark making the case that this could
be done within Varnish, but I'm still kinda confused as to how we can actually
do effective monitoring of Varnish purging from within Varnish.

Bug 49362 is an example of a bug that would be great to have proper monitoring
for.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #14 from Brandon Black bbl...@wikimedia.org ---
The daemon logs some stats to a file, which we could pick up and graph (but
currently do not, yet).  These would basically give you the rate of multicast
purge requests the daemon's receiving and whether it's failing to process any
of them due to some large-impact bug that's overflowing the queue.
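
As a rough illustration of how such stats could be turned into something graphable, here is a minimal Python sketch. It assumes the stats are cumulative counters sampled at two points in time and already parsed into dicts; the counter names (`inpkts_recvd`, `queue_overflows`) and the dict layout are hypothetical and do not describe the daemon's actual stats file.

```python
def counter_rates(prev, curr, interval_seconds):
    """Per-second rate of change for each counter present in both samples.

    A negative rate would suggest the counters were reset (for example by a
    daemon restart); callers may want to discard such samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        name: (curr[name] - prev[name]) / interval_seconds
        for name in curr
        if name in prev
    }


def queue_overflowed(prev, curr, key="queue_overflows"):
    """True if the (hypothetical) overflow counter grew between samples."""
    return curr.get(key, 0) > prev.get(key, 0)
```

Note that this only measures what the daemon itself saw; it says nothing about packets lost before they reached the daemon.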

The larger issue that makes that relatively ineffective is that the requests
arrive over multicast, which is an unreliable protocol by design.  They could
be lost in the sender's output buffers, anywhere in the network, or discarded
at the receiving cache (local buffering issues) and we'd have no indication
that was happening.

Upgrading from multicast is also an expensive proposition in terms of
complexity (after all, the reason we're using it is that it's simple and
efficient).  We've thrown around some ideas about replacing multicast with
http://en.wikipedia.org/wiki/Pragmatic_General_Multicast , likely using
http://zeromq.org/ as the communications abstraction layer, as a solution to
the unreliability of multicast.  This would basically give us a reliable
sequence-number system with retransmission that's handled at that layer.
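
To make the sequence-number idea concrete, here is a minimal Python sketch of receiver-side gap detection. It is not PGM or ZeroMQ code, and it only detects loss (it does not request retransmission); the class name, the per-sender integer sequence numbers, and the sender label are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GapTracker:
    """Track per-sender sequence numbers and remember which never arrived."""

    next_expected: dict = field(default_factory=dict)  # sender -> next seq
    missing: dict = field(default_factory=dict)        # sender -> set of seqs

    def observe(self, sender, seq):
        """Record one received message; return newly detected missing seqs."""
        lost = self.missing.setdefault(sender, set())
        # The first message from a sender defines the starting point.
        expected = self.next_expected.get(sender, seq)
        if seq < expected:
            # Late or retransmitted message: it is no longer missing.
            lost.discard(seq)
            return []
        new_gaps = list(range(expected, seq))
        lost.update(new_gaps)
        self.next_expected[sender] = seq + 1
        return new_gaps
```

With plain multicast there is no such per-sender counter to inspect, which is why loss currently goes unnoticed.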

That means adding zeromq support to the php that sends the purge requests,
adding it to vhtcpd, and most likely also building out a redundant,
co-operating set of middleboxes as publish/subscribe multiplexers.  I'm not
fond of going down this path unless we really see a strong need to upgrade from
multicast, though.  It smells of too much complexity for the problem we're
trying to solve, and/or that there may be a better mechanism for this if we
re-think how purging is being accomplished in general.

In any case, I think that would all be outside the scope of this ticket.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #15 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
I think monitoring should happen outside of the daemon (since, as you say,
there's a limit to what we can do with an unreliable protocol).

What I would suggest is some script (perhaps even living on the tools lab) that
does the following:
*Find the 10 most recent overwritten files. Get the thumbnail sizes that would
be on the image description page, along with the original file asset (from both
the Europe and North America Varnish caches). Look at the Age header. If the Age
header is longer than the time between the last re-upload and now, yell.
*Pick a test file at random. Request the file at some random size. Do
?action=purge. Sleep for about 10 seconds. Request the file again. Check to
make sure that the Age header is either not present or < 10.
*For good measure, pick a popular page like [[Wikipedia:Village pump
(Technical)]] (also some redirect page like [[WP:VPT]]). Request the page.
Check that the Age header is less than the time between now and the last edit.
(Or at least for the redirect case, make sure that the difference isn't super
big, to give some leeway for the job queue.)
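
The Age comparison used in the first and third checks could be sketched in Python roughly as follows. This is only an illustration: the function name is hypothetical, and treating a missing or unparseable Age header as "not stale" is an assumption rather than something settled in this discussion.

```python
from datetime import datetime, timezone
from typing import Optional


def is_stale(age_header: Optional[str], last_change: datetime, now: datetime) -> bool:
    """True if the cached copy is older than the last change to the object.

    age_header is the raw value of the HTTP Age response header (seconds),
    or None if the header was absent.
    """
    if age_header is None:
        return False  # assumption: no Age header means freshly fetched
    try:
        age = int(age_header.strip())
    except ValueError:
        return False  # assumption: ignore unparseable values
    elapsed = (now - last_change).total_seconds()
    return age > elapsed
```

A cache that has held the object for longer than the time since the last re-upload or edit cannot have been purged, so a True result is the case where the script should yell.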



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #16 from Brandon Black bbl...@wikimedia.org ---
How would one "Find the 10 most recent overwritten files" reliably/efficiently?
 Most of these solutions you're suggesting seem to give us some probabilistic
idea that things are working, but really solve the problem if a random small
percentage of purges are being lost in the pipe somewhere.  They'd have to run
at pretty high rates to even catch singular failed elements (one varnish not
receiving purges, which may or may not have already cached the test file, which
you may or may not hit with your check)



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #17 from Brandon Black bbl...@wikimedia.org ---
Sorry, I meant to say "..., but really DON'T solve the problem ..."



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #18 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
(In reply to comment #16)
> How would one "Find the 10 most recent overwritten files"
> reliably/efficiently?
> Most of these solutions you're suggesting seem to give us some probabilistic
> idea that things are working, but really solve the problem if a random small
> percentage of purges are being lost in the pipe somewhere.  They'd have to
> run at pretty high rates to even catch singular failed elements (one varnish
> not receiving purges, which may or may not have already cached the test file,
> which you may or may not hit with your check)

Very true. However, I'm more concerned with mass failures (the type of thing
where doing this once every 6 hours would be sufficient). Massive failures of
the purging system have happened several times in the past. Monitoring for this
type of failure is, I think, important. (Fine-grained monitoring would be cool
too, but seems more difficult.)



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #19 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
> How would one "Find the 10 most recent overwritten files" reliably/efficiently?

via db query (or api): select log_title from logging where log_type = 'upload'
and log_action = 'overwrite' order by log_timestamp DESC limit 10;
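
For the API route, a minimal Python sketch could look like the following. It assumes the standard MediaWiki `list=logevents` query with `leaction=upload/overwrite`; the helper names and the `api_base` value in the usage note are illustrative only, and no request is actually sent here.

```python
from urllib.parse import urlencode


def recent_overwrites_url(api_base, limit=10):
    """Build an API URL listing the most recent file overwrites."""
    params = {
        "action": "query",
        "list": "logevents",
        "leaction": "upload/overwrite",
        "leprop": "title|timestamp",
        "lelimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def overwritten_titles(response):
    """Extract (title, timestamp) pairs from a decoded JSON API response."""
    events = response.get("query", {}).get("logevents", [])
    return [(event["title"], event["timestamp"]) for event in events]
```

For example, `recent_overwrites_url("https://commons.wikimedia.org/w/api.php")` gives a URL whose JSON response can be decoded and passed to `overwritten_titles`.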



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #11 from Andre Klapper aklap...@wikimedia.org ---
Brandon: Were the last 5 weeks enough time to judge whether it's stable enough?
(Is this bug report fixed?)



[Bug 43449] Monitor effectiveness of HTCP purging

2013-07-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #12 from Brandon Black bbl...@wikimedia.org ---
Yes, I think so, although I just fixed a bug in the software yesterday.  Still,
it's a significant improvement and we've un-deployed the previous software. 
May as well close this bug and then open further ones as warranted for further
changes to our purging architecture?  The title of the bug doesn't precisely
correlate with what ended up happening anyways (monitoring the success rate,
which we still can't really do, and won't ever be able to do with any real
accuracy so long as it's plain multicast).



[Bug 43449] Monitor effectiveness of HTCP purging

2013-06-18 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #10 from Andre Klapper aklap...@wikimedia.org ---
(In reply to comment #9 by Brandon Black)
> The replacement daemon was deployed to production today.  The initial
> deployment is just a minimum-change swap of the two pieces of software.
> Further enhancements (to performance, and logging of stats to spot multicast
> loss) will come once this has had a little time to stabilize without any loud
> complaints of being worse than before.

Great! Is it already possible to judge whether it's stable enough?



[Bug 43449] Monitor effectiveness of HTCP purging

2013-05-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Ryan Kaldari rkald...@wikimedia.org changed:

   What|Removed |Added

   See Also||https://bugzilla.wikimedia.org/show_bug.cgi?id=48927



[Bug 43449] Monitor effectiveness of HTCP purging

2013-05-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #9 from Brandon Black bbl...@wikimedia.org ---
The replacement daemon was deployed to production today.  The initial
deployment is just a minimum-change swap of the two pieces of software. 
Further enhancements (to performance, and logging of stats to spot multicast
loss) will come once this has had a little time to stabilize without any loud
complaints of being worse than before.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-05-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #7 from Andre Klapper aklap...@wikimedia.org ---
Brandon: Is there any news / progress to share yet?



[Bug 43449] Monitor effectiveness of HTCP purging

2013-05-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Brandon Black bbl...@wikimedia.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #8 from Brandon Black bbl...@wikimedia.org ---
Yes, I've been implementing a replacement for varnishhtcpd.  You can see the
evolving initial version at the changeset here:
https://gerrit.wikimedia.org/r/#/c/60390/ .  I hope to be able to test it in
prod in the next few days, and it shouldn't suffer from the perf/loss bugs of
the previous implementation.

Stats output still needs implementation there as well (for monitoring the
daemon's own reliability as well as other issues like loss of multicast
delivery), but we'd rather put the stats work in the fresh new code than attach
to the known-failing code.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-04-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Greg Grossmeier g...@wikimedia.org changed:

   What|Removed |Added

   Assignee|m...@nedworks.org   |bbl...@wikimedia.org

--- Comment #6 from Greg Grossmeier g...@wikimedia.org ---
Assigning to Brandon per Roadmap Updates meeting and email thread.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-03-20 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Rob Lanphier ro...@wikimedia.org changed:

   What|Removed |Added

   Priority|Normal  |High



[Bug 43449] Monitor effectiveness of HTCP purging

2013-02-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #5 from Andre Klapper aklap...@wikimedia.org ---
RT #4607



[Bug 43449] Monitor effectiveness of HTCP purging

2013-02-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Andre Klapper aklap...@wikimedia.org changed:

   What|Removed |Added

   Assignee|ct...@wikimedia.org |m...@nedworks.org

--- Comment #4 from Andre Klapper aklap...@wikimedia.org ---
Assigning to Mark as just discussed in the Ops/Platform meeting here in SF.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Rob Lanphier ro...@wikimedia.org changed:

   What|Removed |Added

   Assignee|wikibugs-l@lists.wikimedia.org |ct...@wikimedia.org

--- Comment #3 from Rob Lanphier ro...@wikimedia.org ---
I spoke with CT about this, and he's going to talk to his team about what can
be done here.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Richard Guk richardg...@yahoo.com changed:

   What|Removed |Added

 CC||richardg...@yahoo.com



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-23 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

--- Comment #2 from Andre Klapper aklap...@wikimedia.org ---
For the record, in case somebody considers working on this:
<TimStarling> andre__: just purge a URL, request it, and check its Age header
<TimStarling> it should be less than some threshold
<TimStarling> http://tools.ietf.org/rfcmarkup?doc=2616#section-5.1.2
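
A minimal Python sketch of that purge-then-check idea follows. It is an illustration under assumptions, not an agreed design: the function names, the 10-second wait, and the threshold are hypothetical, and how the purge is actually issued is left to a caller-supplied `purge` callable.

```python
import time
import urllib.request


def fetch_age(url):
    """Return the Age response header in seconds, or None if it is absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        value = response.headers.get("Age")
    return int(value) if value is not None else None


def purge_took_effect(url, purge, fetch=fetch_age, wait=10.0, max_age=10,
                      sleep=time.sleep):
    """Purge a URL, wait, re-request it, and check that Age is small.

    purge is any callable that triggers a purge for the given URL.
    Returns True if the Age header is absent or below max_age seconds.
    """
    purge(url)
    sleep(wait)
    age = fetch(url)
    return age is None or age < max_age
```

Because a single request only reaches one cache, a True result says little about the other caches; a False result is the interesting signal.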



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-22 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Andre Klapper aklap...@wikimedia.org changed:

   What|Removed |Added

 CC||aklap...@wikimedia.org

--- Comment #1 from Andre Klapper aklap...@wikimedia.org ---
FYI, posted on ops@ by Tim Starling six hours ago:

There is a nagios check to make sure varnishhtcpd is working, but it
only checks to see if the process is still running, it doesn't check
to see if it is actually working.



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

MZMcBride b...@mzmcbride.com changed:

   What|Removed |Added

 CC||b...@mzmcbride.com



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Nemo federicol...@tiscali.it changed:

   What|Removed |Added

   See Also||https://bugzilla.wikimedia.org/show_bug.cgi?id=41130



[Bug 43449] Monitor effectiveness of HTCP purging

2013-01-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Nemo federicol...@tiscali.it changed:

   What|Removed |Added

 CC||afeld...@wikimedia.org, federicol...@tiscali.it



[Bug 43449] Monitor effectiveness of HTCP purging

2012-12-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Andre Klapper aklap...@wikimedia.org changed:

   What|Removed |Added

   Priority|Unprioritized   |Normal
   Severity|normal  |enhancement



[Bug 43449] Monitor effectiveness of HTCP purging

2012-12-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=43449

Marco maic...@yahoo.com changed:

   What|Removed |Added

 CC||maic...@yahoo.com
