Update: I scaled my cluster down from 7 nodes to 3 and kept RF=3. I did a complete cluster rebuild, so everything was fresh, and kept my reads and writes at CL.ALL. For a while it seemed like I had eliminated the problem. Unfortunately, about an hour ago a duplicate came through, and the same IPN was processed twice.
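
As a sanity check on those settings, here is a minimal sketch of the usual replica-overlap rule: a read is guaranteed to reach at least one replica holding the latest acknowledged write when R + W > RF. The function names are illustrative only and are not part of any Cassandra client; nothing here talks to a real cluster.

```python
# Sketch: how many replicas each consistency level touches, and whether a
# read is guaranteed to overlap the latest acknowledged write.
# Function names are hypothetical helpers, not a Cassandra client API.


def replicas_required(level, rf):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError("unknown consistency level: %r" % level)


def read_sees_latest_write(read_level, write_level, rf):
    """True when R + W > RF, i.e. the read and write replica sets intersect."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    rf = 3
    for levels in [("ALL", "ALL"), ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
        read_level, write_level = levels
        print(read_level, write_level,
              read_sees_latest_write(read_level, write_level, rf))
```

With RF=3 and CL.ALL on both sides, every successful read and write involves all three replicas, so on paper a stale replica should not by itself explain the duplicate.
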
Does anyone have any more suggestions as to what is going on here?

On Mon, Aug 22, 2011 at 1:59 PM, Kyle Gibson <kyle.gib...@frozenonline.com> wrote:
> Thanks for the reply.
>
> On Mon, Aug 22, 2011 at 1:11 PM, Dominic Williams
> <dwilli...@fightmymonster.com> wrote:
>> Hi there, here's my tuppence!
>> 1. Something to look at first:
>>
>> If you write two different values to the same column quickly in succession,
>> and both writes go out with the same timestamp, then it is indeterminate
>> which one wins, i.e. write order doesn't necessarily matter.
>
> Understood. In the code example I provided, I am writing the same
> value, but I am doing so in quick succession, so perhaps a few seconds'
> sleep might be helpful. It is worth noting also that the code I
> provided is only step 2 in the process. There is a PHP
> script that receives the POST request from PayPal and inserts the
> IPN data into the IPN column family. Before it does this, it sets the
> "processed" column to "no".
>
>> 2. Don't trust PayPal (anyone using PayPal should really read this)
>> We are / were relying on IPNs to manage our website's recurring
>> subscriptions list. We experienced this weird thing where the
>> recurring_payment_profile_created IPN was missing, and thought maybe
>> Cassandra was losing it because PayPal is a financial system and it couldn't
>> possibly fail to generate an IPN, right!!?
>> Anyway, it turns out, after exhaustive discussions with PayPal
>> engineers, and having proved this from the PayPal logs, that sometimes IPNs
>> fail to get generated. Yup. Read that again!!!! Sometimes they fail to get
>> generated, and in fact this is happening to us quite regularly now.
>> They justify this (while acknowledging this issue should be in their
>> documentation) by saying that because HTTP delivery is unreliable (hmmm
>> isn't this what the retry queue is for..) we shouldn't be relying entirely
>> on IPNs and should regularly download the logs and run them through scripts
>> to catch problems (this is idiotic, since the angry customer will get on our
>> case immediately when they pay and membership doesn't start)
>> Not sure whether PayPal or the database failing is the better option. Look
>> forward to hearing the resolution.
>
> I have experienced failing to receive an IPN event before. In that
> case the IPN event is never saved to the IPN column family, and the
> cron script doesn't process it once, or twice, for that matter. The odd
> thing about the failed IPN event is that it didn't even show up in the
> IPN history, so I couldn't "replay" the event.
>
> I am fairly positive that the problem is either with my environment or
> Cassandra, and not PayPal, in this case. I am hoping it is my
> environment because I suspect that will be easier to fix.
>
> Oddly enough, the second time the IPN is processed, the column write
> succeeds. This always happens 5 minutes after the first one is
> processed.
>
> I neglected to mention an important part of the process: after the IPN
> event is processed (e.g. a new payment), an email is sent out to
> myself and the sender. This is how I know for sure the event is being
> processed twice, because not only do I receive two emails (spaced 5
> minutes apart) but so does the individual who paid. This is often
> embarrassing and somewhat difficult to explain; customers get confused
> as to which account they are supposed to use, etc.
>
> Thanks
>
>> Best,
>> Dominic
>>
>> On 22 August 2011 17:49, Kyle Gibson <kyle.gib...@frozenonline.com> wrote:
>>>
>>> I made some changes to my code base that uses Cassandra. I went back
>>> to using the "processed" column, but instead of using "0" or "1" I
>>> decided to use "no" and "yes".
>>>
>>> You can view the code here: http://pastebin.com/gRBC16e7
>>>
>>> As you can see from the code, I perform an insert, then a get, and check
>>> the result; if it didn't work, I try the insert again and check the get
>>> again. Each time I do a print out to see what the result is. Each
>>> operation is a CL.ALL.
>>>
>>> A few successful IPNs did come through before this one was generated:
>>>
>>> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce
>>> OrderedDict([..., (u'processed', u'no'), ...])
>>> Failed to set processed to yes
>>> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert 1314012603578714
>>> Failed to set processed to yes
>>> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert2 1314012603586201
>>>
>>> As expected, this IPN was processed twice.
>>>
>>> On Sat, Aug 20, 2011 at 5:37 PM, Peter Schuller
>>> <peter.schul...@infidyne.com> wrote:
>>> >> Do you mean the cassandra log, or just logging in the script itself?
>>> >
>>> > The script itself. I.e., some "independent" verification that the line
>>> > of code after the insert is in fact running, just in case there's some
>>> > kind of silent failure.
>>> >
>>> > Sounds like you've tried to address it though with the e-mails.
>>> >
>>> > I suppose it boils down to: either there is something wrong in your
>>> > environment/code, or Cassandra does have a bug. If the latter, it
>>> > would probably be helpful if you could try to reproduce it in your
>>> > environment in a way which can be shared, such as a script that does
>>> > writes and reads back to confirm the write made it. Or maybe just
>>> > adding more explicit logging to your script (even if it causes some
>>> > log flooding) to "prove" that a write truly happened.
>>> >
>>> > --
>>> > / Peter Schuller (@scode on twitter)
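
The shareable write-and-read-back check suggested above could look roughly like the sketch below. FakeColumnFamily is a hypothetical in-memory stand-in so the script runs without a cluster; its insert and get are only assumed to resemble the client calls used in the pastebin code (insert returning a microsecond timestamp, get returning the row's columns), and mark_processed is an illustrative name. To reproduce against a real cluster, the stand-in would be swapped for the real column family object with reads and writes at CL.ALL.

```python
import time
from collections import OrderedDict


class FakeColumnFamily:
    """In-memory stand-in for a column family (hypothetical, illustration only).

    It mimics just two calls: insert(key, columns), which returns a
    microsecond timestamp, and get(key), which returns the stored columns.
    """

    def __init__(self):
        self._rows = {}

    def insert(self, key, columns):
        timestamp = int(time.time() * 1e6)
        row = self._rows.setdefault(key, {})
        for name, value in columns.items():
            current = row.get(name)
            # Last write wins. On an equal timestamp this stand-in keeps the
            # newer call, which (per the thread) a real cluster does not promise.
            if current is None or timestamp >= current[1]:
                row[name] = (value, timestamp)
        return timestamp

    def get(self, key):
        row = self._rows[key]
        return OrderedDict(
            (name, value) for name, (value, _) in sorted(row.items())
        )


def mark_processed(cf, key, attempts=2, log=print):
    """Write processed='yes', read it back, and log every step.

    Returns True once a read-back confirms the write, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        timestamp = cf.insert(key, {"processed": "yes"})
        seen = cf.get(key).get("processed")
        log("%s attempt %d insert ts=%d read-back=%r"
            % (key, attempt, timestamp, seen))
        if seen == "yes":
            return True
    return False


if __name__ == "__main__":
    cf = FakeColumnFamily()
    key = "IPN-example"  # placeholder key, not a real IPN id
    cf.insert(key, {"processed": "no"})
    print(mark_processed(cf, key))
```

Logging the returned timestamp alongside the value read back on every attempt gives the independent record of whether each write was visible, which is the evidence asked for above.
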