Re: Occasionally getting old data back with ConsistencyLevel.ALL

Kyle Gibson Fri, 26 Aug 2011 19:04:30 -0700

Update:

I scaled my cluster down from 7 nodes to 3 nodes, and kept RF=3. I did
a complete cluster rebuild, so everything was fresh. Kept my reads and
writes at CL.ALL. For a while there it seemed like I had succeeded in
eliminating the problem. Unfortunately about an hour ago a duplicate
came through, and the same IPN was processed twice.


Does anyone have any more suggestions as to what is going on here?

On Mon, Aug 22, 2011 at 1:59 PM, Kyle Gibson
<kyle.gib...@frozenonline.com> wrote:
> Thanks for the reply.
>
> On Mon, Aug 22, 2011 at 1:11 PM, Dominic Williams
> <dwilli...@fightmymonster.com> wrote:
>> Hi there, here's my tuppence!
>> 1. Something to look at first:
>>
>> If you write two different values to the same column quickly in succession,
>> if both writes go out with the same timestamp, then it is indeterminate
>> which one wins i.e. write order doesn't necessarily matter.
>
> Understood. In the code example I provided, I am writing the same
> value, but I am doing so in quick succession, so perhaps a sleep of a
> few seconds might be helpful. It is also worth noting that the code I
> provided is only step 2 in the process. There is a PHP script that
> receives the POST request from PayPal and inserts the IPN data into
> the IPN column family. Before it does this, it sets the "processed"
> column to "no".
>
>> 2. Don't trust PayPal (anyone using PayPal should really read this)
>> We are / were relying on IPNs to manage our website's recurring
>> subscriptions list. We experienced this weird thing where the
>> recurring_payment_profile_created IPN was missing, and thought maybe
>> Cassandra was losing it because PayPal is a financial system and it couldn't
>> possibly fail to generate an IPN, right!!?
>> Anyway, it turns out that after exhaustive discussions with PayPal
>> engineers, and having proved this from the PayPal logs, that sometimes IPNs
>> fail to get generated. Yup. Read that again!!!! Sometimes they fail to get
>> generated and in fact this is happening to us quite regularly now.
>> They justify this (while acknowledging this issue should be in their
>> documentation) by saying that because HTTP delivery is unreliable (hmmm
>> isn't this what the retry queue is for..) we shouldn't be relying entirely
>> on IPNs and should regularly download the logs and run them through scripts
>> to catch problems (this is idiotic, since the angry customer will get on our
>> case immediately when they pay and membership doesn't start)
>> Not sure whether PayPal or database failing is best option. Look forward to
>> hearing resolution.
>
> I have experienced failing to receive an IPN event before. In this
> case the IPN event is never saved to the IPN column family, and the
> cron script doesn't process it once, or twice, for that matter. The odd
> thing about the failed IPN event is that it didn't even show up in the
> IPN history, so I couldn't "replay" the event.
>
> I am fairly positive that the problem is either with my environment or
> Cassandra and not PayPal in this case. I am hoping it is my
> environment because I suspect that will be easier to fix.
>
> Oddly enough, the second time the IPN is processed, the column write
> succeeds. This always happens 5 minutes after the first one is
> processed.
>
> I neglected to mention an important part of the process: after the IPN
> event is processed (e.g. a new payment), an email is sent out to
> myself and the sender. This is how I know for sure the event is being
> processed twice, because not only do I receive two emails (spaced 5
> minutes apart) but so does the individual who paid. This is often
> embarrassing and somewhat difficult to explain; customers get confused
> as to which account they are supposed to use, etc.
>
> Thanks
>
>> Best,
>> Dominic
>> On 22 August 2011 17:49, Kyle Gibson <kyle.gib...@frozenonline.com> wrote:
>>>
>>> I made some changes to my code base that uses Cassandra. I went back
>>> to using the "processed" column, but instead of using "0" or "1" I
>>> decided to use "no" and "yes".
>>>
>>> You can view the code here: http://pastebin.com/gRBC16e7
>>>
>>> As you can see from the code, I perform an insert, then a get, and
>>> check the result. If it didn't work, I try the insert again and check
>>> the get again. Each time I print out the result. Each operation
>>> is a CL.ALL.
>>>
>>> A few successful IPNs did come through before this one was generated:
>>>
>>> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce
>>> OrderedDict([..., (u'processed', u'no'), ...])
>>> Failed to set processed to yes
>>> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert 1314012603578714
>>> Failed to set processed to yes
>>> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert2 1314012603586201
>>>
>>> As expected, this IPN was processed twice.
>>>
>>> On Sat, Aug 20, 2011 at 5:37 PM, Peter Schuller
>>> <peter.schul...@infidyne.com> wrote:
>>> >> Do you mean the cassandra log, or just logging in the script itself?
>>> >
>>> > The script itself. I.e, some "independent" verification that the line
>>> > of code after the insert is in fact running, just in case there's some
>>> > kind of silent failure.
>>> >
>>> > Sounds like you've tried to address it though with the e-mails.
>>> >
>>> > I suppose it boils down to: Either there is something wrong in your
>>> > environment/code, or Cassandra does have a bug. If the latter, it
>>> > would probably be helpful if you could try to reproduce it in your
>>> > environment in a way which can be shared - such as a script that does
>>> > writes and reads back to confirm the write made it. Or maybe just
>>> > adding more explicit logging to your script (even if it causes some
>>> > log flooding) to "prove" that a write truly happened.
>>> >
>>> > --
>>> > / Peter Schuller (@scode on twitter)
>>> >
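
The read-back check Peter describes above could be sketched roughly as
follows. This is only an illustration, not the code from the pastebin link:
the function name write_and_verify and its parameters are made up here, and
`cf` is assumed to be any object with insert(key, columns) and get(key)
methods (a pycassa ColumnFamily has methods of that shape, but nothing below
depends on pycassa). Consistency levels are assumed to be configured on `cf`
itself.

```python
import time


def write_and_verify(cf, key, column, value, attempts=3, log=print):
    """Write one column, read it back, and log both steps.

    `cf` is a hypothetical stand-in: any object with insert(key, columns)
    and get(key) methods. Returns True once the value read back equals the
    value written, False if it never does within `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        # Client-side microsecond clock reading, logged so it can later be
        # compared with the clocks of other writers of the same column.
        client_ts = int(time.time() * 1_000_000)
        cf.insert(key, {column: value})
        seen = cf.get(key).get(column)
        log(f"{key} attempt={attempt} client_ts={client_ts} "
            f"wrote={value!r} read={seen!r}")
        if seen == value:
            return True
    return False
```

Logging every attempt, including the successful ones, gives the independent
record that the line after the insert really ran, and what the read returned
at that moment.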
>>
>>
>
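
Dominic's first point, that the column timestamp rather than the order of
arrival decides which write survives, can be illustrated with a toy model.
This is a sketch under stated assumptions, not Cassandra's implementation:
Cell, reconcile and apply_writes are names invented here, the timestamps are
arbitrary, and the tie-break on equal timestamps (keep the larger value) is
an assumption of the model made only so the result is deterministic. The
clock-skew scenario at the bottom is hypothetical and is not a diagnosis of
the problem in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # client-supplied, microseconds


def reconcile(current, incoming):
    """Return the cell that survives under last-write-wins by timestamp."""
    if current is None or incoming.timestamp > current.timestamp:
        return incoming
    if incoming.timestamp < current.timestamp:
        return current
    # Equal timestamps: arrival order cannot decide. The toy model keeps the
    # larger value (an assumption, see the lead-in).
    return max(current, incoming, key=lambda c: c.value)


def apply_writes(writes):
    """Apply writes in arrival order and return the surviving cell."""
    cell = None
    for write in writes:
        cell = reconcile(cell, write)
    return cell


# Hypothetical scenario: the writer of "no" has a clock running ahead of the
# writer of "yes", so the later-arriving "yes" carries the older timestamp.
skew_us = 5 * 60 * 1_000_000
base_us = 1_000_000
no = Cell("no", base_us + skew_us)
yes = Cell("yes", base_us)
print(apply_writes([no, yes]).value)  # prints "no"
```

In this model the consistency level never enters the picture: every replica
would reach the same answer, and that answer is the value with the newer
timestamp, even though it arrived first.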
