Re: Consistency Issues

2015-05-13 Thread Jared Rodriguez
Thanks for the feedback. We have dug in deeper, upgraded to Cassandra 2.0.14, and are still seeing the same issue. What appears to be happening is that if a record is initially written, the first read is fine. But if we immediately update that record with a second write, then the second re

Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
I'm running some ETL jobs, where the pattern is the following:
1- Get some records from an external API.
2- For each record, see if its lastModified date > the lastModified I have in the DB (or if I don't have that record in the DB).
3- If lastModified < dbLastModified, the item wasn't changed; ignore it
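A minimal sketch of the decision in steps 2 and 3, assuming the stored lastModified has already been read back from Cassandra (null when the record is missing). EtlFilter and shouldWrite are hypothetical names, and treating an equal timestamp as "unchanged" is an assumption, since the steps leave that case open.

```java
import java.time.Instant;

// Hypothetical helper for steps 2 and 3 of the ETL job: decide whether a
// record fetched from the external API needs to be written to Cassandra.
class EtlFilter {

    /**
     * @param apiLastModified lastModified reported by the external API
     * @param dbLastModified  lastModified already stored, or null if the record is new
     * @return true if the record should be inserted or updated
     */
    static boolean shouldWrite(Instant apiLastModified, Instant dbLastModified) {
        if (dbLastModified == null) {
            return true; // not in the DB yet: insert it
        }
        // Assumption: equal timestamps count as "unchanged", so only strictly newer wins
        return apiLastModified.isAfter(dbLastModified);
    }

    public static void main(String[] args) {
        Instant older = Instant.parse("2015-05-13T10:00:00Z");
        Instant newer = Instant.parse("2015-05-13T11:00:00Z");
        System.out.println(shouldWrite(newer, null));  // true: new record
        System.out.println(shouldWrite(newer, older)); // true: changed upstream
        System.out.println(shouldWrite(older, newer)); // false: stale copy
        System.out.println(shouldWrite(older, older)); // false: unchanged
    }
}
```

Note that this check needs the stored value first, i.e. a read before every write.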

Insert Vs Updates - Both create tombstones

2015-05-13 Thread Walsh, Stephen
Quick question: our team is debating whether an update on a row with a TTL will create a tombstone. E.g., we have one row with a TTL; if we keep "updating" that row before the TTL is hit, will a tombstone be created? I believe it will, but want to confirm. So if t

RE: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Peer, Oded
You can use the “last modified” value as the TIMESTAMP for your UPDATE operation. This way the values will only be updated if lastModified date > the lastModified you have in the DB. Updates to values don’t create tombstones. Only deletes (either by executing delete, inserting a null value or b
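To illustrate the idea, here is a simplified single-cell model of last-write-wins with a client-supplied timestamp: the write carrying the newer lastModified (converted to microseconds) stays visible, and a stale re-send does not replace it. LwwCell is a hypothetical class; it models only what a read returns, not how versions are stored on disk, and it ignores the equal-timestamp case.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical single-cell model of last-write-wins with a client-supplied
// timestamp (UPDATE ... USING TIMESTAMP). Only the read result is modelled.
class LwwCell {
    private String value;                          // currently visible value
    private long timestampMicros = Long.MIN_VALUE; // timestamp of that value

    // A write becomes visible only if its timestamp is higher than the current one.
    // Equal timestamps are not modelled here.
    void write(String newValue, long tsMicros) {
        if (tsMicros > timestampMicros) {
            value = newValue;
            timestampMicros = tsMicros;
        }
    }

    String read() {
        return value;
    }

    public static void main(String[] args) {
        // lastModified values from the API, in milliseconds, converted to microseconds
        long olderMicros = TimeUnit.MILLISECONDS.toMicros(1431514800000L);
        long newerMicros = TimeUnit.MILLISECONDS.toMicros(1431518400000L);

        LwwCell cell = new LwwCell();
        cell.write("v2", newerMicros);
        cell.write("v1", olderMicros); // stale re-send: does not replace v2
        System.out.println(cell.read()); // v2
    }
}
```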

RE: Insert Vs Updates - Both create tombstones

2015-05-13 Thread Peer, Oded
Assuming that when you update the columns you also update their TTL, a tombstone won't be created for those columns. Remember that TTL is set on columns (or "cells"), not on rows, so your description of updating a row is slightly misleading. If every query updates d
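A small model of the per-cell TTL point, under simplifying assumptions (times in whole seconds, the newer write simply shadows the older one). ExpiringCell is hypothetical; it only shows that the visible expiry follows the most recent write's TTL.

```java
// Hypothetical model of a cell written with a TTL. Times are in whole seconds
// and the newer write simply shadows the older one (a simplification).
class ExpiringCell {
    final String value;
    final long writeTimeSec; // when this version was written
    final int ttlSec;        // TTL supplied with this write

    ExpiringCell(String value, long writeTimeSec, int ttlSec) {
        this.value = value;
        this.writeTimeSec = writeTimeSec;
        this.ttlSec = ttlSec;
    }

    // Live until writeTime + TTL
    boolean isLive(long nowSec) {
        return nowSec < writeTimeSec + ttlSec;
    }

    // The more recent write wins, so its TTL governs expiry
    static ExpiringCell reconcile(ExpiringCell a, ExpiringCell b) {
        return b.writeTimeSec > a.writeTimeSec ? b : a;
    }

    public static void main(String[] args) {
        ExpiringCell original = new ExpiringCell("v1", 0, 100); // expires at t=100
        ExpiringCell update = new ExpiringCell("v2", 50, 100);  // expires at t=150
        ExpiringCell visible = reconcile(original, update);
        System.out.println(original.isLive(120)); // false: original alone would have expired
        System.out.println(visible.value + " " + visible.isLive(120)); // v2 true
    }
}
```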

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
Thanks Peter, that's interesting. I didn't know of that option. If updates don't create tombstones (and I'm already taking pains to ensure no nulls are present in queries), then is there no cost to just submitting an update for everything, regardless of whether lastModified has changed? Thanks.

Re: Insert Vs Updates - Both create tombstones

2015-05-13 Thread Ali Akhtar
If specifying 'using' timestamp, the docs say to provide microseconds, but where are these microseconds obtained from? I have regular java.util.Date objects; I can get the time in milliseconds (i.e. the Unix timestamp). How would I convert that to microseconds?

Re: Insert Vs Updates - Both create tombstones

2015-05-13 Thread Ali Akhtar
Sorry, wrong thread. Disregard the above.

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
If specifying 'using' timestamp, the docs say to provide microseconds, but where are these microseconds obtained from? I have regular java.util.Date objects; I can get the time in milliseconds (i.e. the Unix timestamp). How would I convert that to microseconds?

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
Would TimeUnit.MILLISECONDS.toMicros(myDate.getTime()) work for producing the microsecond timestamp?

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ken Hancock
While updates don't create tombstones, overwrites create a similar performance penalty at the read phase. That key will need to be fetched from every SSTable where it resides so the "most recent" column can be returned.
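The read-side cost can be pictured with a toy model in which each SSTable is an immutable map from key to a timestamped version. ReadPathSketch and its types are hypothetical, and the assumption that each hourly overwrite ends up in a separate SSTable is for illustration; the point is only that a key overwritten in several SSTables forces the read to consult each of them and keep the newest version.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified read path: each SSTable is an immutable map
// from partition key to a timestamped version of one cell.
class ReadPathSketch {
    static final class Version {
        final String value;
        final long timestampMicros;

        Version(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    static final class ReadResult {
        final Version winner;
        final int sstablesTouched;

        ReadResult(Version winner, int sstablesTouched) {
            this.winner = winner;
            this.sstablesTouched = sstablesTouched;
        }
    }

    // Consult every SSTable that holds the key and keep the newest version.
    static ReadResult read(List<Map<String, Version>> sstables, String key) {
        Version best = null;
        int touched = 0;
        for (Map<String, Version> sstable : sstables) {
            Version v = sstable.get(key);
            if (v == null) {
                continue; // key absent (Cassandra can usually skip these via its bloom filter)
            }
            touched++;
            if (best == null || v.timestampMicros > best.timestampMicros) {
                best = v;
            }
        }
        return new ReadResult(best, touched);
    }

    public static void main(String[] args) {
        // "rec-1" was overwritten by three hourly runs, each assumed flushed to its own SSTable
        List<Map<String, Version>> sstables = List.of(
                Map.of("rec-1", new Version("hour-1", 1L)),
                Map.of("rec-1", new Version("hour-2", 2L), "rec-2", new Version("only", 2L)),
                Map.of("rec-1", new Version("hour-3", 3L)));

        ReadResult r1 = read(sstables, "rec-1");
        System.out.println(r1.winner.value + " from " + r1.sstablesTouched + " SSTable(s)"); // hour-3 from 3 SSTable(s)
        ReadResult r2 = read(sstables, "rec-2");
        System.out.println(r2.winner.value + " from " + r2.sstablesTouched + " SSTable(s)"); // only from 1 SSTable(s)
    }
}
```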

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
Thanks. So supplying the timestamp with the update (via USING) should fix that, right (by skipping updates where lastModified < dbLastModified)? I'm currently doing TimeUnit.MILLISECONDS.toMicros(myDate.getTime()) and that has worked for inserts; however, how do I verify that future updates are

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
Is there a way in the Java driver to get the number of rows that an update was applied to?

RE: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Peer, Oded
The cost of issuing an UPDATE that won’t update anything is compaction overhead. Since you stated it’s rare for rows to be updated, the overhead should be negligible. The easiest way to convert a milliseconds timestamp long value to microseconds is to multiply by 1000.
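Both conversions mentioned in the thread give the same value for any realistic date; the sketch below (class name hypothetical) shows them side by side. One design difference: TimeUnit.toMicros saturates at Long.MAX_VALUE instead of overflowing, which only matters for absurdly large inputs.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Two equivalent ways to turn a java.util.Date into the microseconds that
// USING TIMESTAMP expects. Class name is hypothetical.
class MicrosConversion {
    static long viaTimeUnit(Date date) {
        // Saturates at Long.MAX_VALUE instead of overflowing
        return TimeUnit.MILLISECONDS.toMicros(date.getTime());
    }

    static long viaMultiply(Date date) {
        // Plain multiplication of milliseconds by 1000
        return date.getTime() * 1000L;
    }

    public static void main(String[] args) {
        Date date = new Date(1431518400000L); // 2015-05-13T12:00:00Z
        System.out.println(viaTimeUnit(date)); // 1431518400000000
        System.out.println(viaMultiply(date)); // 1431518400000000
    }
}
```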

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
It's rare for an existing record to have changes, but the ETL job runs every hour, so it will send updates each time, regardless of whether there were changes. (I'm assuming that USING TIMESTAMP here will avoid the compaction overhead, since that will cause it to not run any updates u

RE: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Peer, Oded
USING TIMESTAMP doesn’t avoid compaction overhead. When you modify data, the value is stored along with its timestamp. Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you store V’ with timestamp T1. Now you have two values of V in the DB:
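The scenario above can be modelled as an append-only list of versions for one cell. This is a hypothetical sketch (CompactionSketch is not a Cassandra class) showing that the older-timestamped V’ is stored alongside V, loses at read time, and disappears only when compaction merges the versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the stored versions of one cell. Writes are appended,
// never applied in place, so an older-timestamped write is still kept until
// compaction merges the versions.
class CompactionSketch {
    static final class Version {
        final String value;
        final long timestamp;

        Version(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    void write(String value, long timestamp) {
        versions.add(new Version(value, timestamp));
    }

    // Highest timestamp wins at read time
    private Version winner() {
        Version best = null;
        for (Version v : versions) {
            if (best == null || v.timestamp > best.timestamp) {
                best = v;
            }
        }
        return best;
    }

    String read() {
        Version best = winner();
        return best == null ? null : best.value;
    }

    int storedVersions() {
        return versions.size();
    }

    // Compaction keeps only the winning version and drops the shadowed ones
    void compact() {
        Version best = winner();
        versions.clear();
        if (best != null) {
            versions.add(best);
        }
    }

    public static void main(String[] args) {
        long t1 = 1000L, t2 = 2000L; // T1 < T2
        CompactionSketch cell = new CompactionSketch();
        cell.write("V", t2);
        cell.write("V'", t1);
        System.out.println(cell.read() + ", stored versions: " + cell.storedVersions()); // V, stored versions: 2
        cell.compact();
        System.out.println(cell.read() + ", stored versions: " + cell.storedVersions()); // V, stored versions: 1
    }
}
```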

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
> I don’t understand the ETL use case and its relevance here. Can you provide more details?
Basically, every hour a job runs which queries an external API and gets some records. Then, I want to take only new or updated records and insert / update them in Cassandra. For records that are already

RE: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Peer, Oded
It will cause an overhead (compaction and read) as I described in the previous email.

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
But your previous email talked about when T1 is different:
> Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you store V’ with timestamp T1.
What if you issue an update twice, but with the same timestamp? E.g. if you ran: Update where foo=bar USING TIMESTAMP = 100

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Robert Wille
You could use lightweight transactions to update only if the record is newer. It doesn’t avoid the read; the read just happens under the covers, so it’s not really going to be faster than a read-before-write pattern (which is an anti-pattern, BTW). It is probably the easiest way to avoid gettin
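For the semantics only, an in-memory analogue of "update only if newer" can use an atomic read-modify-write. This is not how Cassandra implements lightweight transactions (they coordinate the replicas with Paxos); ConditionalUpsert and upsertIfNewer are hypothetical, and the returned boolean plays the role of the [applied] column an LWT returns. The comparison still needs the current value, which is the hidden read mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory analogue (not Cassandra's Paxos-based LWT) of
// "update only if the stored lastModified is older". Names are hypothetical.
class ConditionalUpsert {
    private final ConcurrentHashMap<String, Long> lastModifiedById = new ConcurrentHashMap<>();

    // Returns true if applied, like the [applied] column of an LWT result.
    boolean upsertIfNewer(String id, long lastModifiedMicros) {
        boolean[] applied = {false};
        // compute() runs atomically per key: read current value, compare, maybe replace
        lastModifiedById.compute(id, (key, current) -> {
            if (current == null || lastModifiedMicros > current) {
                applied[0] = true;
                return lastModifiedMicros;
            }
            return current; // condition failed: keep the stored value
        });
        return applied[0];
    }

    public static void main(String[] args) {
        ConditionalUpsert table = new ConditionalUpsert();
        System.out.println(table.upsertIfNewer("rec-1", 100L)); // true: new record
        System.out.println(table.upsertIfNewer("rec-1", 100L)); // false: unchanged
        System.out.println(table.upsertIfNewer("rec-1", 200L)); // true: newer
    }
}
```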

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
The 6k is only the starting value; it's expected to scale up to ~200 million records.

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Ali Akhtar
Can lightweight txns be used in a batch update?

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Robert Wille
You probably shouldn’t use batch updates. Your records are probably unrelated to each other, and therefore there really is no reason to use batches. Use asynchronous queries to improve performance. executeAsync() is your friend. A common misconception is that batches will improve performance. Th
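A sketch of the executeAsync() advice: issue one asynchronous write per record and cap the number outstanding with a Semaphore, rather than grouping unrelated records into a batch. writeAsync is a hypothetical stand-in for session.executeAsync(...) (which, in the DataStax Java driver, returns a future); here it simulates a write on a local thread pool so the example runs without a cluster. The cap of 4 in-flight requests is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// One asynchronous write per record, with a cap on outstanding requests.
// writeAsync is a hypothetical stand-in for session.executeAsync(...).
class AsyncWrites {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxInFlight = new AtomicInteger();
    final AtomicInteger written = new AtomicInteger();

    // Simulated write: tracks concurrency, sleeps briefly, counts completions
    CompletableFuture<Void> writeAsync(String record) {
        return CompletableFuture.runAsync(() -> {
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(1); // pretend network round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet();
            }
            written.incrementAndGet();
        }, pool);
    }

    // Fire writes without waiting for each one, but never more than maxOutstanding at once
    void writeAll(List<String> records, int maxOutstanding) {
        Semaphore permits = new Semaphore(maxOutstanding);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String record : records) {
            permits.acquireUninterruptibly(); // back-pressure when the cap is reached
            futures.add(writeAsync(record).whenComplete((ok, error) -> permits.release()));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add("record-" + i);
        }
        AsyncWrites writer = new AsyncWrites();
        writer.writeAll(records, 4);
        writer.shutdown();
        System.out.println(writer.written.get() + " written, max in flight " + writer.maxInFlight.get());
    }
}
```

Without some cap like this, firing hundreds of millions of async writes at once would simply move the bottleneck into the client's request queue.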

Re: Consistency Issues

2015-05-13 Thread Robert Wille
Timestamps have millisecond granularity. If you make multiple writes within the same millisecond, then the outcome is not deterministic. Also, make sure you are running ntp. Clock skew will manifest itself similarly.
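A simplified model of the same-millisecond problem. Assumptions, stated loudly: the write timestamp is the millisecond clock multiplied by 1000, and an exact tie is broken by comparing values (the greater value wins), which is my understanding of Cassandra 2.0's cell reconciliation rather than something stated in this thread. SameMillisecondWrites and resolve are hypothetical names; String.compareTo stands in for the byte comparison.

```java
// Simplified model of two writes landing in the same millisecond.
// Assumptions: timestamp = milliseconds * 1000, and a tie is broken by
// comparing values (greater wins). Names are hypothetical.
class SameMillisecondWrites {
    static final class Cell {
        final String value;
        final long timestampMicros;

        Cell(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    static Cell resolve(Cell a, Cell b) {
        if (a.timestampMicros != b.timestampMicros) {
            return a.timestampMicros > b.timestampMicros ? a : b; // newer write wins
        }
        // Tie: the order in which the writes were issued plays no part
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        long millis = 1431518400000L;
        Cell initial = new Cell("b", millis * 1000);      // first write
        Cell sameMs = new Cell("a", millis * 1000);       // update issued right after, same ms
        Cell nextMs = new Cell("a", (millis + 1) * 1000); // same update, one ms later
        System.out.println(resolve(initial, sameMs).value); // b: the update is lost
        System.out.println(resolve(initial, nextMs).value); // a: the update wins
    }
}
```

Clock skew between clients has the same effect: a later write can carry a smaller timestamp and lose.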

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Robert Coli
On Wed, May 13, 2015 at 4:37 AM, Peer, Oded wrote:
> The cost of issuing an UPDATE that won’t update anything is compaction overhead. Since you stated it’s rare for rows to be updated then the overhead should be negligible.
It's also the cost of seeking into tables which contain the row f

Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Moshe Kranc
CQL is the future, and it provides a great high-level view of keyspaces. (I am drinking the Kool-Aid.) But I believe every C* developer also needs to look at the table's internal structure, e.g., what the column names actually look like. Only by keeping an eye on the physical structure can you
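As a rough illustration of what "column names" meant before 3.0 (assumption: the 2.x storage layout for CQL3 tables, as far as I know it), each CQL row of a table with PRIMARY KEY (user_id, record_name) and a regular column record_value is stored as cells inside the user_id partition, named by the clustering value plus the CQL column name; a row created with INSERT also gets an empty-named "row marker" cell. The table and InternalLayoutSketch are hypothetical, and cassandra-cli would show these names as bytes rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Rough illustration of pre-3.0 internal cell names for a hypothetical table
//   CREATE TABLE t (user_id text, record_name text, record_value blob,
//                   PRIMARY KEY (user_id, record_name));
// Each CQL row lives in the user_id partition as cells named
// "<record_name>:<column>", plus an empty-named row marker written by INSERT.
class InternalLayoutSketch {
    static List<String> cellNames(String clusteringValue, List<String> regularColumns) {
        List<String> names = new ArrayList<>();
        names.add(clusteringValue + ":"); // row marker: empty column-name component
        for (String column : regularColumns) {
            names.add(clusteringValue + ":" + column);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("partition user42:");
        for (String recordName : List.of("email", "phone")) {
            for (String name : cellNames(recordName, List.of("record_value"))) {
                System.out.println("  cell " + name);
            }
        }
    }
}
```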

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread DuyHai Doan
I think you can still use cassandra-cli from 2.0.x to look into the internal table structure. Of course you will see bytes instead of "readable" values, but it's better than nothing. It's already the case for CQL collections when you try to decode them using cassandra-cli.

RE: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Moshe Kranc
Yes, cassandra-cli still works. But it also tells me that I should switch to CQL, and it doesn't want to display CQL3 tables. My question isn't how to get the info today – it's whether that info will still be available in the future.

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Jonathan Haddad
In Cassandra 3.0 there will be a massive rewrite of what an sstable even is, and the cli will be totally useless for inspecting it. There won't be "column names" anymore, timestamps will be stored once per row (assuming they're the same), and there will be a whole slew of other optimizations. If you want to look at

Count Number of Users in Cassandra column family?

2015-05-13 Thread Check Peck
I have a table like this in Cassandra:
CREATE TABLE DATA_HOLDER (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));
I want to count the distinct USER_ID values in this table. Is there any way I can do that? My Cassandra version is: [cqlsh 4.1.1 | Cassandra
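One client-side approach, sketched under assumptions: fetch the partition keys (for example with SELECT DISTINCT USER_ID FROM DATA_HOLDER, which I believe Cassandra 2.0 accepts for partition-key columns) and count them while iterating. DistinctUserCount is hypothetical, and a plain Iterable<String> stands in for the driver's result rows; the HashSet also tolerates duplicate ids if you scan the whole table instead, at the cost of memory proportional to the number of users.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Client-side distinct count. The Iterable stands in for the USER_ID column of
// the rows returned by a query (hypothetical wiring; no driver calls here).
class DistinctUserCount {
    static int countDistinct(Iterable<String> userIds) {
        Set<String> seen = new HashSet<>(); // memory grows with the number of users
        for (String id : userIds) {
            seen.add(id);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // e.g. USER_ID values from a full scan, where a user appears once per RECORD_NAME
        List<String> userIds = List.of("u1", "u1", "u2", "u3", "u2");
        System.out.println(countDistinct(userIds)); // 3
    }
}
```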

text partition key Bloom filters fp is 1 always, why?

2015-05-13 Thread Anishek Agarwal
Hello, I have a text partition key for one of my CFs. The cfstats output for that table seems to show that the bloom filter false positive ratio is always 1. Also, the bloom filter is using very little space. Do bloom filters not work well with text partition keys? I can assume this as it can no way detect
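For reference, a Bloom filter only ever sees the key's bytes, so a text key is hashed the same way as any other type. The toy filter below (hypothetical class; double hashing over FNV-1a and djb2, not Cassandra's actual Murmur-based implementation) shows that string keys behave normally: every added key is reported as possibly present, and an empty filter rejects everything.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Toy Bloom filter over the UTF-8 bytes of text keys (double hashing with
// FNV-1a and djb2). Not Cassandra's implementation; illustration only.
class TextKeyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TextKeyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    private static int djb2(byte[] data) {
        int h = 5381;
        for (byte b : data) {
            h = h * 33 + (b & 0xff);
        }
        return h;
    }

    void add(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(bytes), h2 = djb2(bytes);
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits)); // i-th probe position
        }
    }

    // false means "definitely absent"; true means "possibly present"
    boolean mightContain(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(bytes), h2 = djb2(bytes);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TextKeyBloomFilter filter = new TextKeyBloomFilter(1 << 16, 5);
        for (int i = 0; i < 1000; i++) {
            filter.add("user-" + i);
        }
        int falsePositives = 0;
        for (int i = 1000; i < 11000; i++) {
            if (filter.mightContain("user-" + i)) {
                falsePositives++;
            }
        }
        System.out.println("false positives among 10000 absent keys: " + falsePositives);
    }
}
```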