The cost of issuing an UPDATE that won’t update anything is compaction 
overhead. Since you said it’s rare for rows to be updated, the overhead 
should be negligible.

The easiest way to convert a milliseconds timestamp long value to microseconds 
is to multiply by 1000.
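
A minimal sketch of that conversion, assuming the input is a plain java.util.Date. The class and method names are illustrative only. It shows that multiplying by 1000 and the TimeUnit call asked about below give the same value for ordinary dates.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

class MicrosExample {
    // Epoch milliseconds -> epoch microseconds by plain multiplication.
    static long toMicrosByMultiplying(Date date) {
        return date.getTime() * 1000L;
    }

    // Same conversion via TimeUnit; it saturates at Long.MAX_VALUE / Long.MIN_VALUE
    // instead of silently overflowing.
    static long toMicrosWithTimeUnit(Date date) {
        return TimeUnit.MILLISECONDS.toMicros(date.getTime());
    }

    public static void main(String[] args) {
        // Arbitrary example instant, given in epoch milliseconds.
        Date lastModified = new Date(1431512100000L);
        System.out.println(toMicrosByMultiplying(lastModified)); // 1431512100000000
        System.out.println(toMicrosWithTimeUnit(lastModified));  // 1431512100000000
    }
}
```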

From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:15 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified < current date)

Would TimeUnit.MILLISECONDS.toMicros(myDate.getTime()) work for producing 
the microsecond timestamp?

On Wed, May 13, 2015 at 4:09 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
When specifying USING TIMESTAMP, the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects and can get the time in milliseconds (i.e. the Unix timestamp). How 
would I convert that to microseconds?

On Wed, May 13, 2015 at 3:56 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
Thanks Peter, that's interesting. I didn't know of that option.

If updates don't create tombstones (and I'm already taking pains to ensure no 
nulls are present in queries), then is there no cost to just submitting an 
update for everything, regardless of whether lastModified has changed?

Thanks.

On Wed, May 13, 2015 at 3:38 PM, Peer, Oded <oded.p...@rsa.com> wrote:
You can use the “last modified” value as the TIMESTAMP for your UPDATE 
operation.
This way the new values only take effect if the incoming lastModified is later 
than the lastModified you already have in the DB.
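
As a rough sketch of this idea: the helper below only builds the CQL text for such an UPDATE, so it runs without a driver. The table name myTable and the id key come from the original question; the columns name and lastModified, the class, and the method are hypothetical. In real code the values would more likely be bound through a driver's prepared statement.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

class UpdateWithTimestamp {
    // Builds an UPDATE whose write timestamp is the record's lastModified,
    // converted to microseconds. Column values stay as bind markers (?);
    // only the timestamp is inlined into the statement text.
    static String buildUpdate(Date lastModified) {
        long micros = TimeUnit.MILLISECONDS.toMicros(lastModified.getTime());
        return "UPDATE myTable USING TIMESTAMP " + micros
             + " SET name = ?, lastModified = ? WHERE id = ?";
    }

    public static void main(String[] args) {
        // Arbitrary example instant, given in epoch milliseconds.
        System.out.println(buildUpdate(new Date(1431512100000L)));
    }
}
```

One thing to keep in mind with this approach: a write issued without an explicit timestamp is stamped with the current time, so mixing both styles on the same columns can make the explicitly timestamped updates lose.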

Updates to values don’t create tombstones. Only deletes (executing a DELETE, 
inserting a null value, or setting a TTL) create tombstones.


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 1:27 PM
To: user@cassandra.apache.org
Subject: Updating only modified records (where lastModified < current date)

I'm running some ETL jobs, where the pattern is the following:

1- Get some records from an external API,

2- For each record, see if its lastModified date > the lastModified I have in 
the db (or if I don't have that record in the db)

3- If lastModified <= dbLastModified, the item wasn't changed, so ignore it. 
Otherwise, run an update query and update that record.
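
The check in steps 2 and 3 can be sketched as a small predicate. This is only an illustration of the comparison; the class and method names are hypothetical, and a missing database row is represented here by null.

```java
import java.util.Date;

class ChangeCheck {
    // True when the record is new (nothing stored yet) or the API copy is
    // strictly newer than the stored copy; false when it is unchanged or older.
    static boolean needsUpdate(Date apiLastModified, Date dbLastModified) {
        return dbLastModified == null || apiLastModified.after(dbLastModified);
    }

    public static void main(String[] args) {
        Date stored = new Date(1000L);
        System.out.println(needsUpdate(new Date(2000L), stored)); // true: newer
        System.out.println(needsUpdate(new Date(1000L), stored)); // false: unchanged
        System.out.println(needsUpdate(new Date(1000L), null));   // true: not in db
    }
}
```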

(It is rare for existing records to get updated, so I'm not that concerned 
about tombstones).

The problem however is, since I have to query each record's lastModified, one 
at a time, that's adding a major bottleneck to my job.

E.g., if I have 6k records, I have to run a total of 6k 'select lastModified from 
myTable where id = ?' queries.

Is there a better way, am I doing anything wrong, etc? Any suggestions would be 
appreciated.

Thanks.


