It's rare for an existing record to change, but the ETL job runs every
hour, so it will send updates each time, regardless of whether there
were changes or not.

(I'm assuming that USING TIMESTAMP here will avoid the compaction overhead,
since it would cause no update to be applied unless the timestamp is
actually greater than the last update's timestamp?)

Also, is there a way to get the number of rows which were updated / ignored?

On Wed, May 13, 2015 at 4:37 PM, Peer, Oded <oded.p...@rsa.com> wrote:

> The cost of issuing an UPDATE that won’t update anything is compaction
> overhead. Since you said it’s rare for rows to be updated, the
> overhead should be negligible.
>
>
>
> The easiest way to convert a millisecond timestamp (a long value) to
> microseconds is to multiply it by 1000.
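A minimal Java sketch of the conversion for the java.util.Date values mentioned in this thread (the class name MicrosTimestamp and the sample instant are hypothetical). It shows that TimeUnit.MILLISECONDS.toMicros and multiplying by 1000 agree for realistic dates; per the TimeUnit documentation, the only difference is that toMicros saturates at Long.MAX_VALUE or Long.MIN_VALUE instead of silently overflowing.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

class MicrosTimestamp {
    // Converts a Date (milliseconds since the Unix epoch) to microseconds since the epoch,
    // the unit expected by USING TIMESTAMP.
    static long toMicros(Date date) {
        return TimeUnit.MILLISECONDS.toMicros(date.getTime());
    }

    public static void main(String[] args) {
        // Arbitrary example instant; in the ETL job this would be the record's lastModified date.
        Date lastModified = new Date(1431523020000L);

        long viaTimeUnit = toMicros(lastModified);
        long viaMultiply = lastModified.getTime() * 1000L;

        System.out.println(viaTimeUnit);
        System.out.println(viaTimeUnit == viaMultiply);
    }
}
```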
>
>
>
> *From:* Ali Akhtar [mailto:ali.rac...@gmail.com]
> *Sent:* Wednesday, May 13, 2015 2:15 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Updating only modified records (where lastModified <
> current date)
>
>
>
> Would TimeUnit.MILLISECONDS.toMicros(myDate.getTime()) work for
> producing the microsecond timestamp?
>
>
>
> On Wed, May 13, 2015 at 4:09 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> When specifying USING TIMESTAMP, the docs say to provide microseconds, but
> where are these microseconds obtained from? I have regular java.util.Date
> objects, so I can get the time in milliseconds (i.e. the Unix timestamp). How
> would I convert that to microseconds?
>
>
>
> On Wed, May 13, 2015 at 3:56 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Thanks Peter, that's interesting. I didn't know of that option.
>
>
>
> If updates don't create tombstones (and I'm already taking pains to ensure
> no nulls are present in queries), is there any cost to just submitting
> an update for everything, regardless of whether lastModified has changed?
>
>
>
> Thanks.
>
>
>
> On Wed, May 13, 2015 at 3:38 PM, Peer, Oded <oded.p...@rsa.com> wrote:
>
> You can use the “last modified” value as the TIMESTAMP for your UPDATE
> operation.
>
> This way the values will only be updated if the incoming lastModified date
> is later than the lastModified you have in the DB.
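To make the USING TIMESTAMP behaviour concrete, here is a simplified Java model of how two versions of one column value are reconciled, assuming the write timestamp is the record's lastModified in microseconds. The Cell and LastWriteWins classes are hypothetical stand-ins, not Cassandra code, and the CQL string uses a placeholder column (name) alongside the myTable, id and lastModified names from this thread.

```java
// Simplified, hypothetical model of one column value as Cassandra stores it:
// the value plus the write timestamp (microseconds) supplied via USING TIMESTAMP.
final class Cell {
    final String value;
    final long timestampMicros;

    Cell(String value, long timestampMicros) {
        this.value = value;
        this.timestampMicros = timestampMicros;
    }
}

final class LastWriteWins {
    // Illustrative CQL; "name" is a placeholder column. Each ? is bound per record,
    // with the TIMESTAMP bound to lastModified converted to microseconds.
    static final String UPDATE_CQL =
        "UPDATE myTable USING TIMESTAMP ? SET name = ?, lastModified = ? WHERE id = ?";

    // The version with the higher write timestamp survives. Cassandra has its own rules
    // for exactly equal timestamps; this sketch simply keeps the existing cell in that case.
    static Cell reconcile(Cell existing, Cell incoming) {
        if (existing == null) {
            return incoming;
        }
        return incoming.timestampMicros > existing.timestampMicros ? incoming : existing;
    }

    public static void main(String[] args) {
        Cell stored = new Cell("v1", 2_000_000L);
        Cell stale = new Cell("v0", 1_000_000L);  // older lastModified: loses
        Cell newer = new Cell("v2", 3_000_000L);  // newer lastModified: wins

        System.out.println(UPDATE_CQL);
        System.out.println(reconcile(stored, stale).value);
        System.out.println(reconcile(stored, newer).value);
    }
}
```

Note that the stale write is not rejected up front: a normal Cassandra UPDATE does not read existing data first, so the older version is still written, ignored when versions are reconciled on read, and discarded once the versions are merged, typically during compaction. That is where the overhead of no-op updates comes from.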
>
>
>
> Updates to values don’t create tombstones. Only deletes (executing a
> DELETE, inserting a null value, or setting a TTL) create tombstones.
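Since binding null writes a tombstone, one way to keep nulls out of queries (as described earlier in the thread) is to build each UPDATE from only the non-null columns. A minimal sketch, assuming a hypothetical NonNullUpdate helper and placeholder column names (name, email, city); myTable and id come from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds an UPDATE that only SETs columns whose value is non-null,
// because binding null to a column writes a tombstone for that column.
final class NonNullUpdate {
    final String cql;
    final List<Object> bindValues;

    private NonNullUpdate(String cql, List<Object> bindValues) {
        this.cql = cql;
        this.bindValues = bindValues;
    }

    static NonNullUpdate build(String table, Object id, Map<String, Object> columns) {
        StringJoiner sets = new StringJoiner(", ");
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, Object> e : columns.entrySet()) {
            if (e.getValue() != null) {     // skip nulls entirely instead of binding them
                sets.add(e.getKey() + " = ?");
                values.add(e.getValue());
            }
        }
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no non-null columns to update");
        }
        values.add(id);                     // bound to the WHERE clause
        return new NonNullUpdate("UPDATE " + table + " SET " + sets + " WHERE id = ?", values);
    }

    public static void main(String[] args) {
        Map<String, Object> columns = new LinkedHashMap<>();  // preserves column order
        columns.put("name", "a");
        columns.put("email", null);                            // would otherwise become a tombstone
        columns.put("city", "b");

        NonNullUpdate u = build("myTable", "id-1", columns);
        System.out.println(u.cql);
        System.out.println(u.bindValues);
    }
}
```

Skipping a column leaves its current value untouched, which is usually what an ETL update wants; if a field genuinely needs clearing, a null (and therefore a tombstone) is the correct write.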
>
>
>
>
>
> *From:* Ali Akhtar [mailto:ali.rac...@gmail.com]
> *Sent:* Wednesday, May 13, 2015 1:27 PM
> *To:* user@cassandra.apache.org
> *Subject:* Updating only modified records (where lastModified < current
> date)
>
>
>
> I'm running some ETL jobs, where the pattern is the following:
>
>
>
> 1- Get some records from an external API,
>
>
>
> 2- For each record, check whether its lastModified date > the lastModified I have
> in the DB (or whether I don't have that record in the DB at all).
>
>
>
> 3- If lastModified <= dbLastModified, the item hasn't changed, so ignore it.
> Otherwise, run an update query for that record.
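The three steps above can be sketched in Java as follows. ApiRecord, EtlFilter and the in-memory Map are hypothetical stand-ins; in the real job each lookup is one 'select lastModified from myTable where id = ?' round trip. Equal timestamps are treated as unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 1-3: decide per record whether an UPDATE is needed.
final class EtlFilter {
    static final class ApiRecord {
        final String id;
        final long lastModified;   // epoch millis from the external API

        ApiRecord(String id, long lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    // dbLastModified stands in for 'select lastModified from myTable where id = ?';
    // in the real job each get() is a separate query round trip.
    static List<ApiRecord> needsUpdate(List<ApiRecord> fromApi, Map<String, Long> dbLastModified) {
        List<ApiRecord> result = new ArrayList<>();
        for (ApiRecord r : fromApi) {
            Long stored = dbLastModified.get(r.id);
            // New record, or strictly newer than what is stored; equal means unchanged.
            if (stored == null || r.lastModified > stored) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> db = Map.of("a", 100L, "b", 200L);
        List<ApiRecord> fromApi = List.of(
            new ApiRecord("a", 100L),   // unchanged: skipped
            new ApiRecord("b", 300L),   // changed: updated
            new ApiRecord("c", 50L));   // not in the DB yet: written
        for (ApiRecord r : needsUpdate(fromApi, db)) {
            System.out.println(r.id);
        }
    }
}
```

The USING TIMESTAMP approach suggested in the replies removes this lookup entirely: every record is written without a prior read, and Cassandra keeps whichever version carries the higher timestamp.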
>
>
>
> (It is rare for existing records to get updated, so I'm not that concerned
> about tombstones).
>
>
>
> The problem, however, is that since I have to query each record's lastModified
> one at a time, it's adding a major bottleneck to my job.
>
>
>
> E.g. if I have 6k records, I have to run a total of 6k 'select lastModified
> from myTable where id = ?' queries.
>
>
>
> Is there a better way, or am I doing something wrong? Any suggestions
> would be appreciated.
>
>
>
> Thanks.
