Is there a way in the Java driver to get the number of rows that an update was applied to?

On Wed, May 13, 2015 at 4:33 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Thanks. So supplying the timestamp with the update (via using) should fix
> that, right? (By skipping updates where lastModified < dbLastModified).
>
> I'm currently doing TimeUnit.MILLISECONDS.toMicros( myDate.getTime() )
> and that has worked for inserts, however how do I verify that future
> updates are ignored and aren't run again?
>
> On Wed, May 13, 2015 at 4:29 PM, Ken Hancock <ken.hanc...@schange.com>
> wrote:
>
>> While updates don't create tombstones, overwrites create a similar
>> performance penalty at the read phase. That key will need to be fetched
>> from every SSTable where it resides so the "most recent" column can be
>> returned.
>>
>> On Wed, May 13, 2015 at 6:38 AM, Peer, Oded <oded.p...@rsa.com> wrote:
>>
>>> You can use the “last modified” value as the TIMESTAMP for your UPDATE
>>> operation.
>>>
>>> This way the values will only be updated if lastModified date > the
>>> lastModified you have in the DB.
>>>
>>> Updates to values don’t create tombstones. Only deletes (either by
>>> executing delete, inserting a null value or by setting a TTL) create
>>> tombstones.
>>>
>>> *From:* Ali Akhtar [mailto:ali.rac...@gmail.com]
>>> *Sent:* Wednesday, May 13, 2015 1:27 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Updating only modified records (where lastModified < current
>>> date)
>>>
>>> I'm running some ETL jobs, where the pattern is the following:
>>>
>>> 1- Get some records from an external API,
>>>
>>> 2- For each record, see if its lastModified date > the lastModified i
>>> have in db (or if I don't have that record in db)
>>>
>>> 3- If lastModified < dbLastModified, the item wasn't changed, ignore it.
>>> Otherwise, run an update query and update that record.
>>>
>>> (It is rare for existing records to get updated, so I'm not that
>>> concerned about tombstones).
>>>
>>> The problem however is, since I have to query each record's
>>> lastModified, one at a time, that's adding a major bottleneck to my job.
>>>
>>> E.g if I have 6k records, I have to run a total of 6k 'select
>>> lastModified from myTable where id = ?' queries.
>>>
>>> Is there a better way, am I doing anything wrong, etc? Any suggestions
>>> would be appreciated.
>>>
>>> Thanks.
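
The write-timestamp approach discussed in this thread can be sketched in plain Java, without the driver. This is a simplified model under stated assumptions, not how Cassandra is implemented: the class `WriteTimestampSketch`, its `Cell` type, the `reconcile` method and the `payload` column are all hypothetical names, while `myTable`, `id` and the `TimeUnit.MILLISECONDS.toMicros` conversion come from the messages above. The sketch shows that a write carrying an older timestamp is still sent and stored, but loses to the newer value when the two are reconciled. As far as I know, this is also why a plain UPDATE has no "rows affected" count to check.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

/** Sketch only: class, method and column names here are hypothetical. */
class WriteTimestampSketch {

    /** One stored value plus the write timestamp (microseconds) it arrived with. */
    static final class Cell {
        final String value;
        final long writeMicros;

        Cell(String value, long writeMicros) {
            this.value = value;
            this.writeMicros = writeMicros;
        }
    }

    /** Same conversion as in the thread: epoch milliseconds to microseconds. */
    static long toWriteMicros(Date lastModified) {
        return TimeUnit.MILLISECONDS.toMicros(lastModified.getTime());
    }

    /** Builds the CQL text; "payload" is an assumed column, myTable and id come from the thread. */
    static String updateCql(long writeMicros) {
        return "UPDATE myTable USING TIMESTAMP " + writeMicros
                + " SET payload = ? WHERE id = ?";
    }

    /** Simplified reconciliation: the newer timestamp wins. Equal timestamps are not modelled. */
    static Cell reconcile(Cell existing, Cell incoming) {
        if (existing == null || incoming.writeMicros > existing.writeMicros) {
            return incoming;
        }
        return existing;
    }

    public static void main(String[] args) {
        Cell stored = new Cell("v2", toWriteMicros(new Date(2000L)));
        Cell stale = new Cell("v1", toWriteMicros(new Date(1000L)));
        Cell fresh = new Cell("v3", toWriteMicros(new Date(3000L)));

        System.out.println(updateCql(stale.writeMicros));
        System.out.println(reconcile(stored, stale).value); // stale write loses
        System.out.println(reconcile(stored, fresh).value); // newer write wins
    }
}
```

One way to check the behaviour against a real cluster is to read the row back after re-running the job and compare `writetime(column)` with the lastModified value you expected to win.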