Hello,
I have a text partition key for one of my column families. The cfstats output
for that table seems to show that the bloom filter false positive ratio is
always 1. Also, the bloom filter is using very little space.
Do bloom filters not work well with text partition keys? I could assume this,
as it can in no way detect
I have a table like this in Cassandra-
CREATE TABLE DATA_HOLDER (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE
BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));
I want to count the distinct USER_ID values in the above table. Is there any
way I can do that?
My Cassandra version is:
[cqlsh 4.1.1 | Cassandra
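For what it's worth, since Cassandra 2.0 (the series that cqlsh 4.1.x ships
with) CQL can list the distinct partition keys directly, but it has no
COUNT(DISTINCT ...), so the counting itself has to happen client-side. A sketch
against the table above:

```sql
-- Returns one row per partition key (Cassandra 2.0+):
SELECT DISTINCT user_id FROM data_holder;
-- CQL cannot count these server-side; count the returned rows in the
-- client, paging through them with the driver for large tables.
```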
In Cassandra 3.0 there will be a massive rewrite of what an sstable
even is, and the cli will be totally useless for inspecting it. There
won't be "column names" anymore, timestamps will be stored once per
row (assuming they're the same), and a whole slew of other
optimizations. If you want to look at
Yes, cassandra-cli still works. But it also tells me that I should switch to
CQL, and it doesn't want to display CQL3 tables. My question isn't how to get
the info today – it's whether that info will still be available in the future.
From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Wedn
I think that you can still use cassandra-cli from 2.0.x to look into
internal table structure. Of course you will see bytes instead of
"readable" values but it's better than nothing. It's already the case for
CQL collections when you're trying to decode them using cassandra-cli
On Wed, May 13, 201
CQL is the future, and it provides a great high-level view of keyspaces. (I
am drinking the Kool-Aid.) But, I believe every C* developer needs to also
look at the table's internal structure, e.g., what the column names
actually look like. Only by keeping an eye on the physical structure can
you
On Wed, May 13, 2015 at 4:37 AM, Peer, Oded wrote:
> The cost of issuing an UPDATE that won’t update anything is compaction
> overhead. Since you stated it’s rare for rows to be updated then the
> overhead should be negligible.
>
It's also the cost of seeking into tables which contain the row f
Timestamps have millisecond granularity. If you make multiple writes within the
same millisecond, then the outcome is not deterministic.
Also, make sure you are running ntp. Clock skew will manifest itself similarly.
On May 13, 2015, at 3:47 AM, Jared Rodriguez
<jrodrig...@kitedesk.com> wrote:
You probably shouldn’t use batch updates. Your records are probably unrelated
to each other, and therefore there really is no reason to use batches. Use
asynchronous queries to improve performance. executeAsync() is your friend.
A common misconception is that batches will improve performance. Th
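The fire-many-futures-then-wait pattern this advice describes can be sketched
with plain JDK classes. With the real DataStax driver you would call
session.executeAsync(statement); here a CompletableFuture stands in for that
call (the class and method names below are hypothetical, for illustration
only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AsyncInsertSketch {
    // Stand-in for session.executeAsync(...): returns a future immediately
    // instead of blocking on the write.
    static CompletableFuture<String> executeAsync(String record) {
        return CompletableFuture.supplyAsync(() -> "applied:" + record);
    }

    public static List<String> insertAll(List<String> records) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String r : records) {
            futures.add(executeAsync(r));   // fire without blocking
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join());          // wait once, at the end
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(insertAll(List.of("a", "b", "c")));
    }
}
```

The point is that the statements are independent, so nothing serializes them;
a logged BATCH would add coordinator overhead without buying atomicity you
actually need.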
Can lightweight txns be used in a batch update?
On Wed, May 13, 2015 at 5:48 PM, Ali Akhtar wrote:
> The 6k is only the starting value; it's expected to scale up to ~200
> million records.
>
> On Wed, May 13, 2015 at 5:44 PM, Robert Wille wrote:
>
>> You could use lightweight transactions to up
The 6k is only the starting value; it's expected to scale up to ~200 million
records.
On Wed, May 13, 2015 at 5:44 PM, Robert Wille wrote:
> You could use lightweight transactions to update only if the record is
> newer. It doesn’t avoid the read, it just happens under the covers, so it’s
> not
You could use lightweight transactions to update only if the record is newer.
It doesn't avoid the read; it just happens under the covers, so it's not really
going to be faster compared to a read-before-write pattern (which is an
anti-pattern, BTW). It is probably the easiest way to avoid gettin
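As a sketch of that LWT approach (column names here are hypothetical, and
non-equality IF conditions require Cassandra 2.1 or later):

```sql
-- Apply the write only when the stored copy is older:
UPDATE data_holder
   SET record_value = 0xCAFE, last_modified = 1431500000000
 WHERE user_id = 'u1' AND record_name = 'r1'
    IF last_modified < 1431500000000;
-- The result row's [applied] column is false when the condition fails,
-- so the client can tell whether the write happened.
```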
But your previous email talked about when T1 is different:
> Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then
> you store V’ with timestamp T1.
What if you issue an update twice, but with the same timestamp? E.g. if you
ran:
UPDATE ... USING TIMESTAMP 100 SET ... WHERE foo = bar
It will cause an overhead (compaction and read) as I described in the previous
email.
From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 3:13 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified < current date)
> I don’t under
> I don’t understand the ETL use case and its relevance here. Can you
> provide more details?
Basically, a job runs every hour which queries an external API and gets
some records. Then I want to take only the new or updated records and insert
/ update them in Cassandra. For records that are already
USING TIMESTAMP doesn’t avoid compaction overhead.
When you modify data, the value is stored along with a timestamp indicating
when the value was written.
Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you
store V’ with timestamp T1.
Now you have two values of V in the DB:
It's rare for an existing record to have changes, but the ETL job runs every
hour, so it will send updates each time, regardless of whether there were
changes or not.
(I'm assuming that USING TIMESTAMP here will avoid the compaction overhead,
since that will cause it to not run any updates u
The cost of issuing an UPDATE that won’t update anything is compaction
overhead. Since you stated it’s rare for rows to be updated then the overhead
should be negligible.
The easiest way to convert a milliseconds timestamp long value to microseconds
is to multiply by 1000.
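That conversion, and the TimeUnit form mentioned later in the thread, can be
checked with a tiny self-contained sketch (the class and method names are mine,
for illustration):

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class MicrosExample {
    // Convert a java.util.Date (millisecond precision) to the microsecond
    // value that CQL's USING TIMESTAMP expects. TimeUnit.MILLISECONDS
    // .toMicros(...) is equivalent to multiplying by 1000.
    static long toWriteTimestamp(Date d) {
        return TimeUnit.MILLISECONDS.toMicros(d.getTime());
    }

    public static void main(String[] args) {
        Date d = new Date(1431500000000L);
        System.out.println(toWriteTimestamp(d));                      // 1431500000000000
        System.out.println(toWriteTimestamp(d) == d.getTime() * 1000L); // true
    }
}
```

Note the result only has millisecond resolution; the extra microsecond digits
are zeros, which matters if two writes land in the same millisecond.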
From: Ali Akhtar [ma
Is there a way in the java driver, to get the number of rows that an update
was applied to?
On Wed, May 13, 2015 at 4:33 PM, Ali Akhtar wrote:
> Thanks. So supplying the timestamp with the update (via using) should fix
> that, right? (By skipping updates where lastModified < dbLastModified).
>
>
Thanks. So supplying the timestamp with the update (via using) should fix
that, right? (By skipping updates where lastModified < dbLastModified).
I'm currently doing TimeUnit.MILLISECONDS.toMicros( myDate.getTime() ) and
that has worked for inserts; however, how do I verify that future updates
are
While updates don't create tombstones, overwrites create a similar
performance penalty at the read phase. That key will need to be fetched
from every SSTable where it resides so the "most recent" column can be
returned.
On Wed, May 13, 2015 at 6:38 AM, Peer, Oded wrote:
> You can use the “la
Would TimeUnit.MILLISECONDS.toMicros( myDate.getTime() ) work for
producing the microsecond timestamp ?
On Wed, May 13, 2015 at 4:09 PM, Ali Akhtar wrote:
> If specifying 'using' timestamp, the docs say to provide microseconds, but
> where are these microseconds obtained from? I have regular ja
If specifying 'using' timestamp, the docs say to provide microseconds, but
where are these microseconds obtained from? I have regular java.util.Date
objects, I can get the time in milliseconds (i.e. the unix timestamp), how
would I convert that to microseconds?
On Wed, May 13, 2015 at 3:56 PM, Ali
Sorry, wrong thread. Disregard the above
On Wed, May 13, 2015 at 4:08 PM, Ali Akhtar wrote:
> If specifying 'using' timestamp, the docs say to provide microseconds, but
> where are these microseconds obtained from? I have regular java.util.Date
> objects, I can get the time in milliseconds (i.e
If specifying 'using' timestamp, the docs say to provide microseconds, but
where are these microseconds obtained from? I have regular java.util.Date
objects, I can get the time in milliseconds (i.e. the unix timestamp), how
would I convert that to microseconds?
On Wed, May 13, 2015 at 3:45 PM, Peer
Thanks Peter, that's interesting. I didn't know of that option.
If updates don't create tombstones (and I'm already taking pains to ensure
no nulls are present in queries), then is there no cost to just submitting
an update for everything regardless of whether lastModified has changed?
Thanks.
O
If, when you update the columns, you also update their TTL, then a tombstone
won't be created for those columns.
Remember that TTL is set on columns (or "cells"), not on rows, so your
description of updating a row is slightly misleading. If every query updates
d
You can use the “last modified” value as the TIMESTAMP for your UPDATE
operation.
This way the values will only be updated if lastModified date > the
lastModified you have in the DB.
Updates to values don’t create tombstones. Only deletes (either by executing
delete, inserting a null value or b
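A sketch of that technique against the DATA_HOLDER table from earlier in the
digest (the values are hypothetical; the timestamp is the record's
lastModified converted to microseconds):

```sql
-- Cassandra keeps the cell with the higher write timestamp, so an update
-- carrying a stale lastModified silently loses to the newer stored value:
UPDATE data_holder USING TIMESTAMP 1431500000000000
   SET record_value = 0xCAFE
 WHERE user_id = 'u1' AND record_name = 'r1';
```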
Quick Question,
Our team is having quite a debate: we are trying to find out if an UPDATE on a
row with a TTL will create a tombstone.
E.g. we have one row with a TTL; if we keep "updating" that row before the TTL
is hit, will a tombstone be created?
I believe it will, but want to confirm.
So if t
I'm running some ETL jobs, where the pattern is the following:
1- Get some records from an external API,
2- For each record, see if its lastModified date > the lastModified i have
in db (or if I don't have that record in db)
3- If lastModified < dbLastModified, the item wasn't changed, ignore it
Thanks for the feedback. We have dug in deeper and upgraded to Cassandra
2.0.14 and are seeing the same issue. What appears to be happening is that
if a record is initially written, the first read is fine. But if we
immediately update that record with a second write, then the second
re