Pagination and timeouts

2017-03-27 Thread Tom van den Berge
I have a table with some 1M rows, and I would like to get the partition key of each row. Using the Java driver (2.1.9), I'm executing the query "select distinct key from table;". The result set is paginated automatically. My C* cluster has two datacenters, and when I run this query using
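One common workaround for timeouts on a full-table DISTINCT scan (not necessarily what was used in this thread) is to split the token ring into sub-ranges and issue one query per range using token() restrictions. The minimal Python sketch below only generates the CQL strings. It assumes Murmur3Partitioner, whose tokens run from -2^63 to 2^63-1; RandomPartitioner uses 0 to 2^127 instead. The names my_table and split_token_range are hypothetical: my_table stands in for the post's placeholder "table", and key is the column from the post.

```python
# Token bounds of Murmur3Partitioner (assumed; RandomPartitioner uses 0 .. 2**127 instead)
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(pieces):
    """Split the ring (MIN_TOKEN, MAX_TOKEN] into contiguous (start, end] sub-ranges."""
    width = (MAX_TOKEN - MIN_TOKEN) // pieces
    bounds = [MIN_TOKEN + i * width for i in range(pieces)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def distinct_key_queries(table, key, pieces):
    """One SELECT DISTINCT per sub-range; bounds are inlined here only for readability."""
    template = "SELECT DISTINCT {k} FROM {t} WHERE token({k}) > {lo} AND token({k}) <= {hi}"
    return [template.format(k=key, t=table, lo=lo, hi=hi)
            for lo, hi in split_token_range(pieces)]


for query in distinct_key_queries("my_table", "key", 4):
    print(query)
```

In real code the bounds would be bound parameters of a prepared statement, and the driver still pages within each sub-range. The number of pieces is a tuning choice: more pieces means less work per request.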

Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Tom van den Berge
> > Is text the most appropriate data type to store JSON that contains a couple of dozen lines? > It sure is the simplest way to store JSON. The query requirement is "where executedby = ?". > Since executedby is a timeuuid, I guess you don't want to query a single record, since that would
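Cassandra's timeuuid type holds version-1 (time-based) UUIDs. Each value therefore embeds its creation time and is effectively unique, which is why an equality lookup on a timeuuid column normally matches at most one record. The small Python sketch below uses only the standard uuid module to show both properties. The helper name timeuuid_to_datetime is illustrative.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Return the creation time embedded in a version-1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a version-1 (time-based) UUID")
    seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


first, second = uuid.uuid1(), uuid.uuid1()
print(first != second)              # two timeuuids generated back to back still differ
print(timeuuid_to_datetime(first))  # creation time, in UTC
```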

Re: Unexplainably large reported partition sizes

2016-03-10 Thread Tom van den Berge
h bug/jira this was? I have not been able to find it. >>> I'm using 2.1.9. >> https://issues.apache.org/jira/browse/CASSANDRA-7953 >> Rob may have a different one, but I've seen something similar from this issue. >> Fixed in 2.1.12.

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Tom van den Berge
Hi Bryan, > Do you use any collections on this column family? We've had issues in the past with unexpectedly large partitions reported on data models with collections, which can also generate tons of tombstones on UPDATE (https://issues.apache.org/jira/browse/CASSANDRA-10547). > I've

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Tom van den Berge
Hi Rob, The reason I didn't dump the table with sstable2json is that I didn't think of it ;) I just used it, and it looks very much like the "avalanche of tombstones" bug you are describing! I took one of the three sstables containing the key, and it resulted in a 4.75 million-line json file, of

Re: Unexplainably large reported partition sizes

2016-03-06 Thread Tom van den Berge
in the same partition with different TTL values? > On Sat, Mar 5, 2016 at 7:16 PM, Tom van den Berge <t...@drillster.com> wrote: >> I don't think compression can be the cause of the difference, for two reasons: >> 1) The partition size I calculated m

Re: Unexplainably large reported partition sizes

2016-03-05 Thread Tom van den Berge
com> wrote: > On Fri, Mar 4, 2016 at 5:56 AM, Tom van den Berge <t...@drillster.com> wrote: >> Compacting large partition drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes) >> This means that this single partition is about 1.

Unexplainably large reported partition sizes

2016-03-04 Thread Tom van den Berge
Hi, I'm seeing warnings in my logs about compacting large partitions, e.g.: "Compacting large partition drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)". This means that this single partition is about 1.4 GB in size. This is much larger than it can possibly be, because of two

Re: Removed node is not completely removed

2015-10-15 Thread Tom van den Berge
Thanks Sebastian, a restart solved the problem! On Wed, Oct 14, 2015 at 3:46 PM, Sebastian Estevez <sebastian.este...@datastax.com> wrote: > We still keep endpoints in memory. Not sure how you got to this state, but try a rolling restart. > On Oct 14, 2015 9:43 AM,

Re: Removed node is not completely removed

2015-10-14 Thread Tom van den Berge
pace uses LocalStrategy: each node has its own set of system tables. -ml > On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote: >> Hi Carlos, >> I'm using 2.1.6. The mysterious node is not in the peers table.

Re: Removed node is not completely removed

2015-10-14 Thread Tom van den Berge
> On Wed, Oct 14, 2015 at 12:26 PM, Tom van den B

Re: Do vnodes need more memory?

2015-09-24 Thread Tom van den Berge
On Thu, Sep 24, 2015 at 12:45 AM, Robert Coli <rc...@eventbrite.com> wrote: > On Wed, Sep 23, 2015 at 7:09 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote: >> So it seems that Cassandra simply doesn't have enough memory. I'm trying to understand

Re: Do vnodes need more memory?

2015-09-23 Thread Tom van den Berge
lt? How much RAM? > Also, can you run this tool and send a minute's worth of thread info: > wget https://bintray.com/artifact/download/aragozin/generic/sjk-plus-0.3.6.jar > java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -n 30 -o CPU > On Sep 23, 2015 7:09 AM,

Do vnodes need more memory?

2015-09-23 Thread Tom van den Berge
I have two data centers, each with the same number of nodes, the same hardware (CPUs, memory), Cassandra version (2.1.6), replication factor, etc. The only difference is that one data center uses vnodes and the other doesn't. The non-vnode DC works fine (and has for a long time) under

Secondary index is causing high CPU load

2015-09-15 Thread Tom van den Berge
in the cfstats for the index go up by almost 20! When doing the same query on one of my "good" nodes, it only increases by a small number, as I would expect. Could it be that the use of vnodes is causing these problems? Regards, Tom On Mon, Sep 14, 2015 at 8:09 PM, Tom va

Extremely high CPU load in new data center

2015-09-14 Thread Tom van den Berge
I have a DC of 4 nodes that must be expanded to accommodate an expected growth in data. Since the DC is not using vnodes, we have decided to set up a new DC with vnodes enabled, start using the new DC, and decommission the old DC. Both DCs have 4 nodes. The idea is to add additional nodes to the
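Without vnodes, each node in a DC owns a single token, set through initial_token. A common recipe, assumed here because the thread doesn't say how the tokens were chosen, spaces the tokens evenly around the ring. The Python sketch below illustrates why growing such a DC is awkward. A balanced layout survives only when the node count doubles, with each new node taking a midpoint. Any other growth means moving existing tokens. The partitioner tuples and the function name balanced_tokens are illustrative.

```python
# (minimum token, ring size) for the two common partitioners
MURMUR3 = (-(2**63), 2**64)  # Murmur3Partitioner, the default since Cassandra 1.2
RANDOM = (0, 2**127)         # RandomPartitioner


def balanced_tokens(node_count, partitioner=MURMUR3):
    """Evenly spaced initial_token values for a DC that does not use vnodes."""
    min_token, ring_size = partitioner
    step = ring_size // node_count
    return [min_token + i * step for i in range(node_count)]


four = balanced_tokens(4)
eight = balanced_tokens(8)
print(four)
# Doubling keeps every existing token; the new nodes take the midpoints.
print(eight[::2] == four)
```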

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-09 Thread Tom van den Berge
>> I've learned from experience that the node immediately joins the cluster and starts accepting reads (from other DCs) for the range it owns. > This seems to be the incorrect assumption at the heart of the confusion. You "should" be able to prevent this behavior entirely via correct

Re: How to prevent queries being routed to new DC?

2015-09-08 Thread Tom van den Berge
479 > Thanks > Anuj > From: "Tom van den Berge" <t...@drillster.com> > Date: Tue, 8 Sep, 2015 at 1:31 am > Subject: Re:

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Tom van den Berge
the Atlantic takes a lot more time :( > kind regards, > Christian > PS: I would love to see the results if you perform any tests on the write-survey. Please share them here on the mailing list :-) > On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
Just to be sure: can this bug result in a 0-row result while it should be > 0? On 8 Sep 2015 6:29 PM, "Tyler Hobbs" <ty...@datastax.com> wrote: > See https://issues.apache.org/jira/browse/CASSANDRA-9753 > On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Tom van den Berge
>> Running nodetool rebuild on a node that was started with join_ring=false does not work, unfortunately. The nodetool command returns immediately, after a message appears in the log that the streaming of data has started. After that, nothing happens. > Per driftx, the author of

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
Nate, I've disabled it, and it's been running for about an hour now without problems, while before, the problem occurred roughly every few minutes. I guess it's safe to say that this proves that CASSANDRA-9753 is the cause of the problem.

Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
I've bugged you a few times already, but now I've got trace data for a query with LOCAL_QUORUM that is being sent to a remote data center. The setup is as follows: NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}. Both DC1 and DC2 have 2 nodes. In DC2, one node is currently being rebuilt, and
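The quorum size is floor(RF / 2) + 1, and LOCAL_QUORUM computes it per data center from that DC's replication factor. The quick Python check below uses the replication settings quoted in this post. It shows that in DC2 (RF 2), LOCAL_QUORUM needs both replicas. With only two nodes in DC2, each holds a replica of every row, so a LOCAL_QUORUM request coordinated in DC2 must include the node that is being rebuilt. This is only the arithmetic; it does not by itself explain why the trace shows a remote DC being contacted.

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM or LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Replication settings from the post: NetworkTopologyStrategy {"DC1": "1", "DC2": "2"}
replication = {"DC1": 1, "DC2": 2}

for dc, rf in replication.items():
    print(f"{dc}: RF={rf}, LOCAL_QUORUM needs {quorum(rf)} replica(s) in {dc}")

# A plain QUORUM counts replicas across all DCs: 3 in total here, so 2 must respond.
print("QUORUM needs", quorum(sum(replication.values())))
```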

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-07 Thread Tom van den Berge
but not serving reads. I have not tested it yet, but I think it should work. > Also the manual join mentioned in CASSANDRA-9667 sounds very interesting. > kind regards, > Christian > On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <t...@drillster.com> wrote

Re: How to prevent queries being routed to new DC?

2015-09-07 Thread Tom van den Berge
NetworkTopologyStrategy. On Mon, Sep 7, 2015 at 4:39 PM, Ryan Svihla <r...@foundev.pro> wrote: > What's your keyspace replication strategy? > On Thu, Sep 3, 2015 at 3:16 PM Tom van den Berge <tom.vandenbe...@gmail.com> wrote: >> Thanks for your help s

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-07 Thread Tom van den Berge
Coli <rc...@eventbrite.com> wrote: > On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <t...@drillster.com> wrote: >> Wouldn't it be far more efficient if a node that is rebuilding itself is responsible for not accepting reads until the rebuild is complete?

Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
oum levels. > On Thu, Sep 3, 2015 at 11:53 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote: >> Hi Bryan, >> I'm using the PropertyFileSnitch, and it contains entries for all nodes in the old DC and all nodes in the new DC. The rep

How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
I want to start using vnodes in my cluster. To do so, I've set up a new data center with the same number of nodes as the existing one, as described in http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configVnodesProduction_t.html. The new DC is in the same physical location as the

Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
ng, can you verify that they show up under a new DC and not as part of the old? > --Bryan > On Thu, Sep 3, 2015 at 11:27 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote: >> I want to start using vnodes in my cluster. To do so, I've set up

Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
Thanks for your help so far! I'm having some trouble understanding the JIRA issue mentioned by Rob :( I'm currently trying to set up the first node in the new DC with auto_bootstrap = true. The node then becomes visible with status "joining", which (hopefully) prevents other DCs from sending

Fwd: MarshalException after upgrading to 2.1.6

2015-06-11 Thread Tom van den Berge
I've upgraded a node from 2.0.10 to 2.1.6. Before taking down the node, I've run nodetool upgradesstables and nodetool scrub. When starting up the node with 2.1.6, I'm getting a MarshalException (stacktrace included below). For some reason, it seems that C* is trying to convert a text value from

Re: MarshalException after upgrading to 2.1.6

2015-06-11 Thread Tom van den Berge
at 9:23 AM, Tom van den Berge t...@drillster.com wrote: I've upgraded a node from 2.0.10 to 2.1.6. Before taking down the node, I've run nodetool upgradesstables and nodetool scrub. When starting up the node with 2.1.6, I'm getting a MarshalException (stacktrace included below). For some

Re: Is it possible to bootstrap the 1st node of a new DC?

2014-09-12 Thread Tom van den Berge
bootstrapped? Tom On Thu, Sep 11, 2014 at 11:10 PM, Tom van den Berge t...@drillster.com wrote: Thanks, Rob. I actually tried using LOCAL_ONE instead of ONE, but I still saw this problem. Maybe I missed some queries when updating to LOCAL_ONE. Anyway, it's good to know that this is supposed

Is it possible to bootstrap the 1st node of a new DC?

2014-09-11 Thread Tom van den Berge
When setting up a new (additional) data center, the documentation tells us to use "nodetool rebuild -- <old DC>" to fill up the node(s) in the new DC, and to disable auto_bootstrap. I'm wondering if it is possible to fill the node with auto_bootstrap=true instead of a nodetool rebuild command. If so,

Re: Is it possible to bootstrap the 1st node of a new DC?

2014-09-11 Thread Tom van den Berge
11, 2014 at 1:18 PM, Tom van den Berge t...@drillster.com wrote: When setting up a new (additional) data center, the documentation tells us to use "nodetool rebuild -- <old DC>" to fill up the node(s) in the new DC, and to disable auto_bootstrap. I'm wondering if it is possible to fill the node

Node being rebuilt receives read requests

2014-09-10 Thread Tom van den Berge
I have a datacenter with a single node, and I want to start using vnodes. I have followed the instructions (http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html), and set up a new node in a new datacenter (auto_bootstrap=false, seed=node in old dc,

Are writes to indexes performed asynchronously?

2014-06-19 Thread Tom van den Berge
Hi, I have a column family with a secondary index on one of its columns. I noticed that when I write a row to the column family, and immediately query that row through the secondary index, every now and then it won't give any results. Could it be that Cassandra performs the write to the internal

Re: Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-19 Thread Tom van den Berge
It turns out this was caused by an earlier, failed attempt to upgrade. Removing all pre-sstablemetamigration snapshot directories solved the issue. Credit to Markus Eriksson. On Wed, Jun 11, 2014 at 9:42 AM, Tom van den Berge t...@drillster.com wrote: No, unfortunately I haven't. On Tue

Re: Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-11 Thread Tom van den Berge
No, unfortunately I haven't. On Tue, Jun 10, 2014 at 5:35 PM, Chris Burroughs chris.burrou...@gmail.com wrote: Were you able to solve or work around this problem? On 06/05/2014 11:47 AM, Tom van den Berge wrote: Hi, I'm trying to migrate a development cluster from 1.2.14 to 2.0.8

Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-05 Thread Tom van den Berge
Hi, I'm trying to migrate a development cluster from 1.2.14 to 2.0.8. When starting up 2.0.8, I'm seeing the following error in the logs: INFO 17:40:25,405 Snapshotting drillster, Account to pre-sstablemetamigration ERROR 17:40:25,407 Exception encountered during startup

StatusLogger output help

2014-03-28 Thread Tom van den Berge
Hi, In my cassandra logs, I see a lot of StatusLogger output lines. I'm trying to understand why this is logged, and how to interpret the output. Maybe someone can point me to some documentation on this particular logging aspect? I would like to know what is triggering the StatusLogger.java to

Help on StatusLogger output?

2014-03-20 Thread Tom van den Berge
Hi, In my cassandra logs, I see a lot of StatusLogger output lines. I'm trying to understand why this is logged, and how to interpret the output. Maybe someone can point me to some documentation on this particular logging aspect? I would like to know what is triggering the StatusLogger.java to

Re: OutOfMemory Java Heap Space error on startup...

2013-12-04 Thread Tom van den Berge
To start up your node again, you could delete the stored key caches (/var/lib/cassandra/saved_caches/*). Regards, Tom On Wed, Dec 4, 2013 at 7:46 PM, Krishna Chaitanya bnsk1990r...@gmail.com wrote: Hey Nate, Thanks for the reply. The link was really good...!!! Looking forward to

Re: How to measure data transfer between data centers?

2013-12-04 Thread Tom van den Berge
/switch/fancy-network-gear level. On 12/03/2013 06:25 AM, Tom van den Berge wrote: Is there a way to know how much data is transferred between two nodes, or more specifically, between two data centers? I'm especially interested in how much data is being replicated from one data center to another

How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Hi, Is there a way to monitor the progress of a hinted handoff task? I found the following two mbeans providing some info: org.apache.cassandra.internal:type=HintedHandoff, which tells me that there is 1 active task, and org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),

How to measure data transfer between data centers?

2013-12-03 Thread Tom van den Berge
Is there a way to know how much data is transferred between two nodes, or more specifically, between two data centers? I'm especially interested in how much data is being replicated from one data center to another, to know how much of the available bandwidth is used. Thanks, Tom

Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
handoff of {} rows to endpoint {} Thanks Rahul On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge t...@drillster.com wrote: Hi, Is there a way to monitor the progress of a hinted handoff task? I found the following two mbeans providing some info: org.apache.cassandra.internal:type

Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
On Tue, Dec 3, 2013 at 6:41 PM, Tom van den Berge t...@drillster.com wrote: Hi Rahul, Thanks for your reply. I have never seen a message like "Timed out replaying hints to...", which is a good thing then, I suppose ;) Normally, I do see the "Finished hinted handoff..." log message. However, every

What is listEndpointsPendingHints?

2013-11-26 Thread Tom van den Berge
When I run the operation listEndpointsPendingHints on the MBean org.apache.cassandra.db:type=HintedHandoffManager, I'm getting ( 126879603237190600081737151857243914981 ). It suggests that there are pending hints, but the org.apache.cassandra.internal:type=HintedHandoff mbean provides these

Re: Managing index tables

2013-11-05 Thread Tom van den Berge
Hi Thomas, I understand your concerns about ensuring the integrity of your data when having to maintain the indexes yourself. In some situations, using Cassandra's built-in secondary indexes is more efficient, for example when many rows contain the indexed value. Maybe your permissions fall in this

Re: filter using timeuuid column type

2013-11-05 Thread Tom van den Berge
This is because time2 is not part of the primary key. Only the primary key column(s) can be queried with < and >. Secondary indexes (like your timeuuid_test2_idx) can only be queried with the = operator. Maybe you can make time2 also part of your primary key? Good luck, Tom On Mon, Nov 4, 2013
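The deliberately simplified Python model below (not how Cassandra actually stores data) illustrates why moving time2 into the primary key helps. Rows within a partition are kept sorted by their clustering columns. Once time2 is a clustering column, for example in a hypothetical PRIMARY KEY (id, time2), a range predicate on it becomes one contiguous slice of a partition. The secondary index in the question, by contrast, only answers equality lookups. Small integers stand in for timeuuid values, and all names are illustrative. In CQL such a range is normally combined with an equality restriction on the partition key.

```python
import bisect

# Hypothetical rows of one partition: (time2, value), kept sorted by the clustering column.
partition = sorted([(30, "c"), (10, "a"), (40, "d"), (20, "b")])
clustering_keys = [k for k, _ in partition]


def range_query(lo, hi):
    """lo < time2 <= hi: the rows are sorted, so the answer is one contiguous slice."""
    start = bisect.bisect_right(clustering_keys, lo)
    end = bisect.bisect_right(clustering_keys, hi)
    return [value for _, value in partition[start:end]]


# A secondary index modelled as a value -> row map: only exact matches are cheap.
index = dict(partition)


def equality_query(time2):
    """time2 = ?: a single lookup; a range would have to scan every entry."""
    return index.get(time2)


print(range_query(10, 30))  # ['b', 'c']
print(equality_query(20))   # b
```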

Re: Check out if Cassandra ready

2013-11-01 Thread Tom van den Berge
I recommend using CassandraUnit (https://github.com/jsevellec/cassandra-unit). It makes using Cassandra in unit tests quite easy. It allows you to start an embedded Cassandra synchronously with a single simple method call, optionally load your schema and initial data, and you're ready to start

Re: Disappearing index data.

2013-10-09 Thread Tom van den Berge
this: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=KS,columnfamily=CF.IDX M. On 07.10.2013 15:22, Tom van den Berge wrote: On a 2-node cluster with replication factor 2, I have a column family with an index on one of the columns. Every now and then, I notice

Disappearing index data.

2013-10-07 Thread Tom van den Berge
On a 2-node cluster with replication factor 2, I have a column family with an index on one of the columns. Every now and then, I notice that a lookup of the record through the index on node 1 produces the record, but the same lookup on node 2 does not! If I do a lookup by row key, the record is

Re: Disappearing index data.

2013-10-07 Thread Tom van den Berge
one, which is responsible for storing index data. The MBean you should look for looks like this: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=KS,columnfamily=CF.IDX M. On 07.10.2013 15:22, Tom van den Berge wrote: On a 2-node cluster with replication factor 2, I have

HintedHandoff process does not finish

2013-09-27 Thread Tom van den Berge
Hi, On one of my nodes, the (storage) load increased dramatically (it doubled) within one or two hours. The hints column family was causing the growth. I noticed one HintedHandoff process that was started some two hours ago, but hadn't finished. Normally, these processes take only a few seconds,

Re: is there a no disk storage mode ?

2011-12-01 Thread Tom van den Berge
Hi Dominique, I don't think there is a way to run cassandra without disk storage. But running it embedded can be very useful for unit testing. I'm using cassandra-unit (https://github.com/jsevellec/cassandra-unit) to integrate it in my tests. You don't need to configure any file paths; it