what is the best data model for time series of small data chunks...

2012-07-10 Thread Roland Hänel
Hi, I have an application that consists of multiple (possibly thousands of) measurement series, and each measurement series generates a small amount of data output (only about 500 bytes) every 10 seconds. This time series of data should be stored in Cassandra in a fashion that both read access is pos
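
One way to picture the question is a bucketed wide-row layout: one row per series per time bucket, with one column per sample named by its timestamp. The sketch below is only an illustration, not an answer taken from the thread. The one-day bucket size and the helper names (`row_key`, `column_name`, `row_size_estimate`) are assumptions; only the 10-second interval and the 500-byte sample size come from the message.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 24 * 60 * 60  # assumption: one row per series per day
SAMPLE_INTERVAL = 10           # from the message: one sample every 10 seconds
SAMPLE_BYTES = 500             # from the message: about 500 bytes per sample


def row_key(series_id: str, ts: int) -> str:
    """Row key for the bucket that contains the Unix timestamp ts (hypothetical scheme)."""
    bucket_start = ts - ts % BUCKET_SECONDS
    day = datetime.fromtimestamp(bucket_start, tz=timezone.utc).strftime("%Y%m%d")
    return f"{series_id}:{day}"


def column_name(ts: int) -> bytes:
    """Big-endian encoding, so byte-wise column ordering matches time ordering."""
    return ts.to_bytes(8, "big")


def row_size_estimate() -> int:
    """Approximate payload bytes stored in one row (one series, one bucket)."""
    return (BUCKET_SECONDS // SAMPLE_INTERVAL) * SAMPLE_BYTES
```

With these numbers a row holds 8640 samples per day, or roughly 4.3 MB of payload, so a slice over a time range touches one row per day and series.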

order of output in get_slice

2011-11-07 Thread Roland Hänel
Does a call to list get_slice(binary key, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) give us any guarantees on the order of the returned list? I understand that when the predicate actually contains a sliceRange, then the order _is_ guaranteed to be i

Re: Detailed behavior of insert() operation?

2010-04-30 Thread Roland Hänel
Here is the ticket: https://issues.apache.org/jira/browse/CASSANDRA-1039 Thanks, Roland 2010/4/29 Jonathan Ellis > 2010/4/29 Roland Hänel : > > Imagine the following rule: if we are in doubt whether to repair a column > > with timestamp T (because two values X and Y are pre

Re: Detailed behavior of insert() operation?

2010-04-29 Thread Roland Hänel
only turned upside down. This could be prevented by introduction of a tie-breaker. Imagine the following rule: if we are in doubt whether to repair a column with timestamp T (because two values X and Y are present within the cluster, both at timestamp T), then we always repair towards X if md
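
The idea of a deterministic tie-breaker can be sketched as follows. The rule in the message is cut off, so this sketch does not reproduce it; it assumes, purely for illustration, that two values with the same timestamp are compared byte-wise and the larger one wins. The function name `resolve` and the `(timestamp, value)` tuple layout are hypothetical.

```python
from typing import Tuple

Column = Tuple[int, bytes]  # (timestamp, value), a hypothetical layout


def resolve(a: Column, b: Column) -> Column:
    """Pick the winning column version deterministically.

    The higher timestamp wins. On equal timestamps the byte-wise larger
    value wins (an assumed tie-breaker), so every replica makes the same
    choice regardless of the order in which it saw the two versions.
    """
    ts_a, val_a = a
    ts_b, val_b = b
    if ts_a != ts_b:
        return a if ts_a > ts_b else b
    return a if val_a >= val_b else b
```

Because the result does not depend on argument order, `resolve(x, y)` and `resolve(y, x)` always agree, which is the property the tie-breaker is meant to provide.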

Re: Detailed behavior of insert() operation?

2010-04-28 Thread Roland Hänel
imestamps, what value is elected for repair? The first one that the node got in the read request? If we make that deterministic, we could avoid this scenario, right? -Roland 2010/4/28 Jonathan Ellis > 2010/4/28 Roland Hänel : > > Two clients insert the same key/column with different val

Re: Cassandra cluster runs into OOM when bulk loading data

2010-04-28 Thread Roland Hänel
ON-STAGE0 0 293735704 >>> MESSAGE-STREAMING-POOL0 0 6 >>> LOAD-BALANCER-STAGE 0 0 0 >>> FLUSH-SORTER-POOL 0 0 0 >>> MEMTABLE-PO

Detailed behavior of insert() operation?

2010-04-28 Thread Roland Hänel
Does Cassandra make any guarantees on the outcome of a scenario like this: Two clients insert the same key/column with different values at the same time: client A does insert(keyspace, key_1, column_name_1, value_A, timestamp_1, consistency_level.QUORUM) client B does insert(keyspace, key_1,

How to generate 'unique' identifiers for use in Cassandra

2010-04-26 Thread Roland Hänel
Typically, in the SQL world we use things like AUTO_INCREMENT columns that let us create a unique key automatically if a row is inserted into a table. What do you guys usually do to create identifiers for use in Cassandra? Do we only rely on "currentTimeMillis() + random()" to create something tha
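
One commonly used alternative to a hand-rolled "time plus random" scheme is a UUID generated on the client, which needs no coordination between writers. The sketch below is an illustration, not the answer given in the thread; the helper names are hypothetical, and only Python's standard `uuid` module is used.

```python
import uuid


def new_row_key() -> str:
    """Random (version 4) UUID as a string; collisions are extremely unlikely."""
    return str(uuid.uuid4())


def new_time_ordered_id() -> uuid.UUID:
    """Time-based (version 1) UUID, built from a timestamp, clock sequence and node id."""
    return uuid.uuid1()
```

A version 1 UUID embeds its creation time, which can be useful when the identifier is also meant to serve as a time-ordered column name.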

Re: Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Roland Hänel
completely blocked for 8ms. If you handle the disks independently, only the disk containing the file is blocked. RAID0 has its advantages of course. Streaming reads/writes (e.g. during a compaction) will be extremely fast. -Roland 2010/4/26 Paul Prescod > 2010/4/26 Roland Hänel : > >

Re: Cassandra cluster runs into OOM when bulk loading data

2010-04-26 Thread Roland Hänel
Thanks Chris 2010/4/26 Chris Goffinet > Upgrade to b20 of Sun's version of JVM. This OOM might be related to > LinkedBlockingQueue issues that were fixed. > > -Chris > > > 2010/4/26 Roland Hänel > >> Cassandra Version 0.6.1 >> OpenJDK Server VM (build

Re: Cassandra cluster runs into OOM when bulk loading data

2010-04-26 Thread Roland Hänel
time. Thanks, Roland 2010/4/26 Chris Goffinet > Which version of Cassandra? > Which version of Java JVM are you using? > What do your I/O stats look like when bulk importing? > When you run `nodeprobe -host tpstats` is any thread pool backing up > during the import? > > -Ch

Re: Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Roland Hänel
Ryan King > 2010/4/26 Roland Hänel : > > Hm... I understand that RAID0 would help to create a bigger pool for > > compactions. However, it might impact read performance: if I have several > > CF's (with their SSTables), random read requests for the CF files that >

Cassandra cluster runs into OOM when bulk loading data

2010-04-26 Thread Roland Hänel
I have a cluster of 5 machines building a Cassandra datastore, and I load bulk data into this using the Java Thrift API. The first ~250GB runs fine, then one of the nodes starts to throw OutOfMemory exceptions. I'm not using any row or index caches, and since I only have 5 CF's and some 2.5 GB of

Re: Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Roland Hänel
> I would recommend using RAID-0 rather than multiple data directories. > >> > >> -ryan > >> > >> 2010/4/26 Roland Hänel : > >>> I have a configuration like this: > >>> > >>> > >>> /storage01/cassandra/data >

Re: Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Roland Hänel
} > else > { > currentIndex = maxDiskIndex; > } > return dataFileDirectory; > } > > So, DataFileDirectories means multiple disks or disk-partitions. > I think your storage01, storage02 and storage03 are in same disk or disk > partit

Re: when i use the OrderPreservingPartition, the load is very imbalance

2010-04-26 Thread Roland Hänel
sorry, if specifying the token manually, use: bin/nodetool -h move 2010/4/26 Roland Hänel > 1) you can re-balance a node with > > bin/nodetool -h token [] > > specify a new token manually or let the system guess one. > > 2) take a look into your system.log to fi

Re: when i use the OrderPreservingPartition, the load is very imbalance

2010-04-26 Thread Roland Hänel
1) you can re-balance a node with bin/nodetool -h token [] specify a new token manually or let the system guess one. 2) take a look into your system.log to find out why your nodes are dying. 2010/4/26 刘兵兵 > i do some INSERT ,because i will do some scan operations, i use the > OrderPres

Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Roland Hänel
I have a configuration like this: /storage01/cassandra/data /storage02/cassandra/data /storage03/cassandra/data After loading a big chunk of data into cassandra, I end up with some 70GB in the first directory, and only about 10GB in the second and third one. All rows are q

Row key: string or binary (byte[])?

2010-04-15 Thread Roland Hänel
Is there any effort ongoing to make the row key a binary (byte[]) instead of a string? In the current cassandra.thrift file (0.6.0), I find: const string VERSION = "2.1.0" [...] struct KeySlice { 1: required *string* key, 2: required list columns, } while on the current (?) SVN https://sv

Re: Ring management and load balance

2010-03-27 Thread Roland Hänel
ig deal to integrate in JMX if not already there. Roland On 26.03.2010 22:36, "Mike Malone" wrote: 2010/3/26 Roland Hänel > > Jonathan, > > I agree with your idea about a tool that could 'propose' good token choices for op... With the random partitioner there'

Re: Ring management and load balance

2010-03-27 Thread Roland Hänel
oland On 26.03.2010 22:29, "Rob Coli" wrote: On 3/26/10 1:36 PM, Roland Hänel wrote: > > If I was going to write such a tool: do you think the th... The JMX interface exposes an Attribute which seems appropriate to this use. It is called "TotalDiskSpaceUsed," and is availa

Re: Ring management and load balance

2010-03-26 Thread Roland Hänel
But this On 26.03.2010 22:29, "Rob Coli" wrote: On 3/26/10 1:36 PM, Roland Hänel wrote: > > If I was going to write such a tool: do you think the th... The JMX interface exposes an Attribute which seems appropriate to this use. It is called "TotalDiskSpaceUsed,"

Re: Ring management and load balance

2010-03-26 Thread Roland Hänel
Jonathan, I agree with your idea about a tool that could 'propose' good token choices for optimal load-balancing. If I was going to write such a tool: do you think the thrift API provides the necessary information? I think with the RandomPartitioner you cannot scan all your rows to actually find
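
For the RandomPartitioner case, a token-proposal tool could start from evenly spaced tokens, since row keys are hashed and spread roughly uniformly over the ring. The sketch below is an illustration under stated assumptions, not the tool discussed in the thread: it assumes the token space runs from 0 to 2**127, and the names `RING_SIZE` and `balanced_tokens` are hypothetical.

```python
from typing import List

RING_SIZE = 2 ** 127  # assumption: size of the RandomPartitioner token space


def balanced_tokens(node_count: int) -> List[int]:
    """Propose one evenly spaced token per node for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]
```

For example, `balanced_tokens(4)` proposes tokens at 0, 1/4, 1/2 and 3/4 of the ring. This only balances the key ranges; it does not account for rows of very different sizes, which is where a per-node load figure would be needed.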

Re: Which client API to choose?

2010-03-24 Thread Roland Hänel
now (however, also doesn't throw an exception). Please don't shoot me, I came up with this code just grep'ing the source and doing something that seemed to make a little sense... ;-) Greetings, Roland 2010/3/24 Eric Evans > On Wed, 2010-03-24 at 14:15 +0100, Roland Hä

Which client API to choose?

2010-03-24 Thread Roland Hänel
Hi, First of all, thanks to all of you who are contributing to this amazing project. I've been looking at Cassandra for a couple of days now, and I'm still impressed by the whole thing. However, it wasn't all that straightforward getting my first "hello world" programs to run with Cassandra. A