Re: Moving experiences ?

2011-11-09 Thread Radim Kolar
On 10.11.2011 8:16, Maki Watanabe wrote: I missed the news. How does "nodetool move" work in recent versions (0.8.x or later)? Does it just stream the appropriate range of data between nodes? Yes.

Re: Moving experiences ?

2011-11-09 Thread Maki Watanabe
I missed the news. How does "nodetool move" work in recent versions (0.8.x or later)? Does it just stream the appropriate range of data between nodes? 2011/11/10 Peter Schuller: > Keep in mind that if you're using an older version of Cassandra a move > is actually a decommission followed by bootstrap - so neig

Re: propertyfilesnitch problem

2011-11-09 Thread Peter Schuller
> 2. With the same setup, after each period as defined by > dynamic_snitch_reset_interval_in_ms, the LOCAL_QUORUM performance greatly > degrades before drastically improving again within a minute. This part sounds to me like one or more nodes in the cluster are either broken and not responding a

Re: Moving experiences ?

2011-11-09 Thread Peter Schuller
> I am going to need to move some nodes to rebalance my cluster. How safe is > this to do on a cluster with writes & reads ? Moving nodes while serving live traffic is supported. Of course, there is always some inherent risk in making ring topology changes (make sure you know the process), but it

Re: Moving experiences ?

2011-11-09 Thread Rodrigo K. Ferreira
I ran some tests on this over the past few months, and it's safe if you have a high replication factor on the cluster and a high read consistency level on the clients. However, if you have a large amount of data, rebalancing the nodes will take a long time. On Wed, Nov 9, 2011 at 9:07 PM, Philippe wrote: > Hello,

RE: propertyfilesnitch problem

2011-11-09 Thread Shu Zhang
Hi, sorry to ask again, but I'm having trouble getting to the bottom of this... Does anyone else see this? When dynamic snitch is turned off, the performance of LOCAL_QUORUM operations is as bad as QUORUM. The property file snitch appears to be properly configured. Any suggestions on how I can i

Data model for counting uniques online

2011-11-09 Thread Philippe
Hello, I'd like to get some ideas on how to model counting uniques with cassandra. My use-case is that I have various counters that I increment based on data received from multiple devices. I'd like to be able to know if at least X unique devices contributed to a counter value. I've thought of the

Moving experiences ?

2011-11-09 Thread Philippe
Hello, I am going to need to move some nodes to rebalance my cluster. How safe is this to do on a cluster with writes & reads ? Thanks

Re: Running Cassandra 1.0 as Windows Service

2011-11-09 Thread Markus Wiesenbacher | Codefreun.de
If you run it as a Windows service, it runs with system rights (instead of the rights of the logged-in user). IMHO this could help: - Right-click the service (Administrative Tools -> Services) and apply the desired user rights -> Maybe it helps if you allow exchange/interaction with des

Re: Will writes with < ALL consistency eventually propagate?

2011-11-09 Thread Peter Schuller
> handful of nodes that I write to with a CL of QUORUM (or thereabouts). If your goal is to service reads w/o waiting for remote servers, you probably would want to use LOCAL_QUORUM (quorum within a data center) or ONE for reads. That however assumes an RF of >= 3 in each data center (which means
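
To illustrate the quorum arithmetic behind the RF >= 3 remark above: a quorum is a majority of the replicas, i.e. floor(RF / 2) + 1. The following is only a minimal sketch of that arithmetic; the function names `quorum` and `tolerated_failures` are hypothetical helpers for illustration and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority (floor(RF / 2) + 1)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be unavailable while a quorum can still be reached."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    # Print the quorum size and failure tolerance for a few replication factors.
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

With RF = 2 the quorum is 2, so a single unavailable replica already blocks quorum operations; RF = 3 is the smallest value at which a quorum survives the loss of one replica.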

Re: High GC activity and OOMs

2011-11-09 Thread Peter Schuller
(You might be helped by http://wiki.apache.org/cassandra/LargeDataSetConsiderations btw - it's not entirely up to date by now... I will try again to remember to update it.) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: High GC activity and OOMs

2011-11-09 Thread Peter Schuller
Ah, you have two CFs. And my mistake was that I accidentally treated bits as bytes ;) My calc is that the bloom filter sizes per node for you should be about 1.8-1.9 GB. If you haven't touched heap size, IIRC the default is still going to be 2GB for your 4 GB machine (not sure, please confirm if

Re: High GC activity and OOMs

2011-11-09 Thread Peter Schuller
> Compacted row maximum size: 36904729268 So 36 gigs. As long as you're sure each column is only about 1k, the total row size should not be a problem. > While I don't see OOMs when I use only a single thread to page the row, there > are lots of ParNew collections that take about 5

Running Cassandra 1.0 as Windows Service

2011-11-09 Thread DLC
I am trying to install and run Cassandra 1.0 as a Windows Service on Windows Server 2003 R2 x64. The installation seems to go OK, but when I try to start the service I get the error "Windows could not start the cassandra on Local Computer. For more information, review the System Event Log. If

Re: : Cassandra reads under write-only load, read degradation after massive writes

2011-11-09 Thread Jeremiah Jordan
Indexed columns cause read before write so that the index can be updated if the column already exists. On 11/09/2011 02:46 PM, Oleg Tsernetsov wrote: When monitoring JMX metrics of cassandra 0.8.7 loaded by write-only test I observe significant read activity on column family where I write to.

: Cassandra reads under write-only load, read degradation after massive writes

2011-11-09 Thread Oleg Tsernetsov
When monitoring JMX metrics of Cassandra 0.8.7 under a write-only test load, I observe significant read activity on the column family I am writing to. This seems strange to me, as I expected no read activity under a write-only load. The read activity is caused by the writes: when I stop the write test, reads act

Re: R on Cassandra

2011-11-09 Thread Paul Brown
Hi, Brian -- A little late to reply, but I'm slowly catching up. You're going to be better off, IMHO, pulling the data out of Cassandra with a tool like Pig (probably with a bit of aggregation and filtering) and then operating on it in R as a static delimited file. If you need additional autom

Re: Physical data layout of columns in super column family

2011-11-09 Thread Denis Gabaydulin
Thanks for the explanation, Konstantin. I'm a novice with Cassandra and not very familiar with the terminology. You understood the topology well. I had a quick look at the Cassandra source code and found that my query from Hector is translated to a list of read commands (inside CassandraServer). Si

Re: security

2011-11-09 Thread Guy Incognito
ok, thx for the input! On 09/11/2011 15:19, Mohit Anchlia wrote: We lockdown ssh to root from any network. We also provide individual logins including sysadmin and they go through LDAP authentication. Anyone who does sudo su as root gets logged and alerted via trapsend. We use firewalls and also

Re: Second Cassandra users survey

2011-11-09 Thread Vijay
My wish list: 1) Conditional updates: if a column has a value then put column in the column family atomically else fail. 2) getAndSet: on counters: a separate API 3) Revert the count when the client disconnects or receives an exception (so they can safely retry). 4) Something like a freeze API for upda
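
The conditional update requested in item 1 is essentially a compare-and-set. As a rough sketch of the intended semantics only: the `ColumnStore` class and its `put_if_equals` method below are hypothetical, in-memory stand-ins for illustration, and do not correspond to any existing Cassandra API.

```python
import threading


class ColumnStore:
    """Toy in-memory stand-in for a single row of columns (hypothetical)."""

    def __init__(self):
        self._columns = {}
        self._lock = threading.Lock()

    def put(self, name, value):
        """Unconditionally write a column."""
        with self._lock:
            self._columns[name] = value

    def get(self, name):
        """Return the column value, or None if the column does not exist."""
        with self._lock:
            return self._columns.get(name)

    def put_if_equals(self, name, expected, new_value):
        """Set `name` to `new_value` only if it currently holds `expected`.

        The check and the write happen under one lock, so no other writer
        can slip in between them. Returns True on success, False otherwise.
        """
        with self._lock:
            if name not in self._columns or self._columns[name] != expected:
                return False
            self._columns[name] = new_value
            return True


if __name__ == "__main__":
    store = ColumnStore()
    store.put("state", "new")
    print(store.put_if_equals("state", "new", "done"))   # first update succeeds
    print(store.put_if_equals("state", "new", "other"))  # stale expectation fails
```

A single-process lock is enough for this toy; doing the same across replicas in a distributed store is a much harder problem, which is why it appears on a wish list.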

Row Groups

2011-11-09 Thread Todd Burruss
[Changed subject to leave survey behind] Cool, thx. Holding my breath for "row groups" :O I see it is targeted for 1.1, is this realistic? From: Jake Luciani <jak...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Dat

Re: Second Cassandra users survey

2011-11-09 Thread Jake Luciani
Solandra does this https://github.com/tjake/Solandra/blob/solandra/src/lucandra/dht/RandomPartitioner.java But Row Groups is going to be the "official" way. -Jake On Wed, Nov 9, 2011 at 5:53 PM, Todd Burruss wrote: > Thx jake for the JIRA, but there was someone at the conference that had > alr

Re: Physical data layout of columns in super column family

2011-11-09 Thread Konstantin Naryshkin
I assume that Reports is the super column family; that the first 1: is the report id and, in the Cassandra topology, the row key; that the second 1: is the report line and, in the Cassandra topology, the super column; and that "value1" is the column name. If this is not the case, maybe explain the topology better

Re: Second Cassandra users survey

2011-11-09 Thread Todd Burruss
Thx Jake for the JIRA, but there was someone at the conference that had already implemented what I mentioned. It didn't offer any atomicity, just co-locating a family of data on the same node. From: Jake Luciani <jak...@gmail.com> Reply-To: "user@cassandra.apache.org

Re: Second Cassandra users survey

2011-11-09 Thread Aaron Turner
I think this was already asked for, but you can add my vote for TTL support for Counters. On Tue, Nov 1, 2011 at 3:59 PM, Jonathan Ellis wrote: > Hi all, > > Two years ago I asked for Cassandra use cases and feature requests. > [1]  The results [2] have been extremely useful in setting and > prio

Re: Off-heap caching through ByteBuffer.allocateDirect when JNA not available ?

2011-11-09 Thread Jonathan Ellis
allocateDirect is broken for this purpose, but we removed the JNA dependency using sun.misc.Unsafe instead: https://issues.apache.org/jira/browse/CASSANDRA-3271 On Wed, Nov 9, 2011 at 5:54 AM, Benoit Perroud wrote: > Hi, > > I wonder if you have already discussed about ByteBuffer.allocateDirect >

Re: security

2011-11-09 Thread Mohit Anchlia
We lock down ssh to root from any network. We also provide individual logins including sysadmin and they go through LDAP authentication. Anyone who does sudo su as root gets logged and alerted via trapsend. We use firewalls and also have a separate vlan for datastore servers. We then open only speci

Re: security

2011-11-09 Thread Sasha Dolgy
Firewall with appropriate rules. > On Tue, Nov 8, 2011 at 6:30 PM, Guy Incognito wrote: >> >> hi, >> >> is there a standard approach to securing cassandra eg within a corporate >> network?  at the moment in our dev environment, anybody with network >> connectivity to the cluster can connect to it

Re: security

2011-11-09 Thread Brian O'Neill
Not sure this is the "standard approach", probably more "what we came up with". ;) We plan to deploy Cassandra behind a firewall denying all traffic on all ports other than 8080. Access from applications will be limited to the REST/HTTP layer, which we'll lock down with standard HTTP authenticati

Re: decommissioned node still in "LoadMap" in JMX Management Console

2011-11-09 Thread Brandon Williams
On Wed, Nov 9, 2011 at 1:28 AM, Patrik Modesto wrote: > Hi, > > on our production cluster of 8 nodes which is running cassandra 0.8.7 > we still see in the MBean > "org.apache.cassandra.db:type=StorageService.LoadMap" in JMX > Management console the 9th node we added for testing for a short time.

Off-heap caching through ByteBuffer.allocateDirect when JNA not available ?

2011-11-09 Thread Benoit Perroud
Hi, I wonder if you have already discussed ByteBuffer.allocateDirect as an alternative to JNA memory allocation? If so, would someone mind sending me a pointer? Thanks! Benoit.

Re: Second Cassandra users survey

2011-11-09 Thread Jake Luciani
Hi Todd, Entity Groups : https://issues.apache.org/jira/browse/CASSANDRA-1684 -Jake On Wed, Nov 9, 2011 at 6:44 AM, Todd Burruss wrote: > I believe I heard someone talk at Cassandra SF conference about creating a > partitioner that was a derivation of RandomPartitioner. It essentially > would

Re: max_compaction_threshold

2011-11-09 Thread Radim Kolar
I consulted a Hadoop expert, and he told me that he uses a value of 100 for merging segments. I will rerun the tests with 100 to check.

max_compaction_threshold

2011-11-09 Thread Radim Kolar
I found in stress tests that the default value of 32 for this setting is way too high. The Hadoop guys use a value of 10 during merge sorts so as not to stress IO that much. I also discovered that filesystems like ZFS use a default IO queue size of 10 per drive. I tried running tests with 10, 15 and 32 and there is

Re: shutdown by KILL

2011-11-09 Thread Radim Kolar
That's why disabling gossip + flush is better than drain. We should probably remove it. Drain could be good if there is a way to undrain a node, i.e. to switch it back into r/w. Implement nodetool shutdown which will work the way we are trying. First stop gossip, then wait for other nodes to see it dow

Physical data layout of columns in super column family

2011-11-09 Thread Denis Gabaydulin
Hi, first of all, let me say thank you for the amazing product :-) So, I have a couple of questions about internal physical data layout. Suppose, I have the following data schema: Reports:{ 1:{ 1:{"value1":"some val", "value2":"some val"}, 2:{"value1":"some val", "value2":

High GC activity and OOMs

2011-11-09 Thread Günter Ladwig
Hi, I have a 15-node cluster where each node has 4GB RAM and 80GB disk. There are three CFs, of which only two contain data. In total, each CF contains about 2 billion columns. I have a replication factor of 2. All CFs are compressed with SnappyCompressor. This is on Cassandra 1.0.2. I was run