Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
Ok, got a Cassandra client from Hector and changed my clean-up to be truncate() based. Here is how I did it, in case it is of any use to anyone: HConnectionManager connectionManager = cassandraCluster.connectionManager Collection activePools = connectio

Re: Cassandra on iSCSI?

2011-01-20 Thread Mick Semb Wever
> It should work fine; the main reason to go with local storage is the > huge cost advantage. [OT] They're quoting roughly the same price for both (claiming that the extra cost goes into having a separate disk cabinet for each node to run local raid-5). > *I just committed a README for contrib/st

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
Back to square one on using CliMain/CliClient vs Cassandra/Hector API for cleanup. It seems CliClient uses Antlr 3.1+ for parsing the statements passed to it, but I am using Grails, which uses Antlr 2.7.7 (for Groovy code parsing), so I can't mix the two for programmatic use. Someone please te

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread Jonathan Ellis
obsolete sstables are not the same thing as tombstones. On Thu, Jan 20, 2011 at 8:11 PM, buddhasystem wrote: > > Thanks! > > What's strange anyhow is that the GC period for these cfs expired some days > ago. I thought that a compaction would take care of these tombstones. I used > nodetool to "co

Re: Cassandra on iSCSI?

2011-01-20 Thread Jonathan Ellis
On Thu, Jan 20, 2011 at 2:13 PM, Mick Semb Wever wrote: > To go with raid-5 disks our hosting provider requires proof that iSCSI > won't work. I tried various things (e.g. `nodetool cleanup` on 12Gb load > giving 5k IOPS) but iSCSI seems to keep up with the performance of the > local raid-5 disks... >

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
On Fri, Jan 21, 2011 at 8:56 AM, Roshan Dawrani wrote: > On Fri, Jan 21, 2011 at 8:52 AM, Maxim Potekhin wrote: > >> You can script the actions you need and pipe the file into Cassandra-CLI. >> Works for me. >> > > Probably CliMain / CliClient will help me there doing it as per your suggestion.

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
On Fri, Jan 21, 2011 at 8:52 AM, Maxim Potekhin wrote: > You can script the actions you need and pipe the file into Cassandra-CLI. > Works for me. > Thanks Maxim, but first preference will be to do it through the API and not launch the Cassandra-CLI process with a scripted set of actions (I as

Re: Embedded Cassandra server startup question

2011-01-20 Thread Maxim Potekhin
You can script the actions you need and pipe the file into Cassandra-CLI. Works for me. On 1/20/2011 10:18 PM, Roshan Dawrani wrote: On Fri, Jan 21, 2011 at 8:07 AM, Aaron Morton wrote: There is a truncate() function that will clear a CF. It may leave a

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
On Fri, Jan 21, 2011 at 8:07 AM, Aaron Morton wrote: > There is a truncate() function that will clear a CF. It may leave a > snapshot around, cannot remember exactly. > Not sure if Hector (0.7.0-22) has added truncate() to its API yet. I can't find it. In Hector, I see a *dropColumnFamily()* tha

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
On Fri, Jan 21, 2011 at 8:07 AM, Aaron Morton wrote: > There is a truncate() function that will clear a CF. It may leave a > snapshot around, cannot remember exactly. > > Or you could drop and recreate the keyspace between tests using > system_add_keyspace() and system_drop_keyspace(). The system

Re: Embedded Cassandra server startup question

2011-01-20 Thread Aaron Morton
There is a truncate() function that will clear a CF. It may leave a snapshot around; I cannot remember exactly. Or you could drop and recreate the keyspace between tests using system_add_keyspace() and system_drop_keyspace(). The system tests in test/system/__init__.py sort of do this. Aaron On 21
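A minimal sketch of Aaron's second suggestion, dropping and recreating the keyspace around each test, written with Python's `unittest`. `FakeCassandraClient` is a hypothetical in-memory stand-in, not a real client; the real Thrift `system_add_keyspace()` takes a full `KsDef` rather than a bare name, and `TestKeyspace` is an illustrative name.

```python
import unittest


class FakeCassandraClient:
    """Hypothetical in-memory stand-in for a Thrift client (not a real API)."""

    def __init__(self):
        self.keyspaces = {}

    def system_add_keyspace(self, name):
        # The real call takes a KsDef; a bare name keeps the sketch short.
        if name in self.keyspaces:
            raise ValueError(f"keyspace {name} already exists")
        self.keyspaces[name] = {}

    def system_drop_keyspace(self, name):
        del self.keyspaces[name]

    def insert(self, keyspace, key, value):
        self.keyspaces[keyspace][key] = value


CLIENT = FakeCassandraClient()
KEYSPACE = "TestKeyspace"  # illustrative name


class IsolatedTest(unittest.TestCase):
    def setUp(self):
        # Each test starts from an empty, freshly created keyspace.
        CLIENT.system_add_keyspace(KEYSPACE)

    def tearDown(self):
        # Dropping it afterwards keeps the next setUp from re-adding an existing keyspace.
        CLIENT.system_drop_keyspace(KEYSPACE)

    def test_first(self):
        CLIENT.insert(KEYSPACE, "row1", "a")
        self.assertEqual(len(CLIENT.keyspaces[KEYSPACE]), 1)

    def test_second_sees_clean_keyspace(self):
        self.assertEqual(CLIENT.keyspaces[KEYSPACE], {})
```

Dropping the whole keyspace is heavier than truncating each CF, but it also discards any schema changes a test made.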

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
On Fri, Jan 21, 2011 at 5:14 AM, Anand Somani wrote: > Here is what worked for me, I use testNg, and initialize and createschema > in the @BeforeClass for each test > >- In the @AfterClass, I had to drop schema, otherwise I was getting the >same exception. >- After this I started gett

Re: Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
On Fri, Jan 21, 2011 at 3:02 AM, Aaron Morton wrote: > Do you have a full error stack? > > That error is raised when the schema is added to an internal static map. > There is a lot of static state so it's probably going to make your life > easier if you can avoid reusing the JVM. > > Hi Aaron, Ac

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem
Thanks! What's strange anyhow is that the GC period for these cfs expired some days ago. I thought that a compaction would take care of these tombstones. I used nodetool to "compact". -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-C

Re: Document Mapper for Ruby?

2011-01-20 Thread Joshua Partogi
Thanks Ryan. This is what I am looking for. Let me try it out. On Fri, Jan 21, 2011 at 4:58 AM, Ryan King wrote: > Not sure what you mean by document mapper, but CassandraObject might > fit the bill: https://github.com/nzkoz/cassandra_object > > -ryan > > On Wed, Jan 19, 2011 at 11:03 PM, Jos

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread Aaron Morton
I think the abandoned sstables resulting from dropping a CF are handled the same as SSTables left over after compaction. They are deleted as part of a full GC. See the section on Compaction here: http://wiki.apache.org/cassandra/MemtableSSTable You can trigger GC via JConsole. Hope that helps. Aaron On

Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem
Greetings, I just used nodetool to force a major compaction on my cluster. It seems like the cfs currently in service were indeed compacted, while the old test materials (which I dropped from CLI) were still there as tombstones. Is that the expected behavior? Hmm... TIA.

Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Jonathan Ellis
It's okay as of 0.6.10 and 0.7.0. But the bug only affected range queries, and you'd know if you'd hit it because there would be really obvious exception messages in your log. In other words it's probably not necessary to move your nodes. On Thu, Jan 20, 2011 at 4:30 PM, Jeremiah Jordan wrote:

Re: Embedded Cassandra server startup question

2011-01-20 Thread Anand Somani
Here is what worked for me: I use TestNG, and initialize and create the schema in the @BeforeClass for each test - In the @AfterClass, I had to drop the schema, otherwise I was getting the same exception. - After this I started getting port conflicts with the second test, so I added my own versi
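One way to sidestep the port conflict Anand describes is to let the OS hand out an unused port before each test starts its embedded server. A minimal Python sketch; `find_free_port` is a hypothetical helper, and wiring the port into the server's configuration (e.g. `rpc_port` in cassandra.yaml) is left out.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

There is a small race: the port is released when the probe socket closes, so another process could take it before the server binds it. For a test suite on one machine that is usually acceptable.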

Re: UnserializableColumnFamilyException: Couldn't find cfId

2011-01-20 Thread Aaron Morton
Sounds like there are multiple versions of your schema around the cluster. What client API are you using? Does it support the describe_schema_versions() function? This will tell you how many versions there are. The easy solution here is to scrub the data and start a new 0.7 cluster using the release
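In the 0.7 Thrift API, `describe_schema_versions()` returns a map from each schema version ID to the list of nodes reporting it, so a cluster in agreement has exactly one key. A minimal sketch of that check; the sample IDs and addresses are made up, and treating an `UNREACHABLE` key as "not a version" is an assumption about how down nodes are reported.

```python
def schema_versions_in_use(versions: dict[str, list[str]]) -> set[str]:
    """Distinct schema version IDs reported by reachable nodes.

    `versions` has the shape of describe_schema_versions():
    {schema_version_id: [node_address, ...]}. An "UNREACHABLE" key, if
    present, is assumed to group nodes that could not be asked.
    """
    return {v for v in versions if v != "UNREACHABLE"}


def schema_agrees(versions: dict[str, list[str]]) -> bool:
    """True when every reachable node reports the same schema version."""
    return len(schema_versions_in_use(versions)) == 1
```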

RE: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Jeremiah Jordan
Is 0 really OK? I remember some bugs coming up recently with a token of 0. I was thinking about moving my 0 token servers to 1 because of that. -Jeremiah Jordan From: Eric Gilmore [mailto:e...@riptano.com] Sent: Thursday, January 20, 2011 1:55 PM To: use

Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Eric Gilmore
Sorry, my comments were indeed a little short on elucidation. :) The cited doc suggests that setting initial_token to 0 on the first node "simplifies load balancing as you later expand the cluster . . . . If this is unset (the default), Cassandra picks a token number randomly." A more complete
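With the RandomPartitioner, whose token space runs from 0 to 2**127, the usual way to balance N nodes is to space their initial_token values evenly, and starting at 0 makes the first node's token one of those evenly spaced points. A minimal sketch, assuming RandomPartitioner; other partitioners have different token spaces.

```python
def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial_token values for RandomPartitioner (range 0..2**127)."""
    if node_count < 1:
        raise ValueError("need at least one node")
    # Integer arithmetic keeps tokens exact; floats lose precision near 2**127.
    return [i * 2**127 // node_count for i in range(node_count)]
```

`balanced_tokens(4)` gives `[0, 2**125, 2**126, 3 * 2**125]`. When the cluster doubles, the 4-node tokens are a subset of the 8-node ones, so existing nodes keep their tokens and new nodes land halfway between them.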

Re: memory size and disk size prediction tool

2011-01-20 Thread Aaron Morton
Not that I know of. Do you have an existing test system you can use as a baseline? For memory, have a read of the JVM Heap Size section here: http://wiki.apache.org/cassandra/MemtableThresholds You will also want to have some memory for disk caching and the OS. 8 or 12 GB feels like a good start. For d

Re: Embedded Cassandra server startup question

2011-01-20 Thread Aaron Morton
Do you have a full error stack? That error is raised when the schema is added to an internal static map. There is a lot of static state, so it's probably going to make your life easier if you can avoid reusing the JVM. I'm guessing your error comes from AbstractCassandraDaemon.setup() calling D

Re: Upgrading from 0.6 to 0.7.0

2011-01-20 Thread Aaron Morton
I'm not sure if you're suggesting running a mixed-mode cluster there, but AFAIK the changes to the internode protocol prohibit this. The nodes will probably see each other via gossip, but the way the messages define their purpose (their verb handler) has been changed. Out of interest which is mo

Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Brandon Williams
On Thu, Jan 20, 2011 at 2:14 PM, Robert Coli wrote: > On Thu, Jan 20, 2011 at 11:55 AM, Eric Gilmore wrote: > > Also, in the Getting Started page, we note that it may be best to set > > initial_token to 0 on the very first node that you start. > > Could you expand a bit on the reasons for and im

Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-20 Thread Jonathan Shook
Clients: Java and MVEL + Hector; Perl + Thrift. Usage: high-traffic monitoring harness with dynamic mapping and loading of handlers. Cassandra was part of the "do more with less hardware" approach to designing this system. On Fri, Jan 14, 2011 at 11:24 AM, Ertio Lew wrote: > Hey, > > If you have

Cassandra on iSCSI?

2011-01-20 Thread Mick Semb Wever
Does anyone have any experiences with Cassandra on iSCSI? I'm currently testing a (soon-to-be) production server using both local raid-5 and iSCSI disks. Our hosting provider is pushing us hard towards the iSCSI disks because it is easier for them to run (and to meet our needs for increasing disk

UnserializableColumnFamilyException: Couldn't find cfId

2011-01-20 Thread Oleg Proudnikov
Hi All, Could you please help me understand the impact on my data? I am running a 6 node 0.7-rc4 Cassandra cluster with RF=2. Schema was defined when the cluster was created and did not change. I am doing batch load with CL=ONE. The cluster is under some stress in memory and I/O. Each node has 1G

Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Robert Coli
On Thu, Jan 20, 2011 at 11:55 AM, Eric Gilmore wrote: > Also, in the Getting Started page, we note that it may be best to set > initial_token to 0 on the very first node that you start. Could you expand a bit on the reasons for and implications of this, for our collective elucidation? :) =Rob

Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Eric Gilmore
Patrick, if you try adding capacity again from the beginning, I'd be curious to hear if the DataStax/Riptano docs are helpful or not. Also, in the Getting Started page, w

Re: Distributed counters

2011-01-20 Thread Kelvin Kakugawa
Hi Rustam, All of our large production clusters are still on 0.6.6. However, we have an 0.7 branch, here: https://github.com/kakugawa/cassandra/tree/twttr-cassandra-0.7-counts that is our migration target. It passes our internal distributed tests and will be in production soon. -Kelvin On Thu

Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-20 Thread Jean-Yves LEBLEU
Java + Pelops Cassandra 0.6.8

Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Peter Schuller
> Is it supposed to work that way, or have I missed something ? I don't see that you did anything wrong based on your description and based on my understanding how it works in 0.7 (not sure about 0.6), but hopefully someone else can address that part. What I can think of - did you inspect the log

Cassandra Ubuntu PPA stable release updated to 0.7.0

2011-01-20 Thread Clint Byrum
For anybody using the cassandra ubuntu stable release PPA, it is being updated right now to 0.7.0. This is just a heads up. I'd expect anybody using it to still use all best practices from the cassandra documentation for upgrades, and not just blindly apt-get upgrade. But either way, this is a big

Re: Cassandra automatic startup script on ubuntu

2011-01-20 Thread Clint Byrum
On Thu, 2011-01-20 at 17:51 +0100, Sébastien Druon wrote: > Hello! > > > I am using cassandra on a ubuntu machine and installed it from the > binary found on the cassandra home page. > However, I did not find any scripts to start it up at boot time. > > > Where can I find this kind of script? >

Re: Document Mapper for Ruby?

2011-01-20 Thread Ryan King
Not sure what you mean by document mapper, but CassandraObject might fit the bill: https://github.com/nzkoz/cassandra_object -ryan On Wed, Jan 19, 2011 at 11:03 PM, Joshua Partogi wrote: > Hi all, > > Is anyone aware of a document mapper for Ruby similar to MongoMapper? > > Thanks heaps for your

RE: Under expectation response time for reads

2011-01-20 Thread George Ciubotaru
Hi Miguel, This indeed solved the problem. The response times are now under 1 ms which is great. Thank you once again, George From: Miguel Verde [mailto:miguelitov...@gmail.com] Sent: 20 January 2011 16:46 To: user@cassandra.apache.org Subject: Re: Under expectation response time for read

Re: Lost MUTATIONS on several Cassandra nodes - no impact on the client

2011-01-20 Thread Jonathan Ellis
On Thu, Jan 20, 2011 at 10:47 AM, Oleg Proudnikov wrote: > Q1. Is it possible that Cassandra will drop both replicas for a given column > during these losses? Or does it guarantee that one replica is still written? It guarantees that if the requested ConsistencyLevel is not achieved, client will
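For reasoning about Q1, it helps to spell out how many replicas must acknowledge a write at the common consistency levels, and the usual R + W > RF rule for when a read is guaranteed to overlap the latest write. This is a sketch of the standard quorum arithmetic, not code taken from Cassandra.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements needed at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf
```

With RF=2 and writes at ONE, a write succeeds once a single replica has it. `reads_see_latest_write("ONE", "ONE", 2)` is False, so a read at ONE may briefly miss that write until the other replica catches up (for example through hinted handoff or read repair).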

Re: Cassandra automatic startup script on ubuntu

2011-01-20 Thread Dave Viner
You can also use the apt-get repository version, which installs the startup script. On http://wiki.apache.org/cassandra/CloudConfig, see the Cassandra Basic Setup section. It applies to any debian based machine, not just cloud instances. HTH Dave Viner On Thu, Jan 20, 2011 at 9:11 AM, Donal Zan

Re: Compression in Cassandra

2011-01-20 Thread Stu Hood
Also note that an improved and compressible file format has been in the works for a while now. https://issues.apache.org/jira/browse/CASSANDRA-674 I am endlessly optimistic that it will make it into the 'next' version; in particular, the current hope is 0.8 On Jan 20, 2011 6:34 AM, "Terje Marthi

Configurability of the implementation of the Cassandra.Iface

2011-01-20 Thread indika kumara
Hi all, Would it be worth adding the ability to configure the implementation of Cassandra.Iface? I have to intercept the requests to the Cassandra server without modifying the existing code (CassandraServer.java). So, the server-side implementation of the Cassandra.Iface (CassandraServer) need
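The interception Indika describes is essentially the decorator (or proxy) pattern: wrap the real handler so every call passes through a hook before it is forwarded. A language-neutral sketch in Python; `InterceptingHandler`, `FakeServer` and the hook signature are hypothetical. In Cassandra itself the wrapper would have to implement the Java `Cassandra.Iface` interface, and the server would need a setting to use it in place of CassandraServer.

```python
from typing import Any, Callable


class InterceptingHandler:
    """Forwards every method call to `target`, calling `hook` first."""

    def __init__(self, target: Any, hook: Callable[[str, tuple, dict], None]):
        self._target = target
        self._hook = hook

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the wrapper itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            # The hook could do authorization, auditing or per-tenant accounting.
            self._hook(name, args, kwargs)
            return attr(*args, **kwargs)

        return wrapper


class FakeServer:
    """Hypothetical stand-in for the existing handler."""

    def get(self, key: str) -> str:
        return f"value-for-{key}"
```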

Re: Cassandra automatic startup script on ubuntu

2011-01-20 Thread Donal Zang
On 20/01/2011 17:51, Sébastien Druon wrote: Hello! I am using cassandra on a ubuntu machine and installed it from the binary found on the cassandra home page. However, I did not find any scripts to start it up at boot time. Where can I find this kind of script? Thanks a lot in advance Sebas

Re: Use Cassandra to store 2 million records of persons

2011-01-20 Thread David G. Boney
I don't think the below statement accurately describes data mining or using Cassandra for data mining. All the techniques I am familiar with for either data mining or machine learning, of which data mining is a subset, make one or more sequential scans through the data to abstract statistics or bui
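In Cassandra 0.7, a full sequential scan of the kind David describes is a walk over `get_range_slices` pages, each page starting at the last key of the previous one. A sketch of that paging loop against an in-memory sorted list; `fetch_page` is a hypothetical stand-in for the real range-slice call. Because the start key is inclusive, every page after the first repeats one row, which the loop skips. With RandomPartitioner, rows come back in token order rather than key order, but the paging logic is the same.

```python
import bisect
from typing import Iterator


def fetch_page(sorted_keys: list[str], start: str, count: int) -> list[str]:
    """Hypothetical stand-in for a range-slice call: up to `count` keys >= start."""
    i = bisect.bisect_left(sorted_keys, start)
    return sorted_keys[i:i + count]


def scan_all(sorted_keys: list[str], page_size: int = 100) -> Iterator[str]:
    """Yield every key exactly once by paging through the range."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""  # empty start key means "from the beginning"
    first_page = True
    while True:
        page = fetch_page(sorted_keys, start, page_size)
        # The start key is inclusive, so later pages begin with a row already seen.
        rows = page if first_page else page[1:]
        yield from rows
        if len(page) < page_size:
            return
        start = page[-1]
        first_page = False
```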

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread indika kumara
I do not know Cassandra in much depth, but as far as I know there is no such tool. I believe such a tool would be worthwhile. Thanks, Indika On Thu, Jan 20, 2011 at 6:15 PM, Mimi Aluminium wrote: > Hi, > > I have a question that is somewhat related to the above. > Is there a tool that

Lost MUTATIONS on several Cassandra nodes - no impact on the client

2011-01-20 Thread Oleg Proudnikov
Hi All, Could you please help me understand the impact of this behaviour? I am running a 6 node 0.7-rc4 Cassandra cluster with RF=2. 6 Hector clients (one per node) are performing a single-threaded batch load running on the same servers. CL=ONE. Client performs one simple small query and an insert

Cassandra automatic startup script on ubuntu

2011-01-20 Thread Sébastien Druon
Hello! I am using cassandra on a ubuntu machine and installed it from the binary found on the cassandra home page. However, I did not find any scripts to start it up at boot time. Where can I find this kind of script? Thanks a lot in advance Sebastien

Re: Under expectation response time for reads

2011-01-20 Thread Miguel Verde
Disable Nagle's algorithm and you should see much better performance. It must not be used on loopback. http://markmail.org/message/rgauuflglwemm24o On Thu, Jan 20, 2011 at 6:24 AM, George Ciubotaru < george.ciubot...@weedle.com> wrote: > Hello, > > We are in the process of evaluating Cassandra t
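Disabling Nagle's algorithm means setting the `TCP_NODELAY` option on the client's socket, so small Thrift requests are sent immediately instead of being held back waiting for more data. A minimal sketch with Python's standard `socket` module; port 9160 (the default Thrift `rpc_port`) is an assumption, and a real client library may expose its own setting for this.

```python
import socket


def disable_nagle(sock: socket.socket) -> socket.socket:
    """Turn off Nagle's algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def connect_without_nagle(host: str, port: int = 9160) -> socket.socket:
    """Open a TCP connection with TCP_NODELAY set before any request is sent."""
    sock = disable_nagle(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    sock.connect((host, port))
    return sock
```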

Re: Distributed counters

2011-01-20 Thread Nate McCall
On the Hector side, we will be adding this to trunk (and thus moving Hector trunk to Cassandra 0.8.x) in the next week or two. On Wed, Jan 19, 2011 at 6:12 PM, Rustam Aliyev wrote: > Hi, > > Does anyone use CASSANDRA-1072 counters patch with 0.7 stable branch? I need > this functionality but can'

memory size and disk size prediction tool

2011-01-20 Thread Mimi Aluminium
Hi, We are implementing a 'middleware' layer over an underlying storage system and need to estimate costs for various system configurations. Specifically, I want to estimate the resources (memory, disk) for our data model. Is there a tool that given certain storage configuration parameters, column family

Re: Compression in Cassandra

2011-01-20 Thread Terje Marthinussen
A 3-7x increase in data size is perfectly normal, depending on your data schema. Regards, Terje On 20 Jan 2011, at 23:17, "akshatbakli...@gmail.com" wrote: > I just did a du -h DataDump which showed 40G > and du -h CassandraDataDump which showed 170G > > am i doing something wrong. > have you o
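Much of that growth comes from per-column overhead. Assuming the 0.7 SSTable layout for a regular column (a 2-byte name length, a 1-byte flag, an 8-byte timestamp and a 4-byte value length, about 15 bytes besides the name and value), a back-of-envelope estimate looks like the sketch below. It ignores row headers, row keys, indexes, bloom filters and compaction headroom, so real usage will be higher; the replication factor multiplies it across the cluster.

```python
COLUMN_OVERHEAD_BYTES = 15  # assumed: 2 (name len) + 1 (flags) + 8 (timestamp) + 4 (value len)


def column_bytes(name_len: int, value_len: int) -> int:
    """Approximate on-disk size of one regular column in a single SSTable."""
    return COLUMN_OVERHEAD_BYTES + name_len + value_len


def estimate_cluster_bytes(rows: int, columns_per_row: int, name_len: int,
                           value_len: int, replication_factor: int) -> int:
    """Column data only, summed over all replicas; no keys, indexes or filters."""
    return rows * columns_per_row * column_bytes(name_len, value_len) * replication_factor
```

For example, a column with an 8-byte name and a 4-byte value carries 12 bytes of payload but takes about `column_bytes(8, 4) == 27` bytes, 2.25 times as much, before replication.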

How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Patrick de Torcy
Hi, I've read many, many docs (yes, http://wiki.apache.org/cassandra/Operations too...) but I still can't see how bootstrapping works... I started with one node and put my data in it (16 GB). It's ok. I added a second node with AutoBootstrap=true (as explained in the doc, I didn't add this node i

Re: Use Cassandra to store 2 million records of persons

2011-01-20 Thread Surender Singh
David Please tell me any solution for it. Thanks and regards Surender Singh On Thu, Jan 20, 2011 at 6:05 PM, David Boxenhorn wrote: > Cassandra is not a good solution for data mining type problems, since it > doesn't have ad-hoc queries. Cassandra is designed to maximize throughput, > which is

Re: Compression in Cassandra

2011-01-20 Thread akshatbakli...@gmail.com
I just did a du -h DataDump which showed 40G and du -h CassandraDataDump which showed 170G. Am I doing something wrong? Have you observed any compression? On Thu, Jan 20, 2011 at 6:57 PM, Javier Canillas wrote: > How do you calculate your 40g data? When you insert it into Cassandra, you >

Re: Compression in Cassandra

2011-01-20 Thread Javier Canillas
How do you calculate your 40 GB of data? When you insert it into Cassandra, you need to convert the data into a Byte[]; maybe your problem is there. On Thu, Jan 20, 2011 at 10:02 AM, akshatbakli...@gmail.com < akshatbakli...@gmail.com> wrote: > Hi all, > > I am experiencing a unique situation. I loade

Compression in Cassandra

2011-01-20 Thread akshatbakli...@gmail.com
Hi all, I am experiencing a unique situation. I loaded some data onto Cassandra. My data was about 40 GB, but when loaded into Cassandra the data directory size is almost 170 GB. This means the **data got inflated**. Is it just me, or is anyone else also facing this inflation, or is it the ge

Re: Use Cassandra to store 2 million records of persons

2011-01-20 Thread David Boxenhorn
Cassandra is not a good solution for data mining type problems, since it doesn't have ad-hoc queries. Cassandra is designed to maximize throughput, which is not usually a problem for data mining. On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh wrote: > Hi All > > I want to use Apache Cassandra to

Under expectation response time for reads

2011-01-20 Thread George Ciubotaru
Hello, We are in the process of evaluating Cassandra to be used with our product; I've started with some performance tests but unfortunately I'm getting very bad results for read operations (around 200 ms per read request which is much much more than what I'm reading that Cassandra can deliver)

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread Mimi Aluminium
Hi, I have a question that is somewhat related to the above. Is there a tool that predicts the resource consumption (i.e., memory, disk, CPU) in an offline mode? That is, it would be given the storage conf parameters, keyspaces, CFs and data model, plus application parameters such as average read/write rates.

Use Cassandra to store 2 million records of persons

2011-01-20 Thread Surender Singh
Hi All, I want to use Apache Cassandra to store information (like first name, last name, gender, address) about 2 million people. Then I need to perform analytics and reporting on that data. Do I need to store information about 2 million people in MySQL and then transfer that information into Cassandr

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread David Boxenhorn
I have added my comments to this issue: https://issues.apache.org/jira/browse/CASSANDRA-2006 Good luck! On Thu, Jan 20, 2011 at 1:53 PM, indika kumara wrote: > Thanks David We decided to do it at our client-side as the initial > implementation. I will investigate the approaches for supporti

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread indika kumara
Thanks David. We decided to do it on our client side as the initial implementation. I will investigate the approaches for supporting fine-grained control of the resources consumed by a server, tenant, and CF. Thanks, Indika On Thu, Jan 20, 2011 at 3:20 PM, David Boxenhorn wrote: > As far

Embedded Cassandra server startup question

2011-01-20 Thread Roshan Dawrani
Hi, I am using Cassandra for a Grails application, and in it I start the embedded server when the Spring application context gets built. When I run my Grails app test suite, it first runs the integration and then the functional test suite, and it builds the application context individually for each pha

Re: Upgrading from 0.6 to 0.7.0

2011-01-20 Thread Daniel Josefsson
In our case our replication factor is more than half the number of nodes in the cluster. Would it be possible to do the following: - Upgrade half of them - Change Thrift Port and inter-server port (is this the storage_port?) - Start them up - Upgrade clients one by one - Upgrade th

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread David Boxenhorn
As far as I can tell, if Cassandra supports three levels of configuration (server, keyspace, column family) we can support multi-tenancy. It is trivial to give each tenant their own keyspace (e.g. just use the tenant's id as the keyspace name) and let them go wild. (Any out-of-bounds behavior on th
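A minimal sketch of the keyspace-per-tenant idea, deriving a keyspace name from the tenant's ID. It assumes keyspace names should be limited to letters, digits and underscores; the `tenant_` prefix and the choice to reject other characters (rather than rewrite them, which could make two tenants collide) are illustrative.

```python
import re

_VALID_TENANT_ID = re.compile(r"[A-Za-z0-9_]+")


def keyspace_for_tenant(tenant_id: str, prefix: str = "tenant_") -> str:
    """Derive a per-tenant keyspace name, rejecting IDs that would need rewriting."""
    if not _VALID_TENANT_ID.fullmatch(tenant_id):
        # Rewriting e.g. "a-b" to "a_b" could collide with a real tenant "a_b".
        raise ValueError(f"tenant id {tenant_id!r} is not a valid keyspace suffix")
    return prefix + tenant_id
```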