repair and amount of transfers

2011-06-14 Thread Terje Marthinussen
Hi, I have been testing repairs a bit in different ways on 0.8.0 and I am curious about what to really expect in terms of data transferred. I would expect my data to be fairly consistent in this case from the start. More than a billion supercolumns, but there were no errors in the feed and we have seen m

Re: odd logs after repair

2011-06-14 Thread Sasha Dolgy
Hi ... Does anyone else see this type of INFO message in their log files, or is it just me? INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You shoul

Re: odd logs after repair

2011-06-14 Thread Sylvain Lebresne
The exception itself is a bug (I've created https://issues.apache.org/jira/browse/CASSANDRA-2767 to fix it). However, the important message is the previous one (Even if the exception was not thrown, repair wouldn't be able to work correctly, so the fact that the exception is thrown is not such a b

Re: get_indexed_slices ~ simple map-reduce

2011-06-14 Thread Michal Augustýn
Thank you! I have one more question ;-) If I use regular "get" function then I can be sure that it takes ~5ms. So I suppose that if I use "get_indexed_slices" function then the response time depends on how many rows match the most selected equality predicate. Am I right? Augi 2011/6/14 aaron mor

Re: odd logs after repair

2011-06-14 Thread Sasha Dolgy
Hi Sylvain, I verified on all nodes with nodetool version that they are 0.8 and have even restarted nodes. The problem still persists. The four nodes all report similar errors about the other nodes. When I upgraded to 0.8 maybe there were relics about the keyspace that say it's from an earlier version? I

Re: odd logs after repair

2011-06-14 Thread Sylvain Lebresne
Could you open a ticket then please ? -- Sylvain On Tue, Jun 14, 2011 at 10:25 AM, Sasha Dolgy wrote: > Hi Sylvain, > > I verified on all nodes with nodetool version that they are 0.8 and have > even restarted nodes.  Still persists.  The four nodes all report similar > errors about the other no

Re: odd logs after repair

2011-06-14 Thread Sasha Dolgy
https://issues.apache.org/jira/browse/CASSANDRA-2768 On Tue, Jun 14, 2011 at 10:55 AM, Sylvain Lebresne wrote: > Could you open a ticket then please ? > > -- > Sylvain > > On Tue, Jun 14, 2011 at 10:25 AM, Sasha Dolgy wrote: >> Hi Sylvain, >> >> I verified on all nodes with nodetool version that

Re: repair and amount of transfers

2011-06-14 Thread Terje Marthinussen
Ah.. I just found Cassandra-2698 (I thought I had seen something about this)... I guess that means I have to see if I can find time to investigate whether I have a reproducible case? Terje On Tue, Jun 14, 2011 at 4:21 PM, Terje Marthinussen wrote: > Hi, > > I have been testing repairs a bit in di

RE: Are data migration tools for Cassandra exist?

2011-06-14 Thread Artem Orobets
Thank you for your answer. We investigated the Cassandra architecture, and we are interested in approaches to solving this problem. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, June 12, 2011 5:51 AM To: user@cassandra.apache.org Subject: Re: Are data migration tools for Cas

Re: Is this the proper use of OPP?

2011-06-14 Thread Eric tamme
I would point you to this article, it does a good job describing OPP and pretty much answers the specific questions you asked. http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ -Eric On Mon, Jun 13, 2011 at 5:06 PM, AJ wrote: > I'm just becoming

Re: repair and amount of transfers

2011-06-14 Thread Peter Schuller
> I just found Cassandra-2698 (I thought I had seen something about this)... There is also the other bug that causes repair to transfer data from all CF:s rather than just the one being repaired. This could be affecting you if you're doing repair of individual CF:s rather than everything at the sa

Re: Is this the proper use of OPP?

2011-06-14 Thread AJ
Thanks. I found that article later. I was definitely off-base with respect to OPP. Random partitioning is pretty much the way to go and datastax has a good article on geographic distribution: http://www.datastax.com/docs/0.8/operations/datacenter Sorry for the long pointless post previously

cql/secondary indexes - select in

2011-06-14 Thread Bill
I was wondering if there are plans for (or any interest in) an IN operator for CQL/Secondary Indexes? I have a use case to pull back N keys on an index and rather than perform N selects would like to do this SELECT ... WHERE KEY = keyname AND colname IN [val1,,..] Bill

Re: repair and amount of transfers

2011-06-14 Thread Jonathan Ellis
that one's done for 0.8.1: https://issues.apache.org/jira/browse/CASSANDRA-2280 On Tue, Jun 14, 2011 at 5:56 AM, Peter Schuller wrote: >> I just found Cassandra-2698 (I thought I had seen something about this)... > > There is also the other bug that causes repair to transfer data from > all CF:s

Re: cql/secondary indexes - select in

2011-06-14 Thread Jonathan Ellis
We gave this a try in https://issues.apache.org/jira/browse/CASSANDRA-2591 -- it turns out it's not a good fit for the CQL QueryProcessor. We really need to be able to push more complex queries to the index nodes (https://issues.apache.org/jira/browse/CASSANDRA-1598). So, we would still like to do

New web client & future API

2011-06-14 Thread Markus Wiesenbacher | Codefreun.de
Hi, what is the future API for Cassandra? Thrift, Avro, CQL? I just released an early version of my web client (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I would like to know what the future is ... Many thanks MW

Re: New web client & future API

2011-06-14 Thread Sasha Dolgy
Your application is built with the thrift bindings and not with a higher level client like Hector? On Tue, Jun 14, 2011 at 3:42 PM, Markus Wiesenbacher | Codefreun.de wrote: > > Hi, > > what is the future API for Cassandra? Thrift, Avro, CQL? > > I just released an early version of my web client

Re: New web client & future API

2011-06-14 Thread Victor Kabdebon
Hello Markus, Actually from what I understood (please correct me if I am wrong) CQL is based on Thrift / Avro. Victor Kabdebon 2011/6/14 Markus Wiesenbacher | Codefreun.de > > Hi, > > what is the future API for Cassandra? Thrift, Avro, CQL? > > I just released an early version of my web client

Cassandra Statistics and Metrics

2011-06-14 Thread Marcos Ortiz
Regards to all. My team and I here at the University are working on a generic solution for Monitoring and Capacity Planning for Open Source Databases, and one of the NoSQL databases that we chose to support is Cassandra. Where can I find all the metrics and statistics of Cassandra? I'm thi

Re: New web client & future API

2011-06-14 Thread Markus Wiesenbacher | Codefreun.de
Yes, I wanted to start from the base ... On 14.06.2011 at 15:48, Sasha Dolgy wrote: > Your application is built with the thrift bindings and not with a > higher level client like Hector? > > On Tue, Jun 14, 2011 at 3:42 PM, Markus Wiesenbacher | Codefreun.de > wrote: >> >> Hi, >> >> what i

Re: Cassandra Statistics and Metrics

2011-06-14 Thread Viktor Jevdokimov
We're using open source monitoring solution Zabbix from http://www.zabbix.com/ using zapcat - not only for Cassandra but for the whole system. As MX4J tools plugin is supported by Cassandra, support of zapcat in Cassandra by default is welcome - we have to use a wrapper to start zapcat agent. 201

Re: Cassandra Statistics and Metrics

2011-06-14 Thread Marcos Ortiz
Where can I find the source code? On 6/14/2011 10:13 AM, Viktor Jevdokimov wrote: We're using open source monitoring solution Zabbix from http://www.zabbix.com/ using zapcat - not only for Cassandra but for the whole system. As MX4J tools plugin is supported by Cassandra, support of zapcat

Re: Cassandra Statistics and Metrics

2011-06-14 Thread Dan Kuebrich
Here's what people usually monitor from munin (and how they get at it): https://github.com/jbellis/cassandra-munin-plugins . Sounds a lot like what these guys are doing (even the stack?): http://datadoghq.com/ On Tue, Jun 14, 2011 at 10:13 AM, Viktor Jevdokimov wrote: > We're using open source m

Re: Cassandra Statistics and Metrics

2011-06-14 Thread Marcos Ortiz
We are thinking of a Web 2.0 application, and Munin was not built with these thoughts in mind. I will review the datadoghq site. Regards On 6/14/2011 10:23 AM, Dan Kuebrich wrote: Here's what people usually monitor from munin (and how they get at it): https://github.com/jbellis/cassandra-m

Cassandra scaling problem in virtualized environment

2011-06-14 Thread Schuilenga, Jan Taeke
Hi All, We are having issues testing Cassandra in a virtualized environment (Vmware ESX). Our challenge is to combine a high number of concurrent users with a very low maximum response time. Immediately we ran into a problem with scalability where our performance (Trx per sec) unexpectedly deg

RE: Docs: "Why do deleted keys show up during range scans?"

2011-06-14 Thread Jeremiah Jordan
I am pretty sure how Cassandra works will make sense to you if you think of it that way, that rows do not get deleted, columns get deleted. While you can delete a row, if I understand correctly, what happens is a tombstone is created which matches every column, so in effect it is deleting the colum

RE: Docs: "Why do deleted keys show up during range scans?"

2011-06-14 Thread Jeremiah Jordan
Also, tombstones are not "attached" anywhere. A tombstone is just a column with a special value which says "I was deleted". And I am pretty sure they go into SSTables etc. the exact same way regular columns do. -Original Message- From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.co

Re: Cassandra scaling problem in virtualized environment

2011-06-14 Thread Ryan King
On Tue, Jun 14, 2011 at 8:16 AM, Schuilenga, Jan Taeke wrote: > Hi All, > > We are having issues testing Cassandra in a virtualized environment (Vmware > ESX). > Our challenge is to combine a  high number of concurrent users with a very > low maximum response time. > Immediately we ran into a prob

Re: Migration question

2011-06-14 Thread Eric Czech
Thanks Aaron. I'll make sure to copy the system tables. Another thing -- do you have any suggestions on raid configurations for main data drives? We're looking at RAID5 and 10 and I can't seem to find a convincing argument one way or the other. Thanks again for your help. On Mon, Jun 6, 2011 a

Re: one way to make counter delete work better

2011-06-14 Thread Sylvain Lebresne
Who assigns those epoch numbers ? You need all nodes to agree on the epoch number somehow to have this work, but then how do you maintain those in a partition tolerant distributed system ? I may have missed some parts of your proposal but let me consider a scenario that we have to be able to handl

Re: possible 'coming back to life' bug with counters

2011-06-14 Thread Sylvain Lebresne
As listed here: http://wiki.apache.org/cassandra/Counters, counter deletion is provided as a convenience for permanent deletion of counters but, because of the design of counters, it is never safe to issue an increment on a counter that has been deleted (that is, you will experience back to life be

Re: one way to make counter delete work better

2011-06-14 Thread Milind Parikh
If I understand this correctly, then the epoch integer would be generated by each node. Since time always flows forward, the assumption would be, I suppose, that the epochs would be tagged with the node that generated them and additionally the counter would carry as much history as necessary (and p

bring out your rpms...

2011-06-14 Thread Colin
Does anyone know where an rpm for 0.7.6-2 might be? (rhel) I checked the datastax site and only see up to 0.7.6-1

Where is my data?

2011-06-14 Thread AJ
Is there an official deterministic formula to compute the various subsets of a given cluster that comprises a complete set of data (redundant rows ok)? IOW, if multiple nodes become unavailable one at a time, at what point can I say <100% of my data is available? Obviously, the method would h

Re: get_indexed_slices ~ simple map-reduce

2011-06-14 Thread aaron morton
yes, just like a SELECT in SQL. With a better index match there is less data read off disk, fewer filter loops, and a faster query. btw, the read path in cassandra is generally non deterministic. It varies with respect to how many mutations the key has received over time, and how efficient t

Re: bring out your rpms...

2011-06-14 Thread Nate McCall
The 0.7.6-2 release was made over *-1 specifically to correct an issue with debian packaging. This keeps coming up though, so I'll probably just go ahead and roll a 0.7.6-2 for rpm.datastax.com so as not to confuse folks. On Tue, Jun 14, 2011 at 4:19 PM, Colin wrote: > Does anyone know where an

Re: bring out your rpms...

2011-06-14 Thread Konstantin Naryshkin
You could try to roll your own. I managed to create a custom 0.8 RPM using the spec file from the redhat directory. First check out the source. Then edit the spec file with the following changes: Set the Version and Release variables appropriately. At the end of %install, add the following 2 li

Re: New web client & future API

2011-06-14 Thread aaron morton
AFAIK... Avro is dead. Thrift is the current API and currently the only full featured API. CQL is a possible future API, given community support and development time it may become the only API. The initial release is not feature complete (e.g. missing some DDL statements) and still uses thr

RE: bring out your rpms...

2011-06-14 Thread Colin
Thanks Nate. I appreciate it. -Original Message- From: Nate McCall [mailto:n...@datastax.com] Sent: Tuesday, June 14, 2011 4:52 PM To: user@cassandra.apache.org Subject: Re: bring out your rpms... The 0.7.6-2 release was made over *-1 specifically to correct an issue with debian packagi

Re: Docs: "Why do deleted keys show up during range scans?"

2011-06-14 Thread aaron morton
> While you can delete a row, if I understand correctly, what happens is a > tombstone is created which matches every column, so in effect it is > deleting the columns, not the whole row. A tombstone is created at the level of the delete, rather than for every column. Otherwise imagine deleting

Docs: Token Selection

2011-06-14 Thread AJ
This http://wiki.apache.org/cassandra/Operations#Token_selection says: "With NetworkTopologyStrategy, you should calculate the tokens the nodes in each DC independantly." and gives the example: DC1 node 1 = 0 node 2 = 85070591730234615865843651857942052864 DC2 node 3 = 1 node 4 = 8507059173
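The wiki rule quoted above, computing tokens for each data center independently, can be sketched as below. This is only a minimal sketch, assuming RandomPartitioner's token space of 0 to 2**127 and an offset of 1 for the second data center so that no two nodes share a token. The name `tokens_for_dc` is hypothetical and is not part of Cassandra or nodetool.

```python
RING_SIZE = 2 ** 127  # assumed token space of RandomPartitioner


def tokens_for_dc(node_count, dc_offset=0):
    """Return evenly spaced tokens for one data center.

    dc_offset shifts every token slightly so that nodes in different
    data centers never end up with an identical token.
    """
    return [
        (i * RING_SIZE // node_count + dc_offset) % RING_SIZE
        for i in range(node_count)
    ]


dc1 = tokens_for_dc(2, dc_offset=0)  # tokens for the two nodes in DC1
dc2 = tokens_for_dc(2, dc_offset=1)  # same spacing in DC2, shifted by one
print("DC1:", dc1)
print("DC2:", dc2)
```

Each data center gets its own evenly spaced ring, which matches the DC1 values and the "node 3 = 1" value quoted from the wiki.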

Re: Docs: "Why do deleted keys show up during range scans?"

2011-06-14 Thread AJ
Thanks, but right now I'm thinking, RTFC ;o) On 6/14/2011 4:37 PM, aaron morton wrote: While you can delete a row, if I understand correctly, what happens is a tombstone is created which matches every column, so in effect it is deleting the columns, not the whole row. A tombstone is created at

Re: Docs: Token Selection

2011-06-14 Thread Vijay
Yes... That's right... If you are trying to say the below... DC1 Node1 Owns 50% (Ranges 8..4 -> 8..5 & 8..5 -> 0) Node2 Owns 50% (Ranges 0 -> 1 & 1 -> 8..4) DC2 Node1 Owns 50% (Ranges 8..5 -> 0 & 0 -> 1) Node2 Owns 50% (Ranges 1 -> 8..4 & 8..4 -> 8..5) Regards, On Tue, Jun 14, 2011 a

When does it make sense to use TimeUUID?

2011-06-14 Thread Sameer Farooqui
I would like to store some timestamped user info in a Column Family with the usernames as the row key and different timestamps as column names. Each user might have a thousand timestamped entries. I understand that the version 1 UUIDs that Cassandra uses combine the MAC address of the computer generating the

RE: When does it make sense to use TimeUUID?

2011-06-14 Thread Kevin
TimeUUIDs should be used for data that is time-based and requires uniqueness. TimeUUID comparisons compare the time-based portion of the UUID. So no, you do not need to know the MAC addresses. In fact, for languages that cannot get to that low of a level to access a MAC address (like Java), the

RE: When does it make sense to use TimeUUID?

2011-06-14 Thread Kevin
Correction. TimeUUID comparisons FIRST compare the time-based portion, then go on to the other portion. From: Sameer Farooqui [mailto:cassandral...@gmail.com] Sent: Tuesday, June 14, 2011 8:16 PM To: user@cassandra.apache.org Subject: When does it make sense to use TimeUUID? I would like
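The ordering Kevin describes (time-based portion first, then the rest) can be illustrated with Python's standard `uuid` module. This is a sketch, not Cassandra's comparator: the helper names `make_v1` and `timeuuid_sort_key` are hypothetical, and using the raw bytes as the tie-breaker is an assumption about what "the other portion" means.

```python
import uuid


def make_v1(timestamp, clock_seq, node):
    """Build a version 1 UUID from a 60-bit timestamp, a 14-bit clock
    sequence and a 48-bit node value (normally the MAC address)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def timeuuid_sort_key(u):
    """Order by the embedded timestamp first, then by the raw bytes
    (assumed tie-breaker for UUIDs created at the same instant)."""
    return (u.time, u.bytes)


later = make_v1(timestamp=1 << 32, clock_seq=0, node=1)
earlier = make_v1(timestamp=5, clock_seq=0, node=2)

by_time = sorted([later, earlier], key=timeuuid_sort_key)
by_bytes = sorted([later, earlier], key=lambda u: u.bytes)
print(by_time[0] == earlier, by_bytes[0] == later)
```

The two orderings differ because a version 1 UUID stores the low bits of the timestamp first, so a plain byte comparison does not sort by time. The caller never needs to know the MAC address to get a time ordering.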

Re: When does it make sense to use TimeUUID?

2011-06-14 Thread Sameer Farooqui
Cool, thanks for the Clarification, Kevin. On Tue, Jun 14, 2011 at 5:43 PM, Kevin wrote: > Correction. TimeUUID comparisons FIRST compare the time-based portion, > then go on to the other portion. > > > On Tue, Jun 14, 2011 at 5:41 PM, Kevin wrote: > TimeUUIDs should be used for data that i

Multi data center configuration - A question on read correction

2011-06-14 Thread Selva Kumar
I have set up a multiple data center configuration in Cassandra. My primary intention is to minimize the network traffic between DC1 and DC2. I want DC1 read requests to be served without reaching DC2 nodes. After going through the documentation, I felt the following setup would do. Replica Placement Str

Re: Multi data center configuration - A question on read correction

2011-06-14 Thread Jonathan Ellis
That's just read repair sending MD5s of the data for comparison. So net traffic is light. You can turn off RR but the downsides can be large. Turning it down to say 10% can be reasonable tho. But again, if network traffic is your concern you should be fine. On Tue, Jun 14, 2011 at 8:44 PM, Sel
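Why comparing MD5s keeps cross-data-center traffic light can be shown with a small sketch. It only illustrates the general digest-comparison idea; what Cassandra actually hashes and how it serializes a row is not stated in this thread, and the names `digest` and `replicas_agree` are hypothetical.

```python
import hashlib


def digest(value: bytes) -> bytes:
    """Return the 16-byte MD5 digest of a serialized value."""
    return hashlib.md5(value).digest()


def replicas_agree(local_value: bytes, remote_digest: bytes) -> bool:
    """Compare local data against a digest received from a remote replica."""
    return digest(local_value) == remote_digest


row = b"x" * 10_000            # stand-in for a serialized row
remote = digest(row)           # what the remote replica would send
print(len(row), len(remote))   # payload size versus digest size
print(replicas_agree(row, remote))
print(replicas_agree(row + b"!", remote))
```

Only when the digests differ does the full data need to be exchanged, which is why the steady-state traffic stays small.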

Re: one way to make counter delete work better

2011-06-14 Thread Yang
I almost got the code done, should release in a bit. your scenario is not a problem concerned with implementation, but really with definition of "same time". remember that in a distributed system, there is no absolute physical time concept, time is just another way of saying "before or after". i

Re: one way to make counter delete work better

2011-06-14 Thread Yang
in "stronger reason", I mean the +3 is already merged up in memtable of node B, you can't find +1 and +2 any more On Tue, Jun 14, 2011 at 7:02 PM, Yang wrote: > I almost got the code done, should release in a bit. > > > > your scenario is not a problem concerned with implementation, but really

Re: one way to make counter delete work better

2011-06-14 Thread Yang
yes epoch is generated by each node, in the replica set, upon a delete operation. epoch is **global** to the replica set, for one counter, in contrast to clock, which is local to the partition. different counters have different epoch numbers, because different counters can be seen as completely diffe

Re: Docs: Token Selection

2011-06-14 Thread AJ
Yes, which means that the ranges overlap each other. Is this just a convention, or is it technically required when using NetworkTopologyStrategy? Would it be acceptable to split the ranges into quarters by ignoring the data centers, such as: DC1 node 1 = 0 Range: (12, 16], (0, 0] node 2

Re: Migration question

2011-06-14 Thread Marcos Ortiz
On 6/14/2011 1:43 PM, Eric Czech wrote: Thanks Aaron. I'll make sure to copy the system tables. Another thing -- do you have any suggestions on raid configurations for main data drives? We're looking at RAID5 and 10 and I can't seem to find a convincing argument one way or the other. We

Re: New web client & future API

2011-06-14 Thread MW | Codefreun.de
Ok, many thanks. I can remember a post (I think it was Jonathan) where they wanted to get away from Thrift because of the weak development. Markus ;) -Original Message- From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, June 15, 2011 00:05 To: user@cassandra

Re: Cassandra Statistics and Metrics

2011-06-14 Thread Viktor Jevdokimov
http://www.kjkoster.org/zapcat/Zapcat_JMX_Zabbix_Bridge.html 2011/6/14 Marcos Ortiz > Where I can find the source code? > > El 6/14/2011 10:13 AM, Viktor Jevdokimov escribió: > > We're using open source monitoring solution Zabbix from > http://www.zabbix.com/ using zapcat - not only for Cassand

Re: possible 'coming back to life' bug with counters

2011-06-14 Thread Viktor Jevdokimov
What if it is OK for our case and we need counters with TTL? For us, both counters and TTL are important. After a column has expired, it is not important what value the counter will have. Scanning millions of rows just to delete expired ones is not a solution. 2011/6/14 Sylvain Lebresne > As listed here: ht

Re: one way to make counter delete work better

2011-06-14 Thread Yang
patch in https://issues.apache.org/jira/browse/CASSANDRA-2774 some of the code is messy and intended for demonstration only, we could refine it after we agree this is a feasible way to go. Thanks Yang On Tue, Jun 14, 2011 at 11:21 AM, Sylvai