Re: 1000's of column families

2012-09-27 Thread Robin Verlangen
"so if you add up all the applications which would be huge and then all the tables which is large, it just keeps growing. It is a very nice concept (all data in one location), though we will see how implementing it goes." This shouldn't be a real problem for Cassandra. Just add more nodes and ever

Re: downgrade from 1.1.4 to 1.0.X

2012-09-27 Thread Rob Coli
On Thu, Sep 27, 2012 at 2:46 AM, Віталій Тимчишин wrote: > I suppose the way is to convert all SST to json, then install previous > version, convert back and load Only files flushed in the new version will need to be dumped/reloaded. Files which have not been scrubbed/upgraded (ie, have the 1.0
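Rob's point (only files written by the new version need to go through a dump/reload) can be sketched as a filter over SSTable file names. This is a hypothetical helper, not a Cassandra tool. It assumes the `<cf>-<version>-<generation>-<component>.db` naming seen in file listings posted to this list (e.g. `tk_usus_user-hc-269-CompressionInfo.db`), assumes the two-letter format versions sort alphabetically, and takes the newest version your old release can read as a parameter you have to look up yourself. The `hf` file name in the usage line is made up.

```python
import re

# Assumed SSTable file name layout: <cf>-<version>-<generation>-<component>.db
# The greedy <cf> group also absorbs a keyspace prefix if one is present.
SSTABLE_NAME = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]{2})-(?P<generation>\d+)-(?P<component>[A-Za-z]+)\.db$"
)


def sstables_to_reload(filenames, newest_old_version):
    """Return the Data.db files written in a format newer than
    `newest_old_version` (a hypothetical threshold you must supply).
    Files still in an older format can be left alone."""
    selected = []
    for name in filenames:
        match = SSTABLE_NAME.match(name)
        if match is None or match.group("component") != "Data":
            continue  # not an SSTable data file
        # Assumes two-letter format versions sort alphabetically.
        if match.group("version") > newest_old_version:
            selected.append(name)
    return sorted(selected)


files = [
    "tk_usus_user-hc-269-Data.db",
    "tk_usus_user-hc-269-Index.db",
    "tk_usus_user-hf-270-Data.db",  # hypothetical newer-format file
]
print(sstables_to_reload(files, "hc"))  # ['tk_usus_user-hf-270-Data.db']
```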

Re: 1.1.5 Missing Insert! Strange Problem

2012-09-27 Thread Rob Coli
On Thu, Sep 27, 2012 at 3:25 PM, Arya Goudarzi wrote: > rcoli helped me investigate this issue. The mystery was that the segment of > commit log was probably not fsynced to disk since the setting was set to > periodic with 10 second delay and CRC32 checksum validation failed skipping > the replay,

Re: 1.1.5 Missing Insert! Strange Problem

2012-09-27 Thread Arya Goudarzi
rcoli helped me investigate this issue. The mystery was that the segment of commit log was probably not fsynced to disk since the setting was set to periodic with 10 second delay and CRC32 checksum validation failed skipping the replay, so what happened in my scenario can be explained by this. I am
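A toy model of why an acknowledged insert can vanish: with `commitlog_sync: periodic` and `commitlog_sync_period_in_ms: 10000` in cassandra.yaml, a write is acknowledged before its commit log segment is fsynced, so a node that goes down between syncs can lose up to one period of acknowledged writes on that node. The function below is an illustration only and assumes syncs happen exactly on multiples of the period, which a real node does not guarantee.

```python
def lost_on_crash(write_times_ms, crash_time_ms, sync_period_ms=10_000):
    """Toy model of commitlog_sync: periodic.

    Every write is acknowledged immediately but only becomes durable at
    the next fsync. Syncs are assumed to happen at 0, period, 2*period, ...
    Returns the acknowledged writes not yet fsynced when the node died."""
    last_sync = (crash_time_ms // sync_period_ms) * sync_period_ms
    # Writes after the last sync and up to the crash were acked but not durable.
    return [t for t in write_times_ms if last_sync < t <= crash_time_ms]


# Crash at t=17s: writes made after the sync at t=10s are lost on this node.
print(lost_on_crash([1_000, 9_000, 12_000, 15_000], crash_time_ms=17_000))
# [12000, 15000]
```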

Re: 1.1.5 Missing Insert! Strange Problem

2012-09-27 Thread Arya Goudarzi
I was restarting Cassandra nodes again today. 1 hour later my support team let me know that a customer has reported some missing data. I suppose this is the same issue. The application logs show that our client got success from the Thrift log and proceeded with responding to the user and I could gr

Re: Once again, super columns or composites?

2012-09-27 Thread Edward Kibardin
Oh... Sylvain, thanks a lot for such a complete answer. Yeah, I understand my mistake in suggestions regarding composites. It seems, composites are pretty much an advanced version of key manual joining into a string column name: : Thanks a lot! Ed On Thu, Sep 27, 2012 at 2:02 PM, Sylvain Lebresn

Re: 1.1.5 Missing Insert! Strange Problem

2012-09-27 Thread Arya Goudarzi
Thanks for your reply. I did grep on the commit logs for the offending key and grep showed Binary file matches. I am trying to use this tool to extract the commitlog and actually confirm if the mutation was a write: https://github.com/carloscm/cassandra-commitlog-extract.git On Thu, Sep 27, 2012

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 12:36 PM, Peter Schuller wrote: >> What is strange every time I run repair data takes almost 3 times more >> - 270G, then I run compaction and get 100G back. > > https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the > main issues with repair. In short - in your

Re: pig and widerows

2012-09-27 Thread William Oberman
I don't want to switch my cassandra to HEAD, but looking at the newest code for CassandraStorage, I'm concerned the Uri parsing for widerows isn't going to work. setLocation first calls setLocationFromUri (which sets widerows to the Uri value), but then sets widerows to a static value (which is de

Re: pig and widerows

2012-09-27 Thread William Oberman
The next painful lesson for me was figuring out how to get logging working for a distributed hadoop process. In my test environment, I have a single node that runs name/secondaryname/data/job trackers (call it "central"), and I have two cassandra nodes running tasktrackers. But, I also have cass

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
Unfortunately, the security aspect is very strict. Some make their data public, but there are many projects where, due to client contracts, they cannot make their data public within our company (i.e., other groups in our company are not allowed to see the data). Also, currently, we have researchers up

Re: 1000's of column families

2012-09-27 Thread Aaron Turner
On Thu, Sep 27, 2012 at 7:35 PM, Marcelo Elias Del Valle wrote: > > > 2012/9/27 Aaron Turner >> >> How strict are your security requirements? If it wasn't for that, >> you'd be much better off storing data on a per-statistic basis than >> per-device. Hell, you could store everything in a single
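A sketch of what "per-statistic rather than per-device" could look like. The row-key format, the one-day UTC bucket, and the `(timestamp, device_id)` column names are assumptions for illustration only; the message above does not specify a layout.

```python
from datetime import datetime, timezone


def per_statistic_location(device_id, statistic, epoch_seconds):
    """Hypothetical per-statistic layout: one row per statistic per UTC day,
    with columns named (timestamp, device_id), so readings from every device
    reporting that statistic sit side by side, ordered by time."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%d")
    row_key = f"{statistic}:{day}"
    column_name = (epoch_seconds, device_id)
    return row_key, column_name


print(per_statistic_location("device-42", "temperature", 0))
# ('temperature:19700101', (0, 'device-42'))
```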

Re: Why data tripled in size after repair?

2012-09-27 Thread Sylvain Lebresne
> I see. It explains why I get 85G + 85G instead of 90G. But after next > repair I have six extra files 75G each, > how is it possible? Maybe you've run repair on other nodes? Basically repair is a fairly blind process. If it considers that a given range (and by range I mean here the ones that repa

Re: 1000's of column families

2012-09-27 Thread Edward Capriolo
Hector also offers support for 'Virtual Keyspaces' which you might want to look at. On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner wrote: > On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean wrote: >> We have 1000's of different building devices and we stream data from these >> devices. The format
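The general idea behind "virtual keyspaces", as we understand it (an assumption, not a description of Hector's API), is to let many logical tenants share one physical column family by prefixing each tenant's row keys. The class below is a hypothetical in-memory illustration of that key prefixing only.

```python
class PrefixedKeyspace:
    """Hypothetical illustration of key prefixing: several logical tenants
    share one physical column family (modelled as a dict), and each
    tenant's row keys are stored with that tenant's prefix."""

    def __init__(self, store, prefix, sep="|"):
        self.store = store      # shared physical CF: row key -> {column: value}
        self.prefix = prefix
        self.sep = sep

    def _physical_key(self, row_key):
        return f"{self.prefix}{self.sep}{row_key}"

    def insert(self, row_key, column, value):
        self.store.setdefault(self._physical_key(row_key), {})[column] = value

    def get(self, row_key, column):
        return self.store.get(self._physical_key(row_key), {}).get(column)


shared_cf = {}
project_a = PrefixedKeyspace(shared_cf, "projectA")
project_b = PrefixedKeyspace(shared_cf, "projectB")
project_a.insert("sensor1", "temp", 21.5)
project_b.insert("sensor1", "temp", 19.0)
print(project_a.get("sensor1", "temp"), project_b.get("sensor1", "temp"))  # 21.5 19.0
print(sorted(shared_cf))  # ['projectA|sensor1', 'projectB|sensor1']
```

The prefix only separates key spaces; it does not by itself enforce who may read which rows.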

Re:

2012-09-27 Thread Vivek Mishra
Yes. On Thu, Sep 27, 2012 at 10:25 PM, Marcelo Elias Del Valle < mvall...@gmail.com> wrote: > > > 2012/9/27 Vivek Mishra > >> So it means going by secondary index way, > > > Out of curiosity, how would you index it in this case? 1 row key for each > combination, with no fields in the combination

Re: pig and widerows

2012-09-27 Thread William Oberman
Ok, this is painful. The first problem I found is in stock 1.1.5 there is no way to set widerows to true! The new widerows URI parsing is NOT in 1.1.5. And for extra fun, getting the value from the system property is BROKEN (at least in my centos linux environment). Here are the key lines of co

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Thu, Sep 27, 2012 at 9:52 AM, Sylvain Lebresne wrote: >> I don't understand why it copied data twice. In worst case scenario it >> should copy everything (~90G) > > Sadly no, repair is currently peer-to-peer based (there is a ticket to > fix it: https://issues.apache.org/jira/browse/CASSANDRA-3

Re: 1000's of column families

2012-09-27 Thread Aaron Turner
On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean wrote: > We have 1000's of different building devices and we stream data from these > devices. The format and data from each one varies so one device has > temperature at timeX with some other variables, another device has CO2 > percentage and othe

Re: Why data tripled in size after repair?

2012-09-27 Thread Sylvain Lebresne
> I don't understand why it copied data twice. In worst case scenario it > should copy everything (~90G) Sadly no, repair is currently peer-to-peer based (there is a ticket to fix it: https://issues.apache.org/jira/browse/CASSANDRA-3200, but that's not trivial). This means that you can end up with
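A rough model of the peer-to-peer behaviour described above: the repairing node compares its Merkle tree with each peer separately, and every peer that disagrees on a range streams its own copy of that range, with nothing deduplicating the copies. The replication factor of 3 and the single 85G range in the usage line are assumptions chosen to be consistent with the "85G + 85G" reported in this thread; real repair works at Merkle-tree granularity, which this sketch ignores.

```python
def streamed_to_repairing_node(range_sizes_gb, mismatched_ranges_by_peer):
    """Rough model of peer-to-peer repair: every peer whose tree disagrees
    on a range streams its whole copy of that range to the repairing node.
    Copies of the same range from different peers are all counted."""
    total_gb = 0.0
    for ranges in mismatched_ranges_by_peer.values():
        for r in ranges:
            total_gb += range_sizes_gb[r]
    return total_gb


# Assumed RF=3: both other replicas disagree on one ~85G range.
sizes = {"range-1": 85.0}
mismatches = {"replica-b": ["range-1"], "replica-c": ["range-1"]}
print(streamed_to_repairing_node(sizes, mismatches))  # 170.0
```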

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
PlayOrm DOES support inheritance mapping but only supports single table right now. In fact, DboColumnMeta.java has 4 subclasses that all map to that one ColumnFamily so we already support and heavily use the inheritance feature. That said, I am more concerned with scalability. The more you st

Re:

2012-09-27 Thread Vivek Mishra
So going the secondary index way, you can still hold a unique combination key per row. If any of these keys is not present, it will simply not be part of that combination key, and every time you will get a unique value for each row. That can definitely avoid duplicate rows. Or even you can make
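A hypothetical helper for the combination key described here: join whichever identifier fields are present and leave out the missing ones. The field names come from Andre's question; their order, the `name=value` encoding, and the `:` separator are assumptions. Keeping the field name in each part prevents, for example, a phone number from being confused with a Facebook id when other fields are absent. The resulting string would be the value stored in the extra, index-enabled column.

```python
# Identifier fields from the question in this thread; the order is arbitrary.
ID_FIELDS = ("user_cook_id", "user_facebook_id", "user_cell_phone", "user_personal_id")


def combination_key(user, sep=":"):
    """Hypothetical helper: build a combination key from the ID fields that
    are present, skipping missing or empty ones."""
    return sep.join(f"{name}={user[name]}" for name in ID_FIELDS if user.get(name))


print(combination_key({"user_cook_id": "c1", "user_cell_phone": "555-0100"}))
# user_cook_id=c1:user_cell_phone=555-0100
```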

Re: 1000's of column families

2012-09-27 Thread Marcelo Elias Del Valle
Dean, in the relational world I was used to using Hibernate and O/R mapping. There were times when I used 3 classes (2 inheriting from the third) and mapped all of them to 1 table. The common part was in the superclass and each subclass had its own columns. The table, however, used to have a

Re:

2012-09-27 Thread Andre Tavares
user_cook_id, user_facebook_id, user_cell_phone, user_personal_id : Combination key of all will be unique? Or all of them are unique individually.? Combination key of all will be unique? no ... Or all of them are unique individually.? yes ... all them are unique individually 2012/9/27 Vivek

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
We have 1000's of different building devices and we stream data from these devices. The format and data from each one varies, so one device has temperature at timeX with some other variables, another device has CO2 percentage and other variables. Every device is unique and streams its own dat

Re:

2012-09-27 Thread Vivek Mishra
1 question. user_cook_id, user_facebook_id, user_cell_phone, user_personal_id : Combination key of all will be unique? Or all of them are unique individually.? If a combination can be unique, then having an extra column (index enabled) per row should work for you. -Vivek On Thu, Sep 27, 2012 at

Re: 1000's of column families

2012-09-27 Thread Marcelo Elias Del Valle
Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need to store different kinds of data. As Cassandra stores anything in each CF, it might probably make sense to have a lot of CFs t

Re: Once again, super columns or composites?

2012-09-27 Thread Sylvain Lebresne
> But from my understanding, you just can't update composite column, only > delete and insert... so this may make my update use case much more > complicated. Let me try to sum things up. In regular column families, a column (value) is defined by 2 keys: the row key and the column name. In super co
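To make the point concrete, here is an in-memory toy model (not a Cassandra client) of one row whose column names are tuples. What would have been a super column name becomes the first component, so a single sub-value is updated by writing the same composite name again, and a super column's former contents are a prefix slice of the row.

```python
from bisect import bisect_left


class WideRow:
    """Toy model of one row with composite column names (tuples of
    strings). Names are kept sorted component by component."""

    def __init__(self):
        self.names = []    # sorted composite column names
        self.values = {}   # composite name -> value

    def insert(self, name, value):
        # Writing an existing composite name simply overwrites its value,
        # which is what an "update" of one sub-value looks like.
        if name not in self.values:
            self.names.insert(bisect_left(self.names, name), name)
        self.values[name] = value

    def slice_prefix(self, first):
        """All columns whose first component is `first`, i.e. what a
        super column called `first` would have contained."""
        out = []
        i = bisect_left(self.names, (first,))
        while i < len(self.names) and self.names[i][0] == first:
            out.append((self.names[i], self.values[self.names[i]]))
            i += 1
        return out


row = WideRow()
row.insert(("sc1", "a"), 1)
row.insert(("sc2", "a"), 2)
row.insert(("sc1", "b"), 3)
row.insert(("sc1", "a"), 10)   # update in place
print(row.slice_prefix("sc1"))  # [(('sc1', 'a'), 10), (('sc1', 'b'), 3)]
```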

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
Is there a non rhetorical question in there? Maybe is that a feature request in disguise? The question was basically, Is Cassandra ok with as many CF's as you want? It sounds like it is not based on the email that every CF causes a bit more RAM to be used though. So if cassandra is not ok with

Re: Once again, super columns or composites?

2012-09-27 Thread Hiller, Dean
Can you describe your use case in detail? It might be easier to explain a model with composite names. Later, Dean From: Edward Kibardin <infa...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Thursday, Septembe

Re: Once again, super columns or composites?

2012-09-27 Thread Edward Kibardin
Sylvain, thanks for the response! I have a use case which involves updating 1.5 million values a day. Currently I'm just creating a new SSTable using SSTableWriter and uploading these SuperColumns to Cassandra. But from my understanding, you just can't update a composite column, only delete and

Re: cassandra key cache question

2012-09-27 Thread Tamar Fraenkel
Hi! One more question: I have a couple of dropped column families, and in the JMX console I don't see them under org.apache.cassandra.db.ColumnFamilies, BUT I do see them under org.apache.cassandra.db.Caches, and the cache is not empty! Does it mean that Cassandra still keeps memory busy doing cachi

Re: downgrade from 1.1.4 to 1.0.X

2012-09-27 Thread Віталій Тимчишин
I suppose the way is to convert all SST to json, then install previous version, convert back and load 2012/9/24 Arend-Jan Wijtzes > On Thu, Sep 20, 2012 at 10:13:49AM +1200, aaron morton wrote: > > No. > > They use different minor file versions which are not backwards > compatible. > > Thanks Aa

cassandra key cache question

2012-09-27 Thread Tamar Fraenkel
Hi! Is it possible that in JMX and cfstats the Key cache size is much bigger than the number of keys in the CF? Thanks, Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

Re: is node tool row count always way off?

2012-09-27 Thread Sylvain Lebresne
> The node tool cfstats, what is the row count estimate usually off by (what > percentage? Or what absolute number?) It will likely not be very good but is supposed to give some order of magnitude. That being said, there are at least the following sources of inaccuracies: - It counts deleted rows t
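A small model of how such an estimate can overshoot. It assumes (this is our assumption, not something stated in the excerpt) that the estimate is formed by adding up per-SSTable key counts, so a row spread over several SSTables is counted once per file; it also counts rows that only survive as deletions, as the message says. The SSTable contents are made up.

```python
def estimated_row_count(sstables):
    """Assumed model of the estimate: per-SSTable key counts are added up,
    so a row present in several SSTables is counted once per file and
    deleted rows still count."""
    return sum(len(keys) for keys in sstables)


def live_row_count(sstables, deleted_rows):
    """What a full read would find: distinct keys minus deleted ones."""
    return len(set().union(*sstables) - set(deleted_rows))


sstables = [{"a", "b"}, {"b", "c"}, {"c", "d"}]   # hypothetical row keys per SSTable
print(estimated_row_count(sstables), live_row_count(sstables, {"d"}))  # 6 3
```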

Re: 1000's of column families

2012-09-27 Thread Robin Verlangen
Every CF adds some overhead (in memory) to each node. This is something you should really keep in mind. Best regards, Robin Verlangen Software engineer W http://www.robinverlangen.nl E ro...@us2.nl Disclaimer: The information contained in this message and attachments

RE: Data Modeling: Comments with Voting

2012-09-27 Thread Roshni Rajagopal
Hi Drew, I think you have 4 requirements. Here are my suggestions. a) store comments: have a static column family for comments with master data like created date, created by, length, etc. b) when a person votes for a comment, increment a vote counter: have a counter column family for incrementin
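Suggestions a) and b) can be sketched with an in-memory stand-in for the two column families: a static CF holding comment master data and a counter CF holding vote totals. Class, method, and field names are illustrative assumptions; this is not a Cassandra client.

```python
from collections import Counter


class CommentStore:
    """In-memory stand-in for a static comments CF plus a counter CF
    for votes. All names here are hypothetical."""

    def __init__(self):
        self.comments = {}       # static CF: comment_id -> master data
        self.votes = Counter()   # counter CF: comment_id -> vote total

    def add_comment(self, comment_id, created_by, created_date, text):
        self.comments[comment_id] = {
            "created_by": created_by,
            "created_date": created_date,
            "text": text,
            "length": len(text),
        }

    def vote(self, comment_id, delta=1):
        # Counters are changed by increments/decrements, never set directly.
        self.votes[comment_id] += delta

    def vote_count(self, comment_id):
        return self.votes[comment_id]  # 0 if nobody has voted yet


store = CommentStore()
store.add_comment("c1", "drew", "2012-09-27", "Nice post")
store.vote("c1")
store.vote("c1")
store.vote("c1", -1)
print(store.vote_count("c1"))  # 1
```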

Re: 1.1.5 Missing Insert! Strange Problem

2012-09-27 Thread Sylvain Lebresne
> I can verify the existence of the key that was inserted in Commitlogs of both > replicas however it seems that this record was never inserted. Out of curiosity, how can you verify that? -- Sylvain

Re: compression

2012-09-27 Thread Tamar Fraenkel
Hi! First, the problem is still there, although I checked and all nodes agree on the schema. This is from ls -l: Good Node -rw-r--r-- 1 cassandra cassandra 606 2012-09-27 08:01 tk_usus_user-hc-269-CompressionInfo.db -rw-r--r-- 1 cassandra cassandra 2246431 2012-09-27 08:01 tk_usus_user-hc-26

Re: 1000's of column families

2012-09-27 Thread Sylvain Lebresne
On Thu, Sep 27, 2012 at 12:13 AM, Hiller, Dean wrote: > We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When > using the tools they are all geared to analyzing ONE column family at a time > :(. If I remember correctly, Cassandra supports as many CF's as you want, > corr

Re: Once again, super columns or composites?

2012-09-27 Thread Sylvain Lebresne
When people suggest composites instead of super columns, they mean composite column 'names', not composite column 'values'. None of the advantages you cite stand in the case of composite column 'names'. -- Sylvain On Wed, Sep 26, 2012 at 11:52 PM, Edward Kibardin wrote: > Hi Community, > > I kno