Re: Performance deterioration while building secondary index

2011-09-16 Thread buddhasystem
Well, the problem is still there, i.e. I tried to add one more index and the 3-node cluster is just going spastic, becomes unresponsive etc. These boxes have plenty of CPU and memory. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Performance-de

Re: Node added, no performance boost -- are the tokens correct?

2011-04-01 Thread buddhasystem
o pages have been pretty > well > vetted over the past months :) > > > > On Thu, Mar 31, 2011 at 3:06 PM, buddhasystem <potek...@bnl.gov> > wrote: > >> I just configured a cluster of two nodes -- do these token values make >> sense? >> The reason I

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
Yup, I screwed up the token setting, my bad. Now, I moved the tokens. I still observe that read latency deteriorated with 3 machines vs original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this weekend). Key and row caching was disabled

Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance. Address Status State Load Owns Token

Netstats out of sync?

2011-03-31 Thread buddhasystem
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the "source" node reports progress of the stream, whereas on the receiving end netstats states that progress = 0. Did anyone see that? Do I need both nodes listed as seeds in cassandra.yaml? TIA

Re: help modeling a requirement in cassandra

2011-03-26 Thread buddhasystem
That would depend on how much data is generated per day. If it can still fit in a row, the solution would be to just have rows keyed by date, like 20110326. This way you don't have to move data inside the cluster, the selection logic will be in the client. Even if the data is too large to be put
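To illustrate the date-keyed rows suggested above, here is a minimal sketch of the client-side selection logic. It only builds the row keys; the function names are hypothetical, and fetching the rows from the cluster is left to whatever client is in use.

```python
from datetime import date, timedelta

def day_key(d):
    """Format a date as a YYYYMMDD row key, e.g. 20110326."""
    return d.strftime("%Y%m%d")

def keys_for_range(start, end):
    """Return the row keys for every day from start to end, inclusive."""
    days = (end - start).days
    return [day_key(start + timedelta(days=n)) for n in range(days + 1)]

# The client decides which daily rows to read; nothing moves inside the cluster.
keys = keys_for_range(date(2011, 3, 24), date(2011, 3, 26))
print(keys)
```

The resulting list could then be passed to a multi-key read in the client of choice.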

Re: data aggregation in Cassandra

2011-03-25 Thread buddhasystem
Hello Saurabh, I have a similar situation, with a more complex data model, and I do an equivalent of map-reduce "by hand". The redeeming value is that you have complete freedom in how you hash, and you design the way you store indexes and similar structures. If there is a pattern in data store, yo

Re: 0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
I see. I'm doing something even more drastic then, because I'm only inserting one row in this case, and just use cf.insert(), without batch mutator. It didn't occur to me that was a bad idea. So I take it, this method will fail. Hmm.

Re: 0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
Jonathan, wide rows have been discussed. I thought that the limit on number of columns is way bigger than 45k. What can one expect in reality?

0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
I'm writing a row with about 45k columns. Most of them are quite small, and there are a few of 2 MB and one of 5 MB. The write procedure times out. Total data load is 9 MB. What would be the cause?

Exception in restart in 0.7.2

2011-03-22 Thread buddhasystem
One machine cluster, low load, 0.7.2
INFO 18:22:31,155 reading saved cache /data1/cassandra_data/saved_caches/system-Schema-KeyCache
WARN 18:22:31,155 error reading saved cache /data1/cassandra_data/saved_caches/system-Schema-KeyCache
java.io.EOFException
at java.io.ObjectInputStream$Pee

Re: Deleting "old" SSTables

2011-03-22 Thread buddhasystem
Jonathan, for all of us who just tinker with test clusters, building confidence in the product, it would be nice to be able to do the same with nodetool, without jconsole, just my 0.5 penny. Thanks. Jonathan Ellis-3 wrote: > > From the next paragraph of the same wiki page: > > SSTables that are obsol

Cassandra on a cellphone?

2011-03-22 Thread buddhasystem
I know it has zero utility, but I think it has a tremendous coolness and propaganda value -- has anyone tried to run cassandra on a recent generation cell phone/tablet? Or a cluster of these ;)

Re: cassandra nodes with mixed hard disk sizes

2011-03-22 Thread buddhasystem
aaron morton wrote: > > > Also a node is be responsible for storing it's token range and acting as a > replica for other token ranges. So reducing the token range may not have a > dramatic affect on the storage requirements. > Aaron, is there a way to configure wimpy nodes such that the repl

Re: Reading whole row vs a range of columns (pycassa)

2011-03-20 Thread buddhasystem
erent objects in the same group of 100? > > Dont understand your reference to the OOP in the context of a reading 100 > columns from a row. > > Aaron > > > On 19 Mar 2011, at 16:22, buddhasystem wrote: > > > As I'm working on this further, I want to und

Re: Reading whole row vs a range of columns (pycassa)

2011-03-18 Thread buddhasystem
As I'm working on this further, I want to understand this: Is it advantageous to flatten data in blocks (strings) each containing a series of objects, if I know that a serial object read is often likely, but don't want to resort to OPP? I worked out the optimal granularity, it seems. Is it better

Reading whole row vs a range of columns (pycassa)

2011-03-18 Thread buddhasystem
Is there a noticeable difference in speed between reading the whole row through Pycassa, vs a range of columns? Both rows and columns are pretty slim.

Undead rows after nodetool compact

2011-03-18 Thread buddhasystem
This has been discussed once, but I don't remember the outcome. I insert a row and then delete the key immediately. I then run nodetool compact. In cassandra-cli, "list cf" still returns 1 empty row. This is not a showstopper but damn unpretty. Is there a way to make deleted rows go, immediately?

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Where and how do I choose it?

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks Peter, I can see it better now.

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks to all for replying, but frankly I didn't get the answer I wanted. Does the "number of disks" apply to number of spindles in RAID0? Or something else like a separate disk for commitlog and for data?

Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Hello, in the instructions, I need to link "concurrent_reads" to number of drives. Is this related to number of physical drives that I have in my RAID0, or something else?

Re: Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Thanks! Docs say it's good to set it to 8*Ncores, are you saying you see 8 cores in this output? I know I need to go way above default 32 with this setup.

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Thanks for clarification, Tyler, sorry again for the basic question. I've been doing straight inserts from Oracle so far but now I need to update rows with new columns.

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Hello Peter, thanks for the note. I'm not looking for anything fancy. It's just when I'm looking at the following bit of Pycassa docs, it's not 100% clear to me that it won't overwrite the entire row for the key, if I want to simply add an extra column {'foo':'bar'} to the already existing row. I

Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Dear All, this is from my new Cassandra server. It obviously uses hyperthreading, I just don't know how to translate this to concurrent readers and writers in cassandra.yaml -- can somebody take a look and tell me what number of cores I need to assume for concurrent_reads and concurrent_writes. Is
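For readers facing the same question, one way to tell hyperthreads from physical cores is to count the distinct ("physical id", "core id") pairs in /proc/cpuinfo. The sketch below assumes the x86 Linux field names and runs on a made-up sample rather than the poster's actual output, which is not shown here; which of the two counts should feed concurrent_reads and concurrent_writes is left open.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()          # distinct (physical id, core id) pairs
    phys_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1      # one stanza per logical CPU
        elif key == "physical id":
            phys_id = value   # socket this logical CPU belongs to
        elif key == "core id":
            physical.add((phys_id, value))
    return logical, len(physical)

# Made-up sample: one socket, two cores, hyperthreading on (4 logical CPUs).
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

logical, physical = count_cores(SAMPLE)
print(logical, physical)
```

On a real machine the same function could be fed the contents of /proc/cpuinfo read from disk.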

Re: Is column update column-atomic or row atomic?

2011-03-15 Thread buddhasystem
Thanks. Can you give me a pycassa example, if possible? Thanks!

Is column update column-atomic or row atomic?

2011-03-15 Thread buddhasystem
Sorry for the rather primitive question, but it's not clear to me if I need to fetch the whole row, add a column as a dictionary entry and re-insert it if I want to expand the row by one column. Help will be appreciated.

Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread buddhasystem
Tyler, as a collateral issue - I've been wondering for a while what advantage if any it buys me, if I declare a value 'long' (which it roughly is) as opposed to passing around strings. String is flattened onto a replica of itself, I assume? No conversion? Maybe it even means better speed. Thanks,

Re: Homebrew CF-indexing vs secondary indexing

2011-02-24 Thread buddhasystem
FWIW, for me the advantage of homebrew indexes is that they can be a lot more sophisticated than the standard -- I can hash combinations of column values to whatever I want. I also put counters on column values in the index, so there is lots of functionality. Of course, I can do it because my data

Re: "null" vs "value not found"?

2011-02-24 Thread buddhasystem
Thanks! You are right. I see an exception but have no idea what went wrong.
ERROR [ReadStage:14] 2011-02-24 21:51:29,374 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:14,5,main]
java.io.IOError: java.io.EOFException
at org.apache.cassandra.db.columnitera

Re: "null" vs "value not found"?

2011-02-24 Thread buddhasystem
thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] I pretty much went with the default settings, and the column name is 'CATALOG'. Maxim Tyler Hobbs-2 wrote: > > On Thu, Feb 24, 2011 at 2:27 PM, buddhasystem wrote: > >> >> I'm doing

"null" vs "value not found"?

2011-02-24 Thread buddhasystem
I'm doing insertion with a pycassa client. It seems to work in most cases, but sometimes, when I go to Cassandra-cli, and query with key and column that I inserted, I get "null" whereas I shouldn't. What could be the causes of that?

Re: Is it possible to get list of row keys?

2011-02-23 Thread buddhasystem
Is your data updated, or are large chunks read-only?

How come key cache increases speed by x4?

2011-02-23 Thread buddhasystem
Well I know the cache is there for a reason, I just can't explain the factor of 4 when I run my queries on a hot vs cold cache. My queries are actually a chain of one on an inverted index, which produces a tuple of keys to be used in the "main" query. The inverted index query should be downright t
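The chained query described above (an inverted-index lookup that yields a tuple of keys, followed by the "main" query on those keys) can be pictured with a toy model. Plain dicts stand in for the two column families here; every name and value is hypothetical, and no Cassandra client is involved.

```python
# Toy stand-ins for two column families (plain dicts, not Cassandra).
inverted_index = {           # index row key -> keys of matching main rows
    "status=failed": ["job17", "job42"],
}
main_cf = {                  # main row key -> columns
    "job17": {"site": "A", "status": "failed"},
    "job42": {"site": "B", "status": "failed"},
    "job99": {"site": "A", "status": "done"},
}

def chained_query(index_key):
    """Step 1: read the index row. Step 2: fetch the main rows it points to."""
    keys = tuple(inverted_index.get(index_key, ()))
    return {k: main_cf[k] for k in keys if k in main_cf}

result = chained_query("status=failed")
print(sorted(result))
```

Each step is a keyed lookup, so in a real cluster both would presumably benefit from the key cache; the sketch only shows the shape of the chain, not its timing.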

Can I count on Super Column Families when planning 3 years out?

2011-02-23 Thread buddhasystem
There was a discussion here on how well (or not so well) the Super CFs are supported. I now need to make a strategic decision as to how I plan my data. What's the consensus -- will the super CF be there 3 years out? TIA Maxim

Will the large datafile size affect the performance?

2011-02-23 Thread buddhasystem
I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: My test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all this data is in a single CF -- will it cause read or write performance problems? Sho

Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem
> LongType validation is pretty close (just a size check). > > If you meant that the conversion is killing performance on your > client, you should switch to a more performant client language. :) > > On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem wrote: >> >> I'

Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem
I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is t

Re: create additional secondary index

2011-02-16 Thread buddhasystem
I sidestep this problem by using a Python script (pycassa-based) where I configure my CFs. This way, it's reproducible and documented.

Re: Patterns for writing enterprise applications on cassandra

2011-02-15 Thread buddhasystem
FWIW, we'll keep RDBMS for transactional data, and Cassandra will be used for referential data (browsing history and data mining). Horses for courses.

Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem
Thank you Attila! We will indeed have a few months of "breaking in". I suppose I'll keep my fingers crossed and see that 0.7.X is very stable. So I'll deploy 0.7.1 -- I will need to apply all the patches, there is no cumulative download, is that correct? Attila Babo wrote: > > 0.6.8 is stable

Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem
Thank you! It's just that 0.7.1 seems the bleeding edge now (a serious bug fixed today). Would you still trust it as a production-level service? I'm just slightly concerned. I don't want to create a perception among our IT that the product is not ready for prime time.

What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem
Hello, we are acquiring new hardware for our cluster and will be installing it soon. It's likely that I won't need to rely on secondary index functionality, as data will be write-once read-many and I can get away with inverse index creation at load time, plus I have some more complex indexing in

Re: Calculating the size of rows in KBs

2011-02-11 Thread buddhasystem
Does it also mean that the whole row will be deserialized when a query comes just for one column?

Re: Limit on amount of CFs

2011-02-11 Thread buddhasystem
I asked a similar question (but didn't receive an answer). I'm trying to see if a large number of CFs might be beneficial. One thing I can think about is the size of extra storage needed for compaction -- obviously it will be smaller in case of many smaller CFs.

Re: Column name size

2011-02-11 Thread buddhasystem
I've been thinking about this as well. I'm migrating data from a large Oracle database, and the RDBMS column names are descriptive (good) and long (bad). For now I just keep them when populating Cassandra, but I can shave off about 30% of storage by hashing names. I don't need any automation and

What will happen if I try to compact with insufficient headroom?

2011-02-09 Thread buddhasystem
One of my nodes is 76% full. I know that one of the CFs represents 90% of the data, others are really minor. Can I still compact under these conditions? Will it crash and lose the data? Will it try to create one very large file out of fragments, for that dominating CF? TIA

Re: Specifying row caching on per query basis ?

2011-02-09 Thread buddhasystem
Jonathan, what if the data is really homogeneous, but over a long period of time. I decided that the users who hit the database for recent past should have a better ride. Splitting into a separate CF also has costs, right? In fact, if I were to go this way, do you think I can crank down the key c

Re: Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem
Thanks for the comment! In my case, I want to store various time slices as indexes, so the content can be serialized as comma-separated concatenation of unique object IDs. Example: on 20101204, multiple clouds experienced a variety of errors in job execution. In addition, multiple users ran (or fa
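A minimal sketch of the serialization described above, assuming integer object IDs; the function names and sample IDs are made up for illustration. The packed string would be stored as the value of a single column in the time-slice index row.

```python
def pack_ids(ids):
    """Serialize unique object IDs into one comma-separated column value."""
    unique = sorted(set(ids))            # drop duplicates, keep a stable order
    return ",".join(str(i) for i in unique)

def unpack_ids(value):
    """Turn the stored column value back into a list of integer IDs."""
    return [int(x) for x in value.split(",")] if value else []

packed = pack_ids([1003, 1001, 1003, 1002])
print(packed)
print(unpack_ids(packed))
```

The trade-off is that adding one ID means rewriting the whole column value, so this fits write-once slices better than frequently updated ones.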

Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem
Seeing that discussion here about indexes not supported in superCFs, and less than clear future of superCFs altogether, I was thinking about getting a modicum of same functionality with serialized objects inside columns. This way the column key becomes sort of analog of supercolumn key, and I hand

Re: Java bombs during compaction, please help

2011-02-07 Thread buddhasystem
Thanks Jonathan -- does it mean that the machine is experiencing IO problems?

Java bombs during compaction, please help

2011-02-07 Thread buddhasystem
Hello, one node in my 3-machine cluster cannot perform compaction. I tried multiple times, it ran out of heap space once and I increased it. Now I'm getting the dump below (after it does run for a few minutes). I hope somebody can shed a little light on what's going on, because I'm at a loss and th

Re: Finding the intersection results of column sets of two rows

2011-02-06 Thread buddhasystem
Hello, If the amount of data is _that_ small, you'll have a much easier life with MySQL, which supports the "join" procedure -- because that's exactly what you want to achieve. asil klin wrote: > > Hi all, > > I want to procure the intersection of columns set of two rows (from 2 > different c

Re: order of index expressions

2011-02-05 Thread buddhasystem
Jonathan, what's the implementation of that? I.e. is it a product of indexes or nested loops? Thanks, Maxim

Re: How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem
at 11:59 AM, buddhasystem wrote: >> >> Just wanted to see if someone with experience in running an actual >> service >> can advise me: >> >> how often do you run nodetool compact on your nodes? Do you stagger it in >> time, for each node? How badly is performance affec

How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem
Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected? I know this all seems too generic but then again no two clusters are created

Re: Moving data

2011-02-04 Thread buddhasystem
FWIW, I'm working on migrating a large amount of data out of Oracle into my test cluster. The data has been warehoused as CSV files on Amazon S3. Having that in place allows me to not put extra load on the production service when doing many repeated tests. I then parse the data using CSV Python mo

Re: Using Cassandra to store files

2011-02-04 Thread buddhasystem
Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage.

Re: Slow network writes

2011-02-03 Thread buddhasystem
Dude, are you asking me to unsubscribe?

Re: Using Cassandra to store files

2011-02-03 Thread buddhasystem
CouchDB

Re: How do I get 0.7.1?

2011-02-02 Thread buddhasystem
Stephen, sorry I didn't understand your missive. Maxim

Re: Slow network writes

2011-02-02 Thread buddhasystem
Never mind, I found it in SVN... (not in gz) Thanks.

Re: Slow network writes

2011-02-02 Thread buddhasystem
Jonathan, where do I find that contrib/stress? Maxim

How do I get 0.7.1?

2011-02-02 Thread buddhasystem
Thanks. Maxim

Re: Cassandra memory needs

2011-02-02 Thread buddhasystem
Oleg, I just wanted to add that I confirmed the importance of that "rule of thumb" the hard way. I created two extra CFs and was able to reliably crash the nodes during writes. I guess for the final setting I'll rely on results of my testing. But it's also important to not cause the swap death o

Re: Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem
Thanks. Yes I know it's by no means trivial. I thought in case there was an index on the column on which I want to place condition, the index machinery itself can do the counting (i.e. when the index is updated, the counter is incremented). It doesn't seem too orthogonal to the current implementat

Re: Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem
Thanks. Just wanted to note that counting the number of rows where foo=bar is a fairly ubiquitous task in db applications. In case of "big data", trafficking all these data to client just to count something isn't optimal at all. Maxim

Re: Commit log compaction

2011-02-02 Thread buddhasystem
Thank you. So what exactly is the condition that causes the older commit log files to actually be removed? I observe that indeed they are rotated out when the threshold is reached, but then new ones are placed in the directory and the older ones are still there. Thanks, Maxim

Commit log compaction

2011-02-02 Thread buddhasystem
How often and by what criteria is the commit log compacted/truncated? Thanks, Maxim

Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem
I'm looking at http://wiki.apache.org/cassandra/Counters So, the counter feature -- it doesn't seem to count rows based on criteria, such as index condition. Is that correct? Case in point, I keep a large inventory of computational tasks over a long period of time. I'm supposed to report on fair

Re: cassandra as session store

2011-02-01 Thread buddhasystem
For completeness: http://stackoverflow.com/questions/3746685/running-django-site-in-multiserver-environment-how-to-handle-sessions http://docs.djangoproject.com/en/dev/topics/http/sessions/#using-cached-sessions I guess your approach does make sense, one only wishes that the servlet in question

Re: cassandra as session store

2011-02-01 Thread buddhasystem
Most if not all modern web application frameworks support sessions. This applies to Django (with which I have most experience and also run it with X.509 security layer) but also to Ruby on Rails and Pylons. So, why would you re-invent the wheel? Too messy. It's all out there for you to use. Rega

TSocket timing out

2011-01-29 Thread buddhasystem
When I do a lot of inserts into my cluster (>10k at a time) I get timeouts from Thrift, the TSocket.py module. What do I do? Thanks, Maxim

Re: Node going down when streaming data, what next?

2011-01-28 Thread buddhasystem
It does remove tokens, and the "ring" shows that the problematic node owns 0 tokens, which is OK. However, it's still there, listed. It's not a bug but kind of like a feature -- you can move that node back in two days later and "move" tokens in same or different way. What I wish happened was tha

Re: Node going down when streaming data, what next?

2011-01-28 Thread buddhasystem
Sorry Aaron but this doesn't help. As I said, machine is dead, kaput, finished. So I can't do "decommission". I can "remove token" to any other node, but -- the dead machine is going to hang around in my "ring" reports like a zombie.

Re: Cassandra and count

2011-01-28 Thread buddhasystem
As far as I know, there are no aggregate operations built into Cassandra, which means you'll have to retrieve all of the data to count it in the client. I had a thread on this topic 2 weeks ago. It's pretty bad.

Re: Node going down when streaming data, what next?

2011-01-27 Thread buddhasystem
OK, after running "repair" and waiting overnight the rebalancing worked and now 3 nodes share the load as I expected. However, one node that is broken is still listed in the ring. I have no intention of reviving it. What's the optimal way to get rid of it as far as the ring configuration is concer

Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem
I would ask myself a different question, which is what media-hosting sites use (YouTube and all others). Cassandra still may have its usefulness here as a mapper between a logical id and physical file location.

Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem
Will it work for a billion rows? Because that's where eventually I'll end up being.

RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Removetoken command just never returns. There is nothing streaming in the cluster. Does anyone know what might be happening? nodetool ring returns different results on two nodes compared to the third one (which is the first in the ring). Weirdness started when I did move 0 on the non-defunct node whi

RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Thanks, I'll look at the configuration again. In the meantime, I can't "move" the first node in the ring (after I removed the previous node's token) -- it throws an exception and says data is being streamed to it -- however, this is not what netstats says! Weirdness continues... Maxim

Re: Schema Design

2011-01-26 Thread buddhasystem
Bill, it's all explained here: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size,the Watch the number of CFs and the memtable sizes. In my experience, this all matters.

Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Sorry if this sounds silly, but I can't get my brain around this one: if all nodes contain replicas, why does the cluster stream data every time I move or remove a token? If the data is already there, what needs to be streamed? Thanks Maxim

Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
Hello, from what I know, you don't really have to restart "simultaneously", although of course you don't want to wait. I finally decided to use "removetoken" command to actually scratch out the sickly node from the cluster. I'll bootstrap it later when it's fixed.

Re: Schema Design

2011-01-26 Thread buddhasystem
I used the term "sharding" a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server.

Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
Bump. I still don't know what the best thing to do is, please help.

Re: Schema Design

2011-01-26 Thread buddhasystem
Having separate columns for Year, Month etc. seems redundant. It's tons more efficient to keep, say, UTC time in POSIX format (basically an integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use the Order Preserving Partitioner, and sort out which sys
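
A minimal sketch of the "convert back and forth" point above, assuming plain Python and whole-second resolution. The helper names to_posix and from_posix are made up for illustration and are not part of any Cassandra client.

```python
from datetime import datetime, timezone


def to_posix(moment):
    """Convert a timezone-aware datetime to integer POSIX seconds."""
    return int(moment.timestamp())


def from_posix(seconds):
    """Convert integer POSIX seconds back to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Store one integer instead of separate Year/Month/Day columns.
stamp = to_posix(datetime(2011, 1, 26, tzinfo=timezone.utc))

# Year, month and day are recovered on the client when needed.
restored = from_posix(stamp)
print(stamp, restored.year, restored.month, restored.day)
```

Since the integers sort in time order, a date range becomes a simple numeric range on the stored value.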

Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
I was moving a node and at some point it started streaming data to 2 other nodes. Later, that node keeled over and let's assume I can't fix it for the next 3 days and just want to move tokens on the remaining three to even out and see if I can live with it. But I can't do that! The node that was

Re: Re-partitioning the cluster with nodetool: what's happening?

2011-01-25 Thread buddhasystem
Correction -- what I meant to say is that I do see announcements about streaming in the output, but these are stuck at 0%. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-partitioning-the-cluster-with-nodetool-what-s-happening-tp5960843p5960851.

Re-partitioning the cluster with nodetool: what's happening?

2011-01-25 Thread buddhasystem
I'm trying to re-partition my 4-node cluster to make the load exactly 25% on each node. As per recipes found in documentation, I calculate: >>> for x in xrange(4): ... print 2**127/4*x ... 0 42535295865117307932921825928971026432 85070591730234615865843651857942052864 1276058875953519237987654777
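
The same calculation as a Python 3 sketch, since xrange and the print statement above are Python 2 only. It assumes a token ring of size 2**127, as in the snippet. The function name balanced_tokens is made up for illustration.

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Return evenly spaced initial tokens for node_count nodes.

    Integer division (//) keeps the arithmetic exact for these large
    values; a plain / in Python 3 would produce a lossy float.
    """
    return [ring_size // node_count * i for i in range(node_count)]


# One token per node for a 4-node cluster.
for token in balanced_tokens(4):
    print(token)
```

Each printed value would be assigned to one node (for example with nodetool move), so that every node owns an equal slice of the ring.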

Re: Stress test inconsistencies

2011-01-25 Thread buddhasystem
Oleg, I'm a novice at this, but for what it's worth I can't imagine you can have a _sustained_ 1kHz insertion rate on a single machine which also does some reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem to square with a typical seek time on a hard drive. Maxim -- V

Re: Forcing GC w/o jconsole

2011-01-25 Thread buddhasystem
Thanks! It doesn't seem to have any effect on GCing dropped CFs, though. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Forcing-GC-w-o-jconsole-tp5956747p5960100.html Sent from the cassandra-u...@incubator.apache.org mailing list archive

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem
Thanks Aaron. As I remarked earlier (and it seems it's not uncommon) none of the nodes have X11 installed (I think I could arrange this, but it's a bit of a hassle). So if I understand correctly, jconsole is an X11 app, and I'm out of luck with that. I would agree with you that having a proper nodet

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem
Thanks for the note, yes, I do know what files I don't need anymore. And, I do realize the difference between the grace period of CFs and garbage collection (or at least I hope I do). At face value, the documentation wasn't precise enough about JVM GC taking care of dropped CFs. I understand this i

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem
OK, so I'm looking at this page: http://wiki.apache.org/cassandra/MemtableSSTable This looks promising: "A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted." So it would seem that if I restart the s

Forcing GC w/o jconsole

2011-01-24 Thread buddhasystem
My situation is similar to the one described at this link: http://stackoverflow.com/questions/4155696/how-to-trigger-manual-java-gc-from-linux-console-with-no-x11 I'm trying the following command but it fails (connection refused): java -jar cmdline-jmxclient-0.10.3.jar - localhost:8081 java.lang:type

Multiple indexes - how does Cassandra handle these internally?

2011-01-21 Thread buddhasystem
Greetings -- if I use multiple secondary indexes in the query, what will Cassandra do? Some examples say it will index on first EQ and then loop on others. Does it ever do a proper index product to avoid inner loops? Thanks Maxim -- View this message in context: http://cassandra-user-incubat
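
A toy model of the strategy the question describes ("index on first EQ and then loop on others"), written in plain Python. This is only an illustration of that idea under made-up data; it is not Cassandra's actual implementation, and build_index and query are hypothetical names.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the row keys that hold it."""
    index = defaultdict(list)
    for key, row in rows.items():
        if column in row:
            index[row[column]].append(key)
    return index


def query(rows, index, indexed_value, other_predicates):
    """Use the index for the first equality, then filter in a loop."""
    matches = []
    for key in index.get(indexed_value, []):
        row = rows[key]
        # Remaining equality predicates are checked row by row.
        if all(row.get(col) == val for col, val in other_predicates.items()):
            matches.append(key)
    return matches


# Made-up sample rows, purely for illustration.
rows = {
    "a": {"state": "NY", "year": 2010},
    "b": {"state": "NY", "year": 2011},
    "c": {"state": "CA", "year": 2011},
}
state_index = build_index(rows, "state")
print(query(rows, state_index, "NY", {"year": 2011}))
```

In this model the cost depends on how many rows the first indexed predicate returns, which is why the choice of that first predicate matters.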

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem
Thanks! What's strange anyhow is that the GC period for these cfs expired some days ago. I thought that a compaction would take care of these tombstones. I used nodetool to "compact". -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-C

Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem
Greetings, I just used nodetool to force a major compaction on my cluster. It seems like the cfs currently in service were indeed compacted, while the old test materials (which I dropped from the CLI) were still there as tombstones. Is that the expected behavior? Hmm... TIA. -- View this mess
