Re: key types and grouping related rows together

2010-07-15 Thread Thorvaldsson Justus
Don't forget you can make your own sorting algorithm. Here is a nice tutorial for that: http://www.sodeso.nl/?p=421 Justus From: Schubert Zhang [mailto:zson...@gmail.com] Sent: 15 July 2010 04:20 To: user@cassandra.apache.org Subject: Re: key types and grouping related rows together for

Re: Data in Cassandra

2010-07-15 Thread Dimitry Lvovsky
It could be that your Cassandra nodes haven't fully compacted yet. On Thu, Jul 15, 2010 at 5:55 AM, Hendro Kaskus hen...@kaskusnetworks.com wrote: Hi everyone, I'm a newbie to Cassandra :D.. I'm trying to insert data from MySQL into Cassandra. The data dump from MySQL is about 11 MB (64716 records). But

Re: Data in Cassandra

2010-07-15 Thread Jonathan Ellis
Short answer: yes, this is normal. Longer answer: this was discussed at length on this list a few days ago; check the archives. On Wed, Jul 14, 2010 at 10:55 PM, Hendro Kaskus hen...@kaskusnetworks.com wrote: Hi everyone, I'm a newbie to Cassandra :D.. I'm trying to insert data from MySQL to

Re: Bootstrap Token collision

2010-07-15 Thread Gary Dusbabek
Did you add a new node to the cluster at the time you restarted it? If not, I would think that each node already had a token that would make such a collision impossible, unless we have a new bug to troubleshoot. Gary. On Wed, Jul 14, 2010 at 20:46, Mubarak Seyed mubarak.se...@gmail.com wrote:

Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Well I'm not talking about a specific column family here, as ALL my column families will have content that is specific to a certain website, so I need a strategy that I will use on almost all my column families. On Wed, Jul 14, 2010 at 9:20 PM, Schubert Zhang zson...@gmail.com wrote: for your

want to change algorithm used in OPP for token and key comparison

2010-07-15 Thread Sagar Agrawal
Hi, I am using OrderPreservingPartitioner, and my keys are integers stored as strings. I want to manually assign token values equal to my key values so that data is equally distributed. For this to work, I want to convert the token and key strings to integers before doing compareTo
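To illustrate the ordering problem behind this question: integer keys stored as plain strings compare lexicographically, not numerically. The sketch below shows the difference, and also shows zero-padding, which is a commonly used alternative to changing the comparison and is not something proposed in the message itself. The WIDTH constant and pad_key helper are hypothetical names chosen for this example.

```python
# Integer keys stored as plain strings sort lexicographically, not numerically.
keys = [1, 2, 10, 25, 100]

# String comparison places "10" and "100" before "2".
as_strings = sorted(str(k) for k in keys)

WIDTH = 10  # hypothetical fixed width; must cover the largest expected key


def pad_key(k: int) -> str:
    """Left-pad a non-negative integer key with zeros to a fixed width."""
    text = str(k)
    if k < 0 or len(text) > WIDTH:
        raise ValueError("key out of range for fixed-width encoding")
    return text.zfill(WIDTH)


# With fixed-width keys, string order and numeric order agree.
padded = sorted(pad_key(k) for k in keys)
numeric_order = [int(p) for p in padded]
```

Padding only helps if every key, and every token assigned from a key, is written with the same width.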

Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Do you think a composite key using a key type of Bytes would work? How many bytes can it be? public static byte [] createRowKey(int websiteid, long stamp) throws Exception { byte [] websiteidBytes = Bytes.toBytes(websiteid); byte [] stampBytes = Bytes.toBytes(stamp); return
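A minimal Python sketch of the composite-key idea asked about here. It is not the poster's Java code, which is truncated in this archive. It assumes a 4-byte website id followed by an 8-byte timestamp, both big-endian and non-negative, so that byte-wise comparison orders keys by website first and timestamp second. The function names are made up for the example, and the ordering property only matters where row keys are compared as raw bytes.

```python
import struct
from typing import Tuple

# Big-endian 4-byte signed int followed by 8-byte signed long: 12 bytes total.
_KEY_FORMAT = ">iq"


def create_row_key(website_id: int, stamp: int) -> bytes:
    """Pack (website_id, stamp) into a fixed-width 12-byte key."""
    if website_id < 0 or stamp < 0:
        # Negative values would set the sign bit and break unsigned byte ordering.
        raise ValueError("website_id and stamp must be non-negative")
    return struct.pack(_KEY_FORMAT, website_id, stamp)


def parse_row_key(key: bytes) -> Tuple[int, int]:
    """Recover (website_id, stamp) from a key built by create_row_key."""
    website_id, stamp = struct.unpack(_KEY_FORMAT, key)
    return website_id, stamp
```

Because the website id is the prefix, all rows for one website form a contiguous range under byte ordering, which is the grouping the thread is after.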

Re: mmap

2010-07-15 Thread Peter Schuller
Can someone please explain the mmap issue? mmap is the default for all storage files on 64-bit machines. According to this case https://issues.apache.org/jira/browse/CASSANDRA-1214 it might not be a good thing. Is it right to say that you should use mmap only if your MAX expected data is smaller

Re: mmap

2010-07-15 Thread Schubert Zhang
I found that, in a long-term random-read test on a large dataset, the performance with mmap is very bad. See the attached chart in https://issues.apache.org/jira/browse/CASSANDRA-1214. On Fri, Jul 16, 2010 at 12:41 AM, Peter Schuller peter.schul...@infidyne.com wrote: Can someone please explain the

Re: Hintedhandoff will never complete when a BIG rowmutation

2010-07-15 Thread Schubert Zhang
Yes, I think the current HintedHandOff implementation in 0.6.x cannot support large hints; it is a risk in a production system. On Tue, Jun 29, 2010 at 12:31 AM, albert_e dongz...@gmail.com wrote: In 0.6.2, HH sends MUTATION messages using the same OutboundTcpConnection as READ messages. When

Re: ERROR 22:59:00,329 Error in ThreadPoolExecutor

2010-07-15 Thread Claire Chang
I am using Random Partitioner. The other 2 nodes are working fine. There are no Errors in the log files for the 2 good nodes. There were no log messages within 30 minutes before the exception occurs. Here is the last log statement before the exception occurred. INFO [COMPACTION-POOL:1]

nodetool repair

2010-07-15 Thread B. Todd Burruss
if i have N=3 and run nodetool repair on node X. i assume that merkle trees (at a minimum) are calculated on nodes X, X+1, and X+2 (since N=3). when the repair is finished are nodes X, X+1, and X+2 all in sync with respect to node X's data? or does X have the latest data and X+1 and X+2 still

Re: CassandraBulkLoader

2010-07-15 Thread Torsten Curdt
If you could can you please share the command line function (to load TSV)? There is no command line function ... you have to write code for this. and Can you please help me on storing storage-conf.xml on HDFS part? As I said. Maybe you better start with a simpler scenario and leave out HDFS

Re: mmap

2010-07-15 Thread Peter Schuller
I'm convinced. :)  See comments on https://issues.apache.org/jira/browse/CASSANDRA-1214 Noted :) To be clear I only mentioned it as an acknowledgement that everyone didn't necessarily agree with what I was saying. The main problem is not the syscall so much as Java insisting on zeroing out

Re: nodetool repair

2010-07-15 Thread Jonathan Ellis
On Thu, Jul 15, 2010 at 1:54 PM, B. Todd Burruss bburr...@real.com wrote: if i have N=3 and run nodetool repair on node X.  i assume that merkle trees (at a minimum) are calculated on nodes X, X+1, and X+2 (since N=3).  when the repair is finished are nodes X, X+1, and X+2 all in sync with

Re: key types and grouping related rows together

2010-07-15 Thread Benjamin Black
Keys are always sorted (in 0.6) as UTF8 strings. The CompareWith applies to _columns_ within rows, _not_ to row keys. On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed sahmed1...@gmail.com wrote: Where is the link that describes the various key types and their impact on sorting? (I believe I read it

Re: Bootstrap question

2010-07-15 Thread Anthony Molinaro
This is a cluster which is horribly imbalanced because I didn't assign initial tokens, so I'm adding 6 nodes with tokens according to the operations page (i.e., i * (2^127/N) with N = 6). So here's what the ring will look like when bootstrap finishes
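The token formula quoted in this message, i * (2^127/N), can be sketched in a few lines of Python. This assumes i runs from 0 to N-1 and that the division is rounded down to an integer; the function name is made up for the example.

```python
from typing import List


def initial_tokens(node_count: int) -> List[int]:
    """Evenly spaced tokens: i * (2**127 // node_count) for i in 0..node_count-1."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**127 // node_count  # integer division keeps tokens exact
    return [i * step for i in range(node_count)]


tokens = initial_tokens(6)
for i, token in enumerate(tokens):
    print(f"node {i}: {token}")
```

Python integers are arbitrary precision, so the 127-bit values are computed exactly rather than through floating point.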

Re: Bootstrap question

2010-07-15 Thread Anthony Molinaro
Oh, and looking at the load on the new machines it appears that New 2 and New 6 have gotten some data (although neither is in the ring yet). Not sure if that clears anything up though. -Anthony On Thu, Jul 15, 2010 at 01:28:06PM -0700, Anthony Molinaro wrote: This is a cluster which is

Re: Bootstrap question

2010-07-15 Thread Jonathan Ellis
On Thu, Jul 15, 2010 at 3:28 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: Is the fact that 2 new nodes are in the range messing it up? Probably. And if so how do I recover (I'm thinking, shutdown new nodes 2,3,4,5, then bringing up nodes 2,4, waiting for them to finish, then

Re: mmap

2010-07-15 Thread Carlos Alvarez
On Thu, Jul 15, 2010 at 2:01 PM, Jonathan Ellis jbel...@gmail.com wrote: The main problem is not the syscall so much as Java insisting on zeroing out any buffer you create, which is a big hit to performance when you're allocating buffers for file i/o on each request instead of just mmaping

Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Benjamin, Ah, thanks for clarifying that. I believe key sorting is changing in 0.7 to support a binary array? On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black b...@b3k.us wrote: Keys are always sorted (in 0.6) as UTF8 strings. The CompareWith applies to _columns_ within rows, _not_ to row

Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Given a CF like: Articles : { key1 : { title: some title, body: this is my article body... }, key2 : { title: some title, body: this is my article body... } } Now these articles could be for different websites, e.g. www.website1.com, www.website2.com. If I want to get the latest

Re: mmap

2010-07-15 Thread Jonathan Ellis
On Thu, Jul 15, 2010 at 3:56 PM, Carlos Alvarez cbalva...@gmail.com wrote: On Thu, Jul 15, 2010 at 2:01 PM, Jonathan Ellis jbel...@gmail.com wrote: The main problem is not the syscall so much as Java insisting on zeroing out any buffer you create, which is a big hit to performance when you're

How to change the RF and repair

2010-07-15 Thread Mubarak Seyed
Just want to verify with the group that what I am doing wrt RF is correct. 1. Nodes were running with RF=2 2. Stopped all the nodes, changed the RF to 4 3. Started all the nodes, verified the cluster ring using nodetool; all the nodes are part of the cluster 4. Ran nodetool repair on all the nodes 5. Ran

Re: How to change the RF and repair

2010-07-15 Thread Jonathan Ellis
On Thu, Jul 15, 2010 at 5:29 PM, Mubarak Seyed mubarak.se...@gmail.com wrote: Just want to verify with the group that what I am doing wrt RF is correct. 1. Nodes were running with RF=2 2. Stopped all the nodes, changed the RF to 4 3. Started all the nodes, verified the cluster ring using nodetool,

Re: mmap

2010-07-15 Thread Clint Byrum
On Jul 15, 2010, at 2:52 PM, Jonathan Ellis wrote: On Thu, Jul 15, 2010 at 3:56 PM, Carlos Alvarez cbalva...@gmail.com wrote: On Thu, Jul 15, 2010 at 2:01 PM, Jonathan Ellis jbel...@gmail.com wrote: The main problem is not the syscall so much as Java insisting on zeroing out any buffer you

Re: mmap

2010-07-15 Thread Jonathan Ellis
On Thu, Jul 15, 2010 at 5:46 PM, Clint Byrum cl...@ubuntu.com wrote: One other approach that works on Linux is to use HugeTLB. This post details the process for doing so with a jvm: http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html Basically when mmapping using HUGETLB you

A very short summary on Cassandra for a book

2010-07-15 Thread Karoly Negyesi
Hi, I am writing a scalability chapter in a book and I need to mention Apache Cassandra although it's just a mention. Still I would not like to be sloppy and would like to get verification whether my summary is accurate. Cassandra stores four or five dimension associated arrays. The first

Re: key types and grouping related rows together

2010-07-15 Thread Aaron Morton
You could build a secondary index, e.g. CFArticles : { article_id1 : {}, article_id2 : {} } CFWebsiteArticle : { website_id1 : { time_uuid : article_id1, time_uuid2 : article_id2 } } When you want to get the last 10 for a website, get_slice from the WebsiteArticle CF then multi get from Articles. Am
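The two-column-family pattern described in this message can be sketched with plain in-memory dictionaries. This is an illustration only, not Cassandra client code: integer timestamps stand in for TimeUUID column names, a reverse sort stands in for a reversed get_slice, and a list comprehension stands in for the multi get. All function and variable names are made up for the example.

```python
from typing import Dict, List

# Stand-in for CFArticles: article_id -> columns.
articles: Dict[str, Dict[str, str]] = {}
# Stand-in for CFWebsiteArticle: website_id -> {timestamp: article_id}.
website_articles: Dict[str, Dict[int, str]] = {}


def add_article(website_id: str, article_id: str, ts: int, title: str, body: str) -> None:
    """Write the article row and its index column together."""
    articles[article_id] = {"title": title, "body": body}
    website_articles.setdefault(website_id, {})[ts] = article_id


def latest_articles(website_id: str, count: int = 10) -> List[Dict[str, str]]:
    """Newest-first articles for one website, read via the index row."""
    index_row = website_articles.get(website_id, {})
    newest = sorted(index_row, reverse=True)[:count]  # reversed slice of the index row
    article_ids = [index_row[ts] for ts in newest]
    return [articles[a] for a in article_ids]  # multi get from the article rows


add_article("website1", "a1", 100, "first", "...")
add_article("website1", "a2", 200, "second", "...")
add_article("website2", "b1", 150, "other site", "...")
add_article("website1", "a3", 300, "third", "...")
```

The index row holds only article ids, so reading the latest N articles costs one slice plus one batched read, regardless of how many websites share the article column family.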

Re: ERROR 22:59:00,329 Error in ThreadPoolExecutor

2010-07-15 Thread Claire Chang
I saw this in the kernel log: jsvc uses 32-bit capabilities. Is this right? Our server is Linux 2.6.32-23-generic #37-Ubuntu SMP Fri Jun 11 08:03:28 UTC 2010 x86_64 GNU/Linux On Jul 15, 2010, at 11:04 AM, Claire Chang wrote: I am using Random Partitioner. The other 2 nodes are working fine.

Re: A very short summary on Cassandra for a book

2010-07-15 Thread Dave Viner
I am no expert... but parts seem accurate, parts not. "Cassandra stores four or five dimension associated arrays": not sure what you're counting as a dimension of the associated array, but here are the 2 associative array-like syntaxes: ColumnFamily[row-key][column-name] = value1

Re: Seeing very weird results on 0.6.2 when paginating through a ColumnFamily with get_slice()

2010-07-15 Thread Ilya Maykov
The column names are arbitrary strings, so it's not obvious what the next value should be at any step. So, I just set the start of the next page to the end of the last page and eliminate the duplicate value when joining the 2 pages together. The paging direction does not matter in my case, as I
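The paging approach described in this message (start each page at the last column of the previous page, then drop the duplicate) can be sketched as follows. The get_slice function here is an in-memory stand-in that returns up to count names greater than or equal to start from a sorted list of unique names; it is not the Thrift call, and all names are made up for the example.

```python
from bisect import bisect_left
from typing import Iterator, List


def get_slice(columns: List[str], start: str, count: int) -> List[str]:
    """Stand-in slice query: up to `count` names >= start, in sorted order."""
    i = bisect_left(columns, start)
    return columns[i:i + count]


def iterate_columns(columns: List[str], page_size: int) -> Iterator[str]:
    """Yield every column name once, fetching `page_size` names per request."""
    if page_size < 2:
        # With a page size of 1, each page would only repeat the previous column.
        raise ValueError("page_size must be at least 2")
    start = ""  # the empty string sorts before every name
    first_page = True
    while True:
        page = get_slice(columns, start, page_size)
        # Every page after the first begins with the column we already returned.
        fresh = page if first_page else page[1:]
        yield from fresh
        if len(page) < page_size:
            return  # a short page means the row is exhausted
        start = page[-1]
        first_page = False
```

The page size has to be at least 2, because one slot in every page after the first is spent on the overlapping column.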

Re: A very short summary on Cassandra for a book

2010-07-15 Thread David Strauss
On 2010-07-16 01:57, Dave Viner wrote: I am no expert... but parts seem accurate, parts not. Cassandra stores four or five dimension associated arrays not sure what you're counting as a dimension of the associated array, but here are the 2 associative array-like syntaxes:

Re: key types and grouping related rows together

2010-07-15 Thread Aaron Morton
Yes, you need to maintain the secondary index yourself. Send a batch_mutate and write the article and website article columns at the same time. I think you're safe up to a large number of cols, say 1M. Not sure, may try to track the info down one day. A On 16 Jul, 2010, at 03:39 PM, S Ahmed

Re: Bootstrap question

2010-07-15 Thread Anthony Molinaro
Okay, so things were pretty messed up. I shut down all the new nodes, then the old nodes started doing the "half the ring is down" garbage, which pretty much requires a full restart of everything. So I had to shut everything down, then bring the seed back, then the rest of the nodes, so they