Strange behavior with large column values

2010-05-10 Thread Jared Laprise
Hello all, I'm really stumped on this issue. I'm using the PHP Thrift client along with Pandra. I have a ColumnFamily `Groups` and once I set the `description` column value to larger than a couple hundred characters my response time goes from 0.0057 to almost 1 second. Anyone experienced someth

Re: How to write WHERE .. LIKE query ?

2010-05-10 Thread vd
Hi Mike, AFAIK Cassandra queries only on keys and not on column names; please verify. On Tue, May 11, 2010 at 11:06 AM, Mike Malone wrote: > > > On Mon, May 10, 2010 at 9:00 PM, Shuge Lee wrote: >> >> Hi all: >> How to write WHERE ... LIKE query ? >> For examples(described in Python): >> Schem

Re: How to write WHERE .. LIKE query ?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 9:00 PM, Shuge Lee wrote: > Hi all: > > How to write WHERE ... LIKE query ? > For examples(described in Python): > > Schema: > > # columnfamily name > resources = [ ># key > 'foo': { > # columns and value > 'url': 'foo.com', > 'pushlier': 'f

How to write WHERE .. LIKE query ?

2010-05-10 Thread Shuge Lee
Hi all: How to write WHERE ... LIKE query ? For examples(described in Python): Schema: # columnfamily name resources = [ # key 'foo': { # columns and value 'url': 'foo.com', 'pushlier': 'foo', }, 'oof': { 'url': 'oof.com', 'pushlier': 'off',

Re: Tuning Cassandra

2010-05-10 Thread Benjamin Black
The performance you are describing is completely abnormal. The first step in troubleshooting it is profiling your client behavior because that is almost certainly where the problem is. Where is it spending its time? If that ultimately indicates it is really waiting on Cassandra, you can turn you

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
> > Mike just suggested to concate comment id with each of the comment field > names so that the above data can be stored in normal column family. It looks > fine except that I'm not sure the time sorting on comments still works or > not. > In the case of time you can just use lexicographically so
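The preview above is cut off, so the following is only one plausible reading of the suggestion: prefix each flattened column name with a fixed-width timestamp so that a plain string comparator still returns comments in time order. The name layout, the `comment_column` helper, and the 20-digit padding are assumptions made for illustration, not something the thread specifies.

```python
# Sketch: make string ordering agree with time ordering by zero-padding
# the timestamp that leads each column name. All names are hypothetical.

def comment_column(timestamp_micros: int, comment_id: str, field: str) -> str:
    # 20 digits is wide enough for any unsigned 64-bit timestamp.
    return f"{timestamp_micros:020d}:{comment_id}:{field}"

def unpadded_column(timestamp_micros: int, comment_id: str, field: str) -> str:
    # Same layout without padding, kept only to show what goes wrong.
    return f"{timestamp_micros}:{comment_id}:{field}"

early, late = 999, 1273500000000000

# Padded names sort chronologically under plain string comparison.
padded = sorted([
    comment_column(late, "c2", "author"),
    comment_column(early, "c1", "author"),
])

# Unpadded names do not: "1273..." sorts before "999".
unpadded = sorted([
    unpadded_column(late, "c2", "author"),
    unpadded_column(early, "c1", "author"),
])
```

With padding, the earlier comment comes first; without it, the later comment does, because the comparison is character by character rather than numeric.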

Re: Is SuperColumn necessary?

2010-05-10 Thread AJ Chen
{ "b1" { blog-id: b1 author: ba1 tittle: bt1 comment-timeuuid-1: {author: ca1 id: comment-timeuuid-1 text: text 1 comment-timeuuid-2: {author: ca2

Re: Is SuperColumn necessary?

2010-05-10 Thread AJ Chen
in your implementation, is the comment still sorted by TIME? Will UTF8Type sort :author by time? thanks, -aj On Mon, May 10, 2010 at 5:02 PM, Mike Malone wrote: > On Mon, May 10, 2010 at 4:31 PM, AJ Chen wrote: > >> supercolumn is good for modeling profile type of data. simple example is >> bl

Re: Is SuperColumn necessary?

2010-05-10 Thread William Ashley
I'm having a difficult time understanding your syntax. Could you provide an example with actual data? On May 10, 2010, at 5:25 PM, AJ Chen wrote: > your suggestion works for fixed supercolumn name. the blog example now > becomes: > { blog-id {name, title, ...} > blog-id-comments {time:comment

Re: Is SuperColumn necessary?

2010-05-10 Thread AJ Chen
your suggestion works for fixed supercolumn name. the blog example now becomes: { blog-id {name, title, ...} blog-id-comments {time:commenter} } what about supercolumn names that are not fixed? for example, I want to store comment's details with the blog like this: { blog-id { blog { name, title

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 4:31 PM, AJ Chen wrote: > supercolumn is good for modeling profile type of data. simple example is > blog: > blog { blog {author, title, ...} > comments {time: commenter} //sort by TimeUUID > } > when retrieving a blog, you get all the comments sorted by time

Re: Is SuperColumn necessary?

2010-05-10 Thread William Ashley
If you're storing your super column under a fixed name, you could just concatenate that name with the row key and use normal columns. Then you get your paging and sorting the way you want it. On May 10, 2010, at 4:31 PM, AJ Chen wrote: > supercolumn is good for modeling profile type of data. s
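The idea of folding a fixed super column name into the row key can be sketched with an in-memory stand-in for a column family. This is not the Thrift or Hector API; `FlatColumnFamily`, `composite_key`, and the `:` separator are hypothetical names chosen for the example.

```python
from collections import defaultdict

SEPARATOR = ":"  # assumed delimiter; it must not occur inside the row key

def composite_key(row_key: str, super_column_name: str) -> str:
    """Fold the fixed super column name into the row key."""
    return f"{row_key}{SEPARATOR}{super_column_name}"

class FlatColumnFamily:
    """Toy model of a standard column family: row key -> {column: value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def insert(self, row_key, super_column_name, column, value):
        self._rows[composite_key(row_key, super_column_name)][column] = value

    def get_slice(self, row_key, super_column_name, start="", count=100):
        """Return up to `count` columns in sorted order, beginning at `start`."""
        row = self._rows.get(composite_key(row_key, super_column_name), {})
        names = sorted(name for name in row if name >= start)[:count]
        return [(name, row[name]) for name in names]

cf = FlatColumnFamily()
cf.insert("b1", "blog", "author", "ba1")
cf.insert("b1", "comments", "t2", "ca2")
cf.insert("b1", "comments", "t1", "ca1")
first_page = cf.get_slice("b1", "comments", count=1)
second_page = cf.get_slice("b1", "comments", start="t2", count=1)
```

Each former super column becomes its own row, so paging and sorting apply to its columns directly, which is the benefit the message describes.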

Re: Is SuperColumn necessary?

2010-05-10 Thread AJ Chen
supercolumn is good for modeling profile type of data. simple example is blog: blog { blog {author, title, ...} comments {time: commenter} //sort by TimeUUID } when retrieving a blog, you get all the comments sorted by time already. without supercolumn, you would need to concatenate mu

Rolling upgrade

2010-05-10 Thread Tatsuya Kawano
Hi, does Cassandra support a rolling restart recipe for minor version upgrades? By rolling restart I mean a way to upgrade the Cassandra version or change configuration **without** bringing down the whole cluster. The recipe will be something like killing a couple of nodes at a time and starting them

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 1:38 PM, AJ Chen wrote: > Could someone confirm this discussion is not about abandoning supercolumn > family? I have found modeling data with supercolumn family is actually an > advantage of Cassandra compared to relational databases. Hope you are not going to > drop this import

Re: Is SuperColumn necessary?

2010-05-10 Thread AJ Chen
Could someone confirm this discussion is not about abandoning supercolumn family? I have found modeling data with supercolumn family is actually an advantage of Cassandra compared to relational databases. Hope you are not going to drop this important concept. How it's implemented internally is a differe

Re: Human readable Cassandra limitations

2010-05-10 Thread Paul Prescod
On Mon, May 10, 2010 at 1:23 PM, Peter Hsu wrote: > Thanks for the response, Paul. > ... > > * Cassandra and its siblings are weak at ad hoc queries on tables > that you did not think to index in advance > > What is the normal way of dealing with this in Cassandra?  Would you just > create a new "

Re: Human readable Cassandra limitations

2010-05-10 Thread Peter Hsu
Thanks for the response, Paul. Very helpful, but very general at the same time. I'm still having trouble translating these into actual use cases. Let me think of some better questions before I continue the thread, but I'd like to address one of the weaknesses you brought up: > * Cassandra

Re: ColumnPath Usage

2010-05-10 Thread Jonathan Ellis
if you want to use raw Thrift, you should read http://wiki.apache.org/cassandra/API where the things like ColumnPath are explained On Mon, May 10, 2010 at 1:08 PM, Atul Gosain wrote: > Jonathan > >   We just wanted to benchmark data insertion rates on the Cassandra store, > so we started with usi

Re: Human readable Cassandra limitations

2010-05-10 Thread Paul Prescod
Also: * you should Google "eventual consistency" to learn about the strengths and weaknesses of that. On Mon, May 10, 2010 at 11:22 AM, Paul Prescod wrote: > This is a very, very big topic. For the most part, the issues are > covered in the various SQL versus NoSQL debates all over the Internet

Re: Human readable Cassandra limitations

2010-05-10 Thread Paul Prescod
This is a very, very big topic. For the most part, the issues are covered in the various SQL versus NoSQL debates all over the Internet. For example: * Cassandra and its NoSQL siblings have no concept of an in-database "join" * Cassandra and its NoSQL siblings do not allow you to update multipl

Re: Data Modeling Conundrum

2010-05-10 Thread William Ashley
Yeah, I intentionally didn't mention the expected data set size, hoping I could find a more elegant solution that would work both in the small N and large N cases. In any case, I appreciate the recommendations. When I get some time I am interested in looking at the source and figuring out wheth

Re: ColumnPath Usage

2010-05-10 Thread Atul Gosain
Jonathan, we just wanted to benchmark data insertion rates on the Cassandra store, so we started with the Thrift API. Is Hector considered better than Thrift because it's better documented and more intuitive than Thrift? Is there any downside to it, given that Thrift is a low-level API to Cassandra? I hope that Hect

Human readable Cassandra limitations

2010-05-10 Thread Peter Hsu
I've seen a lot of threads and posts about why Cassandra is great. I'm fairly sold on the features, and the few big deployments on Cassandra give it a lot of credibility. However, I don't believe in magic bullets, so I really want to understand the potential downsides of Cassandra. Right now,

Re: Tuning Cassandra

2010-05-10 Thread B. Todd Burruss
have you put your commit log on a disk by itself? not a logical partition shared by oracle or cassandra "data". this will make a difference, as you don't want the cassandra commit logs competing with other OS and oracle I/O. look in storage-conf.xml and see if you can move this. also check

Re: Tuning Cassandra

2010-05-10 Thread Nathan McCall
David, Are you using batchMutate or insert? Jump on over to hector-users if you want API help with either of these. -Nate

Re: Is SuperColumn necessary?

2010-05-10 Thread Jonathan Shook
Agreed On Mon, May 10, 2010 at 12:01 PM, Mike Malone wrote: > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook wrote: >> >> I have to disagree about the naming of things. The name of something >> isn't just a literal identifier. It affects the way people think about >> it. For new users, the whol

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 9:44 AM, Stu Hood wrote: > I think that it is 100% ideal: it's what I've been working on implementing > in #674, #847 and #998. I'm hoping to post a large patchset and docs this > week, and I'm aiming to get it committed for 0.8. > > The work I've been doing doesn't touch

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook wrote: > I have to disagree about the naming of things. The name of something > isn't just a literal identifier. It affects the way people think about > it. For new users, the whole naming thing has been a persistent > barrier. > I'm saying we shou

Re: Is SuperColumn necessary?

2010-05-10 Thread Jonathan Shook
I have to disagree about the naming of things. The name of something isn't just a literal identifier. It affects the way people think about it. For new users, the whole naming thing has been a persistent barrier. As for your suggestions, I'm all for simplifying or generalizing the "how it works" p

Re: Tuning Cassandra

2010-05-10 Thread Jonathan Ellis
TBufferedTransport is a C# thing. It's not necessary in Java. On Mon, May 10, 2010 at 3:48 AM, Ran Tavory wrote: > Hector uses tsocket. not sure what you mean by "buffered" - is that framed? > Hector by default does not use framed. > The code is here if you'd like to have a > look http://github.

Re: ColumnPath Usage

2010-05-10 Thread Jonathan Ellis
Why are you using Thrift from java instead of Hector? http://github.com/rantav/hector On Mon, May 10, 2010 at 8:34 AM, Atul Gosain wrote: > Hi > >  Im really confused about using ColumnPath in thrift java interface. Most of > the examples provide create the ColumnPath with 3 parameters, whereas

Re: Not found error

2010-05-10 Thread Jonathan Ellis
You need to put the cassandra jar on your classpath. (Not sure what you mean by "client is running successfully," if you're seeing this...) On Mon, May 10, 2010 at 7:11 AM, sharanabasava raddi wrote: > Hi, > am getting  "package org.apache.cassandra.thrift does not exist" error, when > I tried t

Re: release date for 0.7 ?

2010-05-10 Thread Jonathan Ellis
Cassandra tries to get releases out within 3-4 months of the previous "major" release (0.6.0). On Sat, May 8, 2010 at 10:57 PM, vineet daniel wrote: > Hi > > What is the expected release date for 0.7 and what will be the feature > specifications for it ? > > __

Re: Is SuperColumn necessary?

2010-05-10 Thread Stu Hood
I think that it is 100% ideal: it's what I've been working on implementing in #674, #847 and #998. I'm hoping to post a large patchset and docs this week, and I'm aiming to get it committed for 0.8. The work I've been doing doesn't touch the user interface: it only deals with the internal chang

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
Maybe... but honestly, it doesn't affect the architecture or interface at all. I'm more interested in thinking about how the system should work than what things are called. Naming things is important, but that can happen later. Does anyone have any thoughts or comments on the architecture I sugge

Re: Tuning Cassandra

2010-05-10 Thread Ronald Park
That sounds like "Did you start beating your wife recently or have you been beating her for a long time now?" :) Ron On Mon, 2010-05-10 at 09:00 -0700, Paul Prescod wrote: > Does the Cassandra performance start fast and slow down (indicating > some buffer being filled) or does it start slow and

Re: Tuning Cassandra

2010-05-10 Thread Paul Prescod
Does the Cassandra performance start fast and slow down (indicating some buffer being filled) or does it start slow and stay slow? On Mon, May 10, 2010 at 2:05 AM, David Boxenhorn wrote: > I read something like 80,000 rows from Oracle and write them to Cassandra in > chunks of 1000 rows - so I'm

Re: Is SuperColumn necessary?

2010-05-10 Thread Schubert Zhang
Yes, the "column" here is not appropriate. Maybe we need not create new terms; in Google's Bigtable, the term "qualifier" is a good one. On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn wrote: > That would be a good time to get rid of the confusing "column" term, which > incorrectly suggests a

Re: Overfull node

2010-05-10 Thread David Koblas
Sounds great, will give it a go. However, just to make sure I understand getting the keyspace correct. Let's say I've got: A -- Node before overfull node in keyspace order O -- Overfull node B -- Node after O in keyspace order N -- New empty node I'm going to assume that I shoul

Re: Cassandra training on May 21 in Palo Alto

2010-05-10 Thread Ned Wolpert
Probably could charge for it. On Mon, May 10, 2010 at 7:41 AM, Jeremy Dunck wrote: > +1 for Dallas, but I'd go to Austin if needed. > > On Thu, May 6, 2010 at 10:07 PM, Jonathan Shook wrote: > > Dallas > > > > On Thu, May 6, 2010 at 4:28 PM, Jonathan Ellis > wrote: > >> We're planning that now

Re: Cassandra training on May 21 in Palo Alto

2010-05-10 Thread Jeremy Dunck
+1 for Dallas, but I'd go to Austin if needed. On Thu, May 6, 2010 at 10:07 PM, Jonathan Shook wrote: > Dallas > > On Thu, May 6, 2010 at 4:28 PM, Jonathan Ellis wrote: >> We're planning that now.  Where would you like to see one? >> >> On Thu, May 6, 2010 at 2:40 PM, S Ahmed wrote: >>> Do you

ColumnPath Usage

2010-05-10 Thread Atul Gosain
Hi, I'm really confused about using ColumnPath in the Thrift Java interface. Most of the examples provided create the ColumnPath with 3 parameters, whereas the latest Thrift API has ColumnPath with one parameter only. If ColumnPath can be thought of as directory structure or path to the column, th

Re: Tuning Cassandra

2010-05-10 Thread Carlos Alvarez
As far as I know, the tbuffered/tsocket issue only applies to C# (btw, que had a lot of problems with hector#). In Java, the class TBufferedTransport doesn't exist. Carlos On 5/10/10, Arie Keren wrote: > Using TSocket without TBufferedTransport hurts performance. > > See > http://mail-archives

Re: busy thread on IncomingStreamReader ?

2010-05-10 Thread Даниел Симеонов
Hi, I've experienced the same problem, two nodes got stuck with CPU at 99% and the following source code from IncomingStreamReader class: while (bytesRead < pendingFile.getExpectedBytes()) { bytesRead += fc.transferFrom(socketChannel, bytesRead, FileStreamTask.CHUNK_SIZE);

Re: trying to make my ideas clear about partionning...

2010-05-10 Thread Olivier Mallassi
Thanks! On Mon, May 10, 2010 at 2:17 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > Partitioning is only done for row keys, the part in your message about > keys and partitioning is correct. > There is no partitioning for columns, all columns for a particular key are > store

RE: trying to make my ideas clear about partionning...

2010-05-10 Thread Dr . Martin Grabmüller
Partitioning is only done for row keys, the part in your message about keys and partitioning is correct. There is no partitioning for columns, all columns for a particular key are stored on the same node (plus replicas, of course, which are stored on different nodes). The CompareWith option fo
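The point that only the row key decides placement can be sketched with a simplified token ring. This is a toy model in the spirit of a hash-based partitioner, not Cassandra's actual implementation: `token_for`, `node_for`, the ring layout, and the node names are assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

def token_for(row_key: str) -> int:
    """Hash the row key to a position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)

def node_for(row_key: str, ring) -> str:
    """Pick the node whose token is the first one >= the key's token.

    `ring` is a list of (token, node_name) pairs sorted by token; keys past
    the last token wrap around to the first node.
    """
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, token_for(row_key))
    return ring[index % len(ring)][1]

# Three hypothetical nodes with evenly spaced tokens over the 128-bit MD5 range.
RING_SIZE = 2 ** 128
ring = [(RING_SIZE // 3 * (i + 1), f"node{i + 1}") for i in range(3)]

# Column names never enter the calculation, so every column of row "foo"
# resolves to the same node.
owner_of_url = node_for("foo", ring)
owner_of_publisher = node_for("foo", ring)
```

Replicas would be placed on further nodes chosen from the same ring; that step is left out here.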

Not found error

2010-05-10 Thread sharanabasava raddi
Hi, am getting "package org.apache.cassandra.thrift does not exist" error, when I tried to run java thrift example in my system. Note: Cassandra server and client are running successfully. Thanks in advance Regards, Sharan

Cassandra for live statistics aggregation ?

2010-05-10 Thread David
Hi, I am investigating the use of Cassandra to gather and aggregate simple statistics in real time from multiple sources, something quite similar to what is described here: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra. I have a few questions about how to design a model

trying to make my ideas clear about partionning...

2010-05-10 Thread Olivier Mallassi
Hi all, I am trying to make my ideas clear about how the partitioning works in Cassandra. Here is what I understood, please correct me if I am wrong. - Row keys are partitioned based on the partitioning strategy you choose (random, order preserving, custom if you implemented the IPartitioner interfa

Re: Tuning Cassandra

2010-05-10 Thread David Boxenhorn
I read something like 80,000 rows from Oracle and write them to Cassandra in chunks of 1000 rows - so I'm supposedly working to Cassandra's strength and Oracle's weakness. Reading 1000 rows from Oracle is "instantaneous", writing them takes maybe 30 seconds. Not too much data per row, maybe 1K.

RE: Tuning Cassandra

2010-05-10 Thread Arie Keren
Using TSocket without TBufferedTransport hurts performance. See http://mail-archives.apache.org/mod_mbox/cassandra-user/201005.mbox/%3cd0c18662921df14f983c53625de8a7241e4c11f...@34093-mbx-c14.mex07a.mlsrvr.com%3e From: Ran Tavory [mailto:ran...@gmail.com] Sent: May 10, 2010 11:48 AM To: user

Re: Tuning Cassandra

2010-05-10 Thread Ran Tavory
Hector uses tsocket. not sure what you mean by "buffered" - is that framed? Hector by default does not use framed. The code is here if you'd like to have a look http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java#L77

Re: Tuning Cassandra

2010-05-10 Thread David Boxenhorn
You asked for it! You might want to skip to Cassandra.save() ... public class BuildCassandraDB { private static void importContentInterests(Connection connection) throws Exception { Statement statement; ResultSet resultSet; String sql; int count, totalcount;

Re: Tuning Cassandra

2010-05-10 Thread vd
What is the complete code string you are using to connect with cassandra from Java code On Mon, May 10, 2010 at 1:49 PM, David Boxenhorn wrote: > I don't know what "TSocket or the buffered one" means. Maybe I should know? > > I'm using Hector. Does that explain anything? > > On Mon, May 10, 20

Re: Tuning Cassandra

2010-05-10 Thread David Boxenhorn
I don't know what "TSocket or the buffered one" means. Maybe I should know? I'm using Hector. Does that explain anything? On Mon, May 10, 2010 at 11:15 AM, vd wrote: > > Hi > > what is it that you are using to connect with Cassandra TSocket or the > buffered one ? > > >

Re: Tuning Cassandra

2010-05-10 Thread vd
Hi, what are you using to connect to Cassandra: TSocket or the buffered one? On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn wrote: > I'm running Java on the client, jdbc queries on Oracle, Hector on

Re: Tuning Cassandra

2010-05-10 Thread David Boxenhorn
I'm running Java on the client, jdbc queries on Oracle, Hector on Cassandra. The Cassandra and Oracle database designs are radically different, as you might guess. I have no doubt that Cassandra can be tuned, in a multiple-server cluster, to have superior throughput (that's why I'm doing it!). Bu

Re: Tuning Cassandra

2010-05-10 Thread vd
Hi David, if I may ask, how do you plan to import data from Oracle to Cassandra? As for an answer, AFAIK Cassandra's true ability comes into play when running on more than one machine. Please also share how you are making the comparison, for example on writes to or reads from Cassandra.

Tuning Cassandra

2010-05-10 Thread David Boxenhorn
I'm running Oracle and Cassandra on my machine, trying to import my data to Cassandra from Oracle. In my configuration Oracle is about ten times faster than Cassandra. Cassandra has out-of-the-box tuning. I am new to Cassandra. How do I begin trying to tune it? Thanks.

Re: Extremly slow inserts on LAN

2010-05-10 Thread zhang cnan
WHY? On Mon, May 10, 2010 at 1:56 PM, Viktor Jevdokimov wrote: > We had similar experience. > > Problem was with TSocket as transport alone: > >        var transport = new TSocket("192.168.0.123", 9160); >        var protocol = new TBinaryProtocol(transport); >        var client = new Cassandra.C

RE: Extremly slow inserts on LAN

2010-05-10 Thread Arie Keren
That solved the problem! It also helps to increase the buffer size from the default value of 1024: var transport = new TBufferedTransport(new TSocket("192.168.0.123", 9160), 64*1024); Thanks a lot! -Original Message- From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] Sent: May
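Why wrapping the socket in a buffer helps can be illustrated without Thrift at all. The sketch below is an analogy in Python, not the C# client from this thread: `CountingRawWriter` is a hypothetical stand-in for the socket, and the field count and field size are invented numbers.

```python
import io

class CountingRawWriter(io.RawIOBase):
    """Stands in for a socket and counts how many writes actually reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)

def send_fields(stream, field_count=1000, field_size=8):
    """Emit many small writes, as a serializer does field by field."""
    for _ in range(field_count):
        stream.write(b"x" * field_size)
    stream.flush()

# Unbuffered: every small write goes straight to the "socket".
unbuffered = CountingRawWriter()
send_fields(unbuffered)

# Buffered with a 64 KiB buffer, mirroring the 64*1024 used in the message.
raw = CountingRawWriter()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
send_fields(buffered)
```

Both runs deliver the same bytes, but the buffered one reaches the underlying writer far less often, which is the effect the larger buffer size is meant to achieve on a real socket.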