Sounds like Khanh thinks he can do joins... :-)
User-oriented data is easy: key by Facebook ID and let Cassandra handle
location. Set replication factor=3 so you don't lose data and can do
consistent (but slower) read-after-write, when you need to, using quorum.
If you are running on AWS you should
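The quorum arithmetic behind that advice can be sketched as follows. This is an illustration only: the helper names `quorum` and `read_sees_latest_write` are hypothetical and not part of any Cassandra client. The standard rule it shows is that a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number read exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given replication factor."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)  # majority of 3 replicas
print(f"RF={rf}: quorum is {q} replicas")
print("quorum write + quorum read overlap:", read_sees_latest_write(rf, q, q))
print("ONE write + ONE read overlap:", read_sees_latest_write(rf, 1, 1))
```

With a replication factor of 3, quorum reads and writes each wait for two replicas, which is why they are slower than reads and writes at consistency level ONE.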
Thanks for the feedback Aaron!
The schema of the CF is default, I just defined the name and the rest is
default, have a look:
Keyspace: TestKS
Read Count: 65
Read Latency: 657.8047076923076 ms.
Write Count: 10756
Write Latency: 0.03237039791744143 ms.
Pending Tasks: 0
Column Family: CFTest
I tracked down the timestamp submission and everything was fine within the
PHP Libraries.
The Thrift PHP extension, however, seems to have an overflow, because it was
now also setting timestamps with negative values ( -1242277493 ). I
disabled the PHP extension and as a result I now got correct
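One plausible way such negative values arise (an assumption, since the message does not say where in the extension the overflow happens) is a 64-bit microsecond timestamp being squeezed into a signed 32-bit integer. The sketch below reproduces that kind of truncation; `truncate_to_int32` and the sample timestamp are hypothetical.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def truncate_to_int32(value: int) -> int:
    """Keep only the low 32 bits of value and reinterpret them as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def fits_in_int32(value: int) -> bool:
    """True when value survives a round trip through a signed 32-bit field."""
    return INT32_MIN <= value <= INT32_MAX


# Hypothetical microsecond timestamp of the kind Cassandra clients commonly send.
timestamp_us = 1_307_290_914_452_000
print("fits in 32 bits:", fits_in_int32(timestamp_us))
print("after truncation:", truncate_to_int32(timestamp_us))
```

A check like `fits_in_int32` makes it obvious that microsecond timestamps need a 64-bit type end to end.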
Perfect thanks!
On Sun, Jun 5, 2011 at 4:43 AM, Victor Kabdebon
victor.kabde...@gmail.com wrote:
Again I don't really know the specifics of Solandra but in Solr (so
Solandra being a cousin of Solr it should be true too) you have XML fields
like this :
fields name=hashedpassword
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell
org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks
2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection -
Connected to localhost:9160
2011-06-05 16:21:54,517 ERROR [main]
On 6/5/11 16:26, Timo Nentwig wrote:
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell
org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks
2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection -
Connected to localhost:9160
2011-06-05
You may be swapping.
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
explains how to check this as well as how to see what threads are busy
in the Java process.
On Sat, Jun 4, 2011 at 5:34 PM, Philippe watche...@gmail.com wrote:
Hello,
I am evaluating using cassandra and I'm
If you're not using 0.8.0 the cli deals poorly with non-string row keys.
On Sat, Jun 4, 2011 at 7:48 PM, Kevin thebachel...@gmail.com wrote:
Currently I'm using a client (Pelops) to insert UUIDs (both lexical and
time) in to Cassandra. I haven't yet implemented a facility to remove them
with
On Sun, Jun 5, 2011 at 9:38 AM, Timo Nentwig timo.nent...@toptarif.de wrote:
Hmm, worked-around that by setting -Dcassandra.config (hmm, the client needs
the server's config...?).
Yes, this is fixed for 0.8.1.
Not very verbose :-\ May have something to do with my l/p being just / for
On Sun, 2011-06-05 at 00:51 -0400, Jeffrey Kesselman wrote:
Is CQL really the path for the future for Cassandra?
CQL is no more or less official than the Thrift interface, and TTBMK,
there is no secret cabal that met to decide it would be The Way. People
will use what works best for them, and
Hi Maki and Adrian,
Thank you very much for the promptness. It's the weekend after all :).
I realized I forgot a part of my question until Adrian mentioned the
replication factor. Is it also possible to set where the replicas are
stored as well? Thanks.
This is a research experiment we're exploring
Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't
delete any rows from the CLI, regardless of the type they're stored as.
-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Sunday, June 05, 2011 10:56 AM
To: user@cassandra.apache.org
On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen nguyen.h.kh...@gmail.com wrote:
Hi Maki and Adrian,
Thank you very much for the promptness. It's the weekend after all :).
I realized I forgot a part of my question until Adrian mentioned the
replication factor. Is it also possible to set where the
Please give more detailed info about what exactly you are worried about or
trying to solve.
Please take a step back and look at cassandra's architecture again and what
it's trying to solve. It's a distributed database so if you do what you are
describing there is a potential of getting hotspots.
What are the best practices here for paging and slicing columns from a row?
So let's say I have 1,000,000 columns in a row.
I read the row but want to have 1 thread read columns 0 - 9,999, a second
thread (actor in my case) 10,000 - 19,999 ... and so on so I can have 100
workers processing 10,000 columns for
Great. Thank you, Eric.
-k
On Sun, Jun 5, 2011 at 2:13 PM, Eric tamme eta...@gmail.com wrote:
On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen nguyen.h.kh...@gmail.com
wrote:
Hi Maki and Adrian,
Thank you very much for the promptness. It's the weekend after all :).
I realized I forgot a part of
On Sun, Jun 5, 2011 at 2:17 PM, mcasandra mohitanch...@gmail.com wrote:
Please give more detailed info about what exactly you are worried about or
trying to solve.
In general, we are trying to devise a partitioning and replication
scheme that takes into account social relations between data.
You're going to need to get a lot more specific.
On Sun, Jun 5, 2011 at 12:12 PM, Kevin thebachel...@gmail.com wrote:
Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't
delete any rows from the CLI, regardless of the type they're stored as.
-----Original Message-----
If you need to parallelize (and scale) you need to distribute across
multiple rows. One Big Row means all your 100 workers are hammering
the same 3 (for instance) replicas at the same time.
On Sun, Jun 5, 2011 at 1:43 PM, Joseph Stein crypt...@gmail.com wrote:
What is the best practices here to
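The advice to spread the data over multiple rows can be sketched like this. It is a minimal illustration, not a client API: the row key format `base:bucket`, the helper names, and the `datafile-1` key are assumptions made for the example. Each bucket of 10,000 columns gets its own row key, so different workers read different rows, which can land on different replicas.

```python
def bucket_row_key(base_key: str, column_index: int, bucket_size: int = 10_000) -> str:
    """Row key of the bucket that holds the given column index (hypothetical key format)."""
    return f"{base_key}:{column_index // bucket_size}"


def worker_ranges(total_columns: int, workers: int) -> list:
    """Split column indexes 0..total_columns-1 into contiguous inclusive ranges, one per worker."""
    size = -(-total_columns // workers)  # ceiling division
    return [
        (start, min(start + size, total_columns) - 1)
        for start in range(0, total_columns, size)
    ]


ranges = worker_ranges(1_000_000, 100)
print(len(ranges), "ranges; first:", ranges[0], "last:", ranges[-1])
for start, _end in ranges[:3]:
    print("worker reads row", bucket_row_key("datafile-1", start))
```

The bucket size is a tuning choice: smaller buckets spread load more evenly but mean more rows to track.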
So I can have one PagedIndex CF that holds a row for each data file I am
processing.
That row (in my example) would have X columns, and I can make
those column values be 100 strings that represent keys in another PagedData
CF
This other PagedData CF for each row would have
It would definitely be useful to be able to get column (or super column)
names WITHOUT their values. If the values are pretty big or if there are a
lot of columns, fetching them generates traffic that is not necessarily needed (if in
the end you are only interested in some of the columns).
Moreover it doesn't
I did an insertion test with and without secondary indexes, and found that:
Without secondary index: ~10864 rows inserted per second
With secondary index on one column (BytesType): ~1515 rows inserted per
second
Is this normal? Why would a secondary index have so much effect?
I noticed that If I
Oops, I misread 150 GB in one of your earlier emails as 150 MB, so forget
what I said before. You have loads of free space :)
How many files do you have in your data directory? If it's 1 then that log
message was a small bug that has been fixed.
Cheers
-
Aaron Morton
From what I've seen of CQL there is no comparison between the potential
complexity of a CQL statement and that of a SQL statement. IMHO CQL is more or
less a human readable form of the current API, it does not add features. SQL
statements are arbitrarily complex and may generate many possible
Khanh Nguyen wrote:
Is there a way to tell where a piece of data is stored in a cluster?
For example, can I tell if LastNameColumn['A'] is stored at node 1 in
the ring?
I have not used it but you can see getNaturalEndpoints in jmx. It will tell
you which nodes are responsible for a given
Fair enough.
I do have to keep reminding myself that a REST interface requires text.
And it does make more sense, at least, when coming from a human as
opposed to when you make a computer spend cycles converting binary to
text just so another computer can spend cycles turning it back again.
On
It may not be what you want, but please read about Network Topology Strategy and
DC_QUORUM.
http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
You can configure your Cassandra cluster to be data center aware. Your reads and writes will
be resolved in the local DC, but will be
getNaturalEndpoints tells you which nodes a key will be stored on,
but we can't force Cassandra to store a given key on specific nodes.
maki
2011/6/6 mcasandra mohitanch...@gmail.com:
Khanh Nguyen wrote:
Is there a way to tell where a piece of data is stored in a cluster?
For example, can
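To make the idea of "natural endpoints" concrete, here is a toy model of a token ring. It is a simplified sketch, not Cassandra's implementation: the ring layout, the node names, and the functions `token_for` and `natural_endpoints` are all assumptions for illustration. A key is hashed to a token, the first node whose token is at or after it owns the key, and the following nodes on the ring hold the remaining replicas.

```python
import bisect
import hashlib

RING_SIZE = 2**127  # size of the toy token space


def token_for(key: str) -> int:
    """Hash a row key to a position on the toy ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE


def natural_endpoints(key: str, ring: list, replication_factor: int) -> list:
    """Nodes holding replicas of key. ring is a list of (token, node) sorted by token."""
    tokens = [token for token, _node in ring]
    # First node whose token is >= the key's token, wrapping around past the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Four hypothetical nodes with evenly spaced tokens.
ring = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]
print(natural_endpoints("A", ring, replication_factor=3))
```

Placement follows from the key's hash and the ring layout, which is why the owner of a key can be looked up but not chosen per key.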
Index updates require read-before-write (to find out what the prior
version was, if any, and update the index accordingly). This is
random i/o.
Index creation on the other hand is a lot of sequential i/o, hence
more efficient.
So, the classic bulk load advice to ingest data prior to creating
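The difference described above can be sketched with an in-memory toy. This is an illustration under stated assumptions, not how Cassandra stores indexes: the class `IndexedStore` and the function `build_index` are hypothetical. Maintaining the index on every write requires reading the previous value first, while building it afterwards is a single pass over the data.

```python
class IndexedStore:
    """Toy key/value store that keeps a value -> keys index up to date on every write."""

    def __init__(self):
        self.rows = {}
        self.index = {}
        self.reads_before_write = 0

    def put(self, key, value):
        # Read-before-write: the old value is needed to remove the stale index entry.
        self.reads_before_write += 1
        old = self.rows.get(key)
        if old is not None and old != value:
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index.setdefault(value, set()).add(key)

    def lookup(self, value):
        """Keys whose current value equals value, in sorted order."""
        return sorted(self.index.get(value, set()))


def build_index(rows):
    """Build the same index in one sequential pass over already loaded rows."""
    index = {}
    for key, value in rows.items():
        index.setdefault(value, set()).add(key)
    return index


store = IndexedStore()
store.put("a", "x")
store.put("b", "x")
store.put("a", "y")  # moves "a" from the "x" entry to the "y" entry
print(store.lookup("x"), store.lookup("y"), store.reads_before_write)
print(build_index(store.rows) == store.index)
```

In the toy the extra read is a dictionary lookup; on disk it is the random i/o that the message identifies as the cost of index updates.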
On Sun, Jun 5, 2011 at 11:26 PM, Maki Watanabe watanabe.m...@gmail.com wrote:
getNaturalEndpoints tells you which nodes a key will be stored on,
but we can't force Cassandra to store a given key on specific nodes.
maki
I'm confused. Didn't you mention previously that I can use
0.8 under load may turn out to be more stable and better behaved than any
release so far.
Been doing a few test runs stuffing more than 1 billion records into a 12
node cluster and things look better than ever.
VM's stable and nice at 11GB. No data corruptions, dead nodes, full GC's or
any of the