Re: How to speed up SELECT * query in Cassandra

2015-02-16 Thread mck
> Could you please share how much data you store on the cluster and what is HW configuration of the nodes?
These nodes are dedicated HW, 24 CPUs and 50 GB RAM. Each node has a few TBs of data (you don't want to go over this) in RAID 50 (we're migrating over to JBOD). Each c* node is running 2.0.

Re: How to speed up SELECT * query in Cassandra

2015-02-16 Thread Jiri Horky
Hi, thanks for the reference, I really appreciate that you shared your experience. Could you please share how much data you store on the cluster and what the HW configuration of the nodes is? I am really impressed that you are able to read 100M records in ~4 minutes on 4 nodes. It makes something like

Re: How to speed up SELECT * query in Cassandra

2015-02-14 Thread mck
Jirka,
> But I am really interested how it can work well with Spark/Hadoop where you basically need to read all the data as well (as far as I understand that).
I can't give you any benchmarking between technologies (nor am I particularly interested in getting involved in such a discussion)

Re: How to speed up SELECT * query in Cassandra

2015-02-13 Thread Jens Rantil
>> ssandra is that the hard part spark or hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What if a map / reduce job used a temporary CF in Cassandra to store intermediate results?
>>
>> From: use

Re: How to speed up SELECT * query in Cassandra

2015-02-12 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Thanks Jirka! From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra Hi, here are some snippets of code in scala which should get you started. Jirka H. loop { lastRow => val query = last

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jiri Horky
Hi, here are some snippets of code in scala which should get you started. Jirka H.
    loop { lastRow =>
      val query = lastRow match {
        case Some(row) => nextPageQuery(row, upperLimit)
        case None => initialQuery(lowerLimit)
      }
      session.execute(query).all
    }
    private def nextPageQuery(row: Row, upperLimit: String
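
The paging pattern in the snippet above can be illustrated with a small self-contained sketch. This is not the original author's full code (which is truncated here): the `TokenPaging` object, the `fetch` parameter, the numeric `id` key, and the query-builder signatures are all hypothetical, chosen only to show how each page resumes after the last row of the previous one until an empty page comes back.

```scala
import scala.annotation.tailrec

object TokenPaging {
  /** Calls `fetch` with the last row of the previous page (None on the
    * first call) and accumulates rows until an empty page is returned. */
  def loop[A](fetch: Option[A] => Seq[A]): Vector[A] = {
    @tailrec
    def go(last: Option[A], acc: Vector[A]): Vector[A] = {
      val page = fetch(last)
      if (page.isEmpty) acc else go(page.lastOption, acc ++ page)
    }
    go(None, Vector.empty)
  }

  // Hypothetical query builders: table and key names are placeholders,
  // and the key is assumed to be numeric so no quoting is needed.
  def initialQuery(table: String, key: String, lower: Long, upper: Long, pageSize: Int): String =
    s"SELECT * FROM $table WHERE token($key) > $lower AND token($key) <= $upper LIMIT $pageSize"

  def nextPageQuery(table: String, key: String, lastKey: Long, upper: Long, pageSize: Int): String =
    s"SELECT * FROM $table WHERE token($key) > token($lastKey) AND token($key) <= $upper LIMIT $pageSize"
}
```

With a real session, `fetch` would build one of the two queries and return the rows from executing it; in a test it can be any function over an in-memory collection.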

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jiri Horky
> / reduce job used a temporary CF in Cassandra to store intermediate results?
>
> From: user@cassandra.apache.org
> Subject: Re: How to speed up SELECT * query in Cassandra
>
> I use spark with cassandra, and yo

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Colin
>>>> > f cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra.
>>>>
>>>> I am not sure about the current state of Spark support for Cassandra, but I guess if you create a m

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread DuyHai Doan
>> e still stored in HDFS, as it happens to hadoop, is this right? I think the problem with Spark + Cassandra or with Hadoop + Cassandra is that the hard part spark or hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Colin
>> h Hadoop + Cassandra is that the hard part spark or hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What if a map / reduce job used a temporary CF in Cassandra to store intermediate results?
>>
>> F

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Ja Sam
Your answer looks very promising. How do you calculate start and stop?
On Wed, Feb 11, 2015 at 12:09 PM, Jiri Horky wrote:
> The fastest way I am aware of is to do the queries in parallel to multiple cassandra nodes and make sure that you only ask them for keys they are responsible for. Oth

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread DuyHai Doan
> ght? I think the problem with Spark + Cassandra or with Hadoop + Cassandra is that the hard part spark or hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What if a map / reduce job used a temporary CF in Cassandra t

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Marcelo Valle (BLOOMBERG/ LONDON)
o one takes advantage of that. What if a map / reduce job used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I use spark with cassandra, and you don't need DSE. I see a lot of people ask this same

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jiri Horky
The fastest way I am aware of is to do the queries in parallel to multiple cassandra nodes and make sure that you only ask them for keys they are responsible for. Otherwise, the node needs to resend your query which is much slower and creates unnecessary objects (and thus GC pressure). You can man
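
To make the idea of per-range queries concrete, here is a minimal Scala sketch that splits the full token ring into contiguous sub-ranges and builds one CQL statement per range, so the statements can be issued in parallel. It assumes the Murmur3Partitioner (64-bit signed tokens); the `TokenRangeSplit` object and the table and key names passed to it are hypothetical. The sketch only divides the ring evenly. It does not by itself send each query to the node that owns the range, which would additionally need the cluster's actual token ownership (for example from driver metadata) or token-aware routing.

```scala
object TokenRangeSplit {
  // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
  val MinToken: BigInt = BigInt(Long.MinValue)
  val MaxToken: BigInt = BigInt(Long.MaxValue)

  /** Splits (MinToken, MaxToken] into `parts` contiguous (start, end] ranges. */
  def split(parts: Int): Seq[(Long, Long)] = {
    require(parts > 0, "parts must be positive")
    val width = MaxToken - MinToken
    // Boundaries are computed in BigInt so the arithmetic cannot overflow a Long.
    val bounds = (0 to parts).map(i => (MinToken + width * i / parts).toLong)
    bounds.zip(bounds.tail)
  }

  /** Builds one CQL statement per range; `table` and `key` are placeholders. */
  def queries(table: String, key: String, parts: Int): Seq[String] =
    split(parts).map { case (start, end) =>
      s"SELECT * FROM $table WHERE token($key) > $start AND token($key) <= $end"
    }
}
```

Each generated statement covers a disjoint slice of the ring, so the results of all slices together correspond to one full-table scan.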

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Colin
I use spark with cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out of cassandra?), and my question is always, why aren't you updating both places at once? For example, we use hadoop and cassandra in conjunction with each other, w

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jens Rantil
On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:
> If you use Cassandra enterprise, you can use hive, AFAIK.
Even better, you can use Spark/Shark with DSE.
Cheers, Jens
-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se

How to speed up SELECT * query in Cassandra

2015-02-11 Thread Ja Sam
Is there a simple way (or even a complicated one) to speed up a SELECT * FROM [table] query? I need to get all rows from one table every day. I split the tables and create one for each day, but the query is still quite slow (200 million records). I was thinking about running this query in parallel, b