> Could you please share how much data you store on the cluster and what
> is the HW configuration of the nodes?
These nodes are dedicated HW, 24 CPUs and 50 GB RAM.
Each node has a few TBs of data (you don't want to go over this) in
RAID 50 (we're migrating over to JBOD).
Each C* node is running 2.0.
Hi,
thanks for the reference, I really appreciate that you shared your
experience.
Could you please share how much data you store on the cluster and what
is the HW configuration of the nodes? I am really impressed that you are
able to read 100M records in ~4 minutes on 4 nodes. It makes something
like 400k records/s across the cluster, i.e. roughly 100k records/s per node.
Jirka,
> But I am really interested in how it can work well with Spark/Hadoop,
> where you basically need to read all the data as well (as far as I
> understand it).
I can't give you any benchmarking between technologies (nor am I
particularly interested in getting involved in such a discussion).
Thanks Jirka!
From: user@cassandra.apache.org
Subject: Re: How to speed up SELECT * query in Cassandra
Hi,
here are some snippets of code in Scala which should get you started.
Jirka H.

loop { lastRow =>
  val query = lastRow match {
    case Some(row) => nextPageQuery(row, upperLimit)
    case None      => initialQuery(lowerLimit)
  }
  session.execute(query).all
}

private def nextPageQuery(row: Row, upperLimit: String
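
Jirka's snippet is cut off in the archive, so for context here is a minimal,
self-contained sketch of what the paging pieces could look like with the
DataStax Java driver from Scala. The keyspace, table and column names
(mykeyspace.mytable, id), the page size, and the loop signature are
assumptions for illustration, not his original code.

import com.datastax.driver.core.{Cluster, Row}
import scala.annotation.tailrec
import scala.collection.JavaConverters._

object TokenRangePaging {
  // Assumed schema: CREATE TABLE mykeyspace.mytable (id text PRIMARY KEY, ...)
  val pageSize = 1000
  val cluster  = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session  = cluster.connect()

  // First page of the range (a full range scan would also bound this by upperLimit).
  def initialQuery(lowerLimit: String): String =
    s"SELECT * FROM mykeyspace.mytable WHERE token(id) >= $lowerLimit LIMIT $pageSize"

  // Next page: continue after the token of the last row already seen.
  def nextPageQuery(row: Row, upperLimit: String): String = {
    val lastId = row.getString("id")
    s"SELECT * FROM mykeyspace.mytable WHERE token(id) > token('$lastId') " +
      s"AND token(id) <= $upperLimit LIMIT $pageSize"
  }

  // Fetch pages until an empty one comes back, threading the last row through.
  @tailrec
  def loop(lastRow: Option[Row], lowerLimit: String, upperLimit: String): Unit = {
    val query = lastRow match {
      case Some(row) => nextPageQuery(row, upperLimit)
      case None      => initialQuery(lowerLimit)
    }
    val rows = session.execute(query).all.asScala
    if (rows.nonEmpty) {
      // ... process the page here ...
      loop(Some(rows.last), lowerLimit, upperLimit)
    }
  }
}

(A real implementation would use prepared statements instead of interpolating
the last id into the CQL string.)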
>>>> > ... of cassandra's distributed nature vs partitioning data
>>>> > on hadoop makes spark on hdfs actually faster than on cassandra.
>>>>
>>>> I am not sure about the current state of Spark support for Cassandra, but
>>>> I guess if you create a m
>> ... still stored in HDFS, as it happens with hadoop, is this right? I think
>> the problem with Spark + Cassandra or with Hadoop + Cassandra is that the
>> hard part spark or hadoop does, the shuffling, could be done out of the
>> box with Cassandra, but no one takes advantage of that. What if a map /
>> reduce job used a temporary CF in Cassandra to store intermediate results?
Your answer looks very promising.
How do you calculate start and stop?

On Wed, Feb 11, 2015 at 12:09 PM, Jiri Horky wrote:
> The fastest way I am aware of is to do the queries in parallel to
> multiple cassandra nodes and make sure that you only ask them for keys
> they are responsible for. Otherwise, the node needs to resend your query,
> which is much slower and creates unnecessary objects (and thus GC pressure).
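
One possible answer to the start/stop question, sketched under assumptions
(Murmur3Partitioner, and a hypothetical helper name tokenSplits; none of this
code is from the thread): split the partitioner's full token range into
contiguous slices and scan each slice with a token() predicate.

// Split the full Murmur3Partitioner token range (-2^63 .. 2^63-1) into
// numSplits contiguous (start, stop] slices. Each slice can then be scanned
// with: SELECT ... WHERE token(id) > start AND token(id) <= stop
def tokenSplits(numSplits: Int): Seq[(BigInt, BigInt)] = {
  val minToken = BigInt(Long.MinValue)
  val maxToken = BigInt(Long.MaxValue)
  val width    = (maxToken - minToken) / numSplits
  (0 until numSplits).map { i =>
    val start = minToken + width * i
    val stop  = if (i == numSplits - 1) maxToken else minToken + width * (i + 1)
    (start, stop)
  }
}

To strictly ask each node only for the keys it owns, the slice boundaries
would have to follow the ring's actual ownership ranges (e.g. as reported by
nodetool ring) rather than equal-width slices.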
The fastest way I am aware of is to do the queries in parallel to
multiple cassandra nodes and make sure that you only ask them for keys
they are responsible for. Otherwise, the node needs to resend your query,
which is much slower and creates unnecessary objects (and thus GC pressure).
You can man
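
As a rough illustration of that advice, here is a sketch that runs one range
scan per slice concurrently. It reuses the hypothetical session and
tokenSplits from the sketches above, and the table and column names are
still assumptions.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.collection.JavaConverters._

// One future per token slice; each slice is an independent range scan.
val futures = tokenSplits(24).map { case (start, stop) =>
  Future {
    session.execute(
      s"SELECT * FROM mykeyspace.mytable " +
        s"WHERE token(id) > $start AND token(id) <= $stop").all.asScala
  }
}
// Collecting everything in memory is only sensible for modest result sizes;
// otherwise process each slice as it completes.
val rows = Await.result(Future.sequence(futures), Duration.Inf).flatten

Going further, matching the slices to the ring's ownership ranges and giving
each statement routing information (e.g. the driver's token-aware load
balancing policy) is what keeps the coordinator from having to forward the
query, which is Jiri's point above.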
I use spark with cassandra, and you don't need DSE.
I see a lot of people ask this same question below (how do I get a lot of data
out of cassandra?), and my question is always: why aren't you updating both
places at once?
For example, we use hadoop and cassandra in conjunction with each other, w
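
For reference, a minimal sketch of the non-DSE route with the open-source
spark-cassandra-connector; the contact point, keyspace and table names are
placeholders rather than details from this thread.

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

val conf = new SparkConf()
  .setAppName("full-table-scan")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// A full-table read; the connector handles splitting and parallelism.
val rowCount = sc.cassandraTable("mykeyspace", "mytable").count()
println(s"rows: $rowCount")

The connector splits the scan by token range and schedules the pieces close
to the replicas, which is the same idea as the manual token-range approach
discussed earlier in the thread.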
On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON)
<mvallemil...@bloomberg.net> wrote:
> If you use Cassandra enterprise, you can use hive, AFAIK.
Even better, you can use Spark/Shark with DSE.
Cheers,
Jens
--
Jens Rantil
Backend engineer
Tink AB
Email: jens.ran...@tink.se
Is there a simple way (or even a complicated one) to speed up a SELECT
* FROM [table] query?
I need to get all rows from one table every day. I split the tables and create
one for each day, but the query is still quite slow (200 million records).
I was thinking about running this query in parallel, b