Jeremy,
We use the Cloudera distribution for our Hadoop cluster, and it may not be
possible to migrate to Brisk quickly because of Flume/Hue dependencies. Did you
successfully pull the data from an independent Cassandra cluster and dump it
into a completely disconnected Hadoop cluster? It will be really helpful i
I was under the impression that Opscenter was only compatible with the DataStax
version of Cassandra.
I'll give that a shot :)
Thank you.
On 2011-12-23, at 6:49 PM, Jeremy Hanna wrote:
> One way to get a good bird's eye view of the cluster would be to install
> DataStax Opscenter - the community edition is free.
One way to get a good bird's eye view of the cluster would be to install
DataStax Opscenter - the community edition is free. You can do a lot of checks
from a web interface that are based on the JMX hooks that are in Cassandra. We
use it and it's helped us a lot. Hope it helps for what you're
I just imported a lot of data into a 9-node Cassandra cluster and before I create
a new ColumnFamily with even more data, I'd like to be able to determine how
full my cluster currently is (in terms of memory usage). I'm not too sure what
I need to look at. I don't want to import another 20-30GB of
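One rough way to reason about the question above, before importing more data: on-disk footprint is the raw data size multiplied by the replication factor, plus compaction headroom. A back-of-the-envelope sketch, using only illustrative numbers (the replication factor and ~2x size-tiered compaction headroom are assumptions, not figures from this thread):

```python
# Back-of-the-envelope capacity check before importing more data.
# All numbers are hypothetical placeholders, not from this thread.

def cluster_disk_needed(raw_gb, replication_factor, compaction_headroom=2.0):
    """Approximate on-disk footprint across the cluster.

    Each row is stored replication_factor times, and compaction can
    temporarily need extra free space (often estimated at up to ~2x
    for major compactions with size-tiered compaction).
    """
    return raw_gb * replication_factor * compaction_headroom

# e.g. importing another 30 GB with RF=3 could need ~180 GB of space
needed = cluster_disk_needed(30, replication_factor=3)
print(needed)  # 180.0
```

Comparing that against the per-node "Load" figures reported by `nodetool ring` gives a quick sense of remaining headroom.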
We currently have Cassandra nodes co-located with Hadoop nodes and do a lot of
data analytics with it. We've looked at Brisk - Brisk is still open-source and
available, but DataStax is putting its resources into a closed version of Brisk
as part of DataStax Enterprise. We'll likely be moving to that
Have you tried Brisk?
On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna" wrote:
> We do this all the time. Take a look at
> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use
> mapreduce or pig to get data out of cassandra. If it's going to a separate
> hadoop cluster,
We do this all the time. Take a look at
http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use
mapreduce or pig to get data out of cassandra. If it's going to a separate
hadoop cluster, I don't think you'd need to co-locate task trackers or data
nodes on your cassandra
Cassandra read performance depends on your disk cache (free memory on the
node not used by Cassandra) and disk IOPS performance. In the ideal case (no
need to merge SSTables) Cassandra needs 2 IOPS per data read if the
Cassandra key/row caches are not used.
A standard hard drive has about 150 IOPS. If you ha
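The arithmetic above implies a rough per-disk ceiling for uncached reads. A small sketch using the post's own figures (150 IOPS per drive, 2 IOPS per read):

```python
# Rough uncached-read throughput estimate from the figures above.

DISK_IOPS = 150        # typical 7200rpm hard drive, per the post
IOPS_PER_READ = 2      # ideal case: no sstable merging, caches cold

reads_per_sec_per_disk = DISK_IOPS / IOPS_PER_READ
print(reads_per_sec_per_disk)  # 75.0 uncached reads/sec per spindle
```

Multiply by the number of data disks per node (and nodes serving the read) for a cluster-wide ballpark; anything much above that has to come from the OS page cache or Cassandra's key/row caches.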
I'm not sure this is much help, but we actually run Hadoop jobs to load and
extract data to and from HDFS. You can use ColumnFamilyInputFormat to iterate
over the data in Cassandra and output it to a file. That doesn't solve the
continuous problem, but should give you a batch mechanism to refresh th
Hello All,
I need to dump Cassandra data to a Hadoop cluster for further
analytics. A lot of other relevant data that is not present in Cassandra is
already available in HDFS for analysis. Both are independent clusters right
now.
Is there a suggested way to get the data periodically or co
Peter,
Thanks for your response. I'm looking into some of the ideas in your
other recent mail, but I had another followup question on this one...
Is there any way to control the CPU load when using the "stress" benchmark?
I have some control over that with our home-grown benchmark, but I
thought
Thanks for your quick response!
I am currently running the performance tests with extended GC logging. I will
post the GC logs if clients time out at the same moment that a full
garbage collection runs.
Thanks
Rene
-----Original Message-----
From: sc...@scode.org [mailto:sc...@scode.org] On
No problems.
IMHO you should develop a sizable bruise banging your head against a wall using
standard CFs and the RandomPartitioner before using something else.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 23/12/2011, at 6:29 AM, Bryce A
Next time I will finish my morning coffee first :)
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 23/12/2011, at 5:08 AM, Peter Schuller wrote:
>> One other thing to consider is are you creating a few very large rows ? You
>> can check the min,
Counters only update the value of a column; they cannot be used as column
names. So you cannot have a dynamically updating top-ten list using counters
alone. You have a couple of options. First, use something like Redis if that fits
your use case. Redis could either be the database of record for the
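To see why a sorted structure fits the top-ten problem better than plain counters: Redis's sorted sets give you increment-and-rank in one data type (`ZINCRBY` to bump a score, `ZREVRANGE` to read the leaders). A stdlib-only sketch of the same pattern, with the item names purely illustrative:

```python
# Illustrative in-memory stand-in for the top-N pattern a Redis
# sorted set (ZINCRBY / ZREVRANGE) provides out of the box.
from collections import Counter
import heapq

counts = Counter()

def increment(item, delta=1):
    counts[item] += delta   # analogous to ZINCRBY

def top_n(n=10):
    # analogous to ZREVRANGE 0 n-1 WITHSCORES: highest counts first
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

increment("page-a", 5)
increment("page-b", 3)
increment("page-a", 2)
print(top_n(2))  # [('page-a', 7), ('page-b', 3)]
```

With Cassandra counters alone you would have to read back all the counter columns and sort client-side like `top_n` does, since the column names themselves cannot reorder by count.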
Hi, I'm doing a stress test with tools/stress (the Java version).
I used 3 EC2 XLarge instances with 4-disk RAID instance storage for the cluster.
I get write TPS min 4000 & up to 1
but I only get 50 for read TPS.
Is this right? What am I doing wrong?
These are the options:
java -jar stress.jar --nodes=ip1,ip2,ip3 --consist