Re: Question about EC2 and SSDs

2014-09-05 Thread William Oberman
Theory aside, I switched from "RAID of ephemerals" for data, and root volume for write log to single EBS-based SSD without any noticeable impact on performance. will On Thu, Sep 4, 2014 at 9:35 PM, Steve Robenalt wrote: > Yes, I am aware there are no heads on an SSD. I also have seen plenty of

Re: VPC AWS

2014-06-05 Thread William Oberman
IPs) or private only. > > Any insight regarding snitches ? What snitch do you guys use ? > > > 2014-06-05 15:06 GMT+02:00 William Oberman >: > >> I don't think traffic will flow between "classic" ec2 and vpc directly. >> There is some kind of gateway

Re: VPC AWS

2014-06-05 Thread William Oberman
I don't think traffic will flow between "classic" ec2 and vpc directly. There is some kind of gateway bridge instance that sits between, acting as a NAT. I would think that would cause new challenges for: -transitions -clients Sorry this response isn't heavy on content! I'm curious how this thr

alternative vnode upgrade strategy?

2014-05-28 Thread William Oberman
I'm concerned about the bad reports of using shuffle to do a vnode upgrade (and I did a "smoke test" trying shuffle a test cluster, and had out of disk space issues). I then started to plan out the "dual DC" upgrade path, but I wonder if this option is easier: Starting point: N node cluster, no v

NTS, vnodes and 0% chance of data loss

2014-05-15 Thread William Oberman
I found this: http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3ccaeduwd1erq-1m-kfj6ubzsbeser8dwh+g-kgdpstnbgqsqc...@mail.gmail.com%3E I read the three referenced cases. In addition, case 4123 references: http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html And

Re: NTS, vnodes and 0% chance of data loss

2014-05-14 Thread William Oberman
the source for NTS), as NTS does skip over the same rack (though, it will allow multiple in the same rack if you "fill up"... I guess if someone did DC:4 with 3 racks they'll always get one rack with two copies of the data, for example). will On Tue, May 13, 2014 at 1:41 PM, Will
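
The rack behavior described above (skip racks already holding a copy, then "fill up" once every rack is used) can be sketched roughly as follows. This is a simplified illustration, not the actual NTS source; the function name `place_replicas`, the node names, and the rack names are all made up for the example.

```python
from collections import Counter


def place_replicas(ring, replicas):
    """Pick `replicas` nodes from `ring`, a list of (node, rack) in ring order.

    Simplified model: first take one node per unseen rack while walking the
    ring, then fill any remaining slots from the nodes that were skipped.
    """
    chosen, seen_racks = [], set()
    # Pass 1: prefer racks that do not hold a copy yet.
    for node, rack in ring:
        if len(chosen) < replicas and rack not in seen_racks:
            chosen.append((node, rack))
            seen_racks.add(rack)
    # Pass 2: every rack is used, so allow repeats to reach the replica count.
    for entry in ring:
        if len(chosen) < replicas and entry not in chosen:
            chosen.append(entry)
    return chosen


# Hypothetical DC with 3 racks and 2 nodes per rack, asking for DC:4.
ring = [("n1", "r1"), ("n2", "r1"), ("n3", "r2"),
        ("n4", "r2"), ("n5", "r3"), ("n6", "r3")]
chosen = place_replicas(ring, 4)
rack_counts = Counter(rack for _, rack in chosen)
print(chosen)
print(rack_counts)
```

With 4 copies and only 3 racks, some rack has to hold two copies regardless of ring order, which is the "fill up" case mentioned in the post.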

Re: clearing tombstones?

2014-05-11 Thread William Oberman
4, Ruchir Jha wrote: > I tried to do this, however the doubling in disk space is not "temporary" > as you state in your note. What am I missing? > > > On Fri, Apr 11, 2014 at 10:44 AM, William Oberman < > ober...@civicscience.com > > wrote: > > So, if I

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
operly. If not - >> that's the reason. >> >> Kind regards, >> Michał Michalski, >> michal.michal...@boxever.com >> >> >> On 14 April 2014 15:04, William Oberman wrote: >> >>> I didn't cross link my thread, but the basic idea is

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
lski, > michal.michal...@boxever.com > > > On 14 April 2014 14:44, William Oberman wrote: > >> I had a thread on this forum about clearing junk from a CF. In my case, >> it's ~90% of ~1 billion rows. >> >> One side effect I had hoped for was a reduction i

bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB of RAM). Do bloom filters ever resize themse

Re: clearing tombstones?

2014-04-14 Thread William Oberman
you > decide to take and the results that would be great. > > > Mark > > > On Fri, Apr 11, 2014 at 5:53 PM, William Oberman > wrote: > >> I've learned a *lot* from this thread. My thanks to all of the >> contributors! >> >> Paulo: Good luck w

Re: clearing tombstones?

2014-04-11 Thread William Oberman
o compact these smaller SSTables. For all these >> reasons it is generally advised to stay away from running compactions >> manually. >> >> Assuming that this is a production environment and you want to keep >> everything running as smoothly as possible I would reduce the gc_grace o

Re: clearing tombstones?

2014-04-11 Thread William Oberman
ant to keep > everything running as smoothly as possible I would reduce the gc_grace on > the CF, allow automatic minor compactions to kick in and then increase the > gc_grace once again after the tombstones have been removed. > > > On Fri, Apr 11, 2014 at 3:44 PM, William Oberman >

Re: clearing tombstones?

2014-04-11 Thread William Oberman
impact to minor compactions). I'm hesitant to write the offending sentence again :-) On Fri, Apr 11, 2014 at 10:44 AM, William Oberman wrote: > So, if I was impatient and just "wanted to make this happen now", I could: > > 1.) Change GCGraceSeconds of the CF to 0 > 2.) r

Re: clearing tombstones?

2014-04-11 Thread William Oberman
allows a great deal of > time for consistency to be achieved prior to deletion. If you are > operationally confident that you can achieve consistency via anti-entropy > repairs within a shorter period you can always reduce that 10 day interval. > > > Mark > > > On Fri,

Re: clearing tombstones?

2014-04-11 Thread William Oberman
it never worked so I run > nodetool compaction on every node; that does it. > > > 2014-04-11 16:05 GMT+02:00 William Oberman : > > I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool >> repair, or time (as in just wait)? >> >> I had a CF

clearing tombstones?

2014-04-11 Thread William Oberman
I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool repair, or time (as in just wait)? I had a CF that was more or less storing session information. After some time, we decided that one piece of this information was pointless to track (and was 90%+ of the columns, and in 99

Re: using hadoop + cassandra for CF mutations (delete)

2014-04-08 Thread William Oberman
the row } } catch (Exception $e) { fwrite(STDERR, $e); } ==== On Fri, Apr 4, 2014 at 1:40 PM, William Oberman wrote: > Looking at the code, cassandra.input.split.size==Pig URL split_size, > right? But, in cassandra 1.2.15 I'm wondering if there is a bug that would

Re: using hadoop + cassandra for CF mutations (delete)

2014-04-04 Thread William Oberman
w exactly why, > probably because it hits the minimum number of rows per token. > > Another suggestion is to decrease the number of simultaneous mappers of > your job, so it doesn't hit cassandra too hard, and you'll get less > TimedOutExceptions, but your job will take long

using hadoop + cassandra for CF mutations (delete)

2014-04-04 Thread William Oberman
Hi, I have some history with cassandra + hadoop: 1.) Single DC + integrated hadoop = Was "ok" until I needed steady performance (the single DC was used in a production environment) 2.) Two DC's + integrated hadoop on 1 of 2 DCs = Was "ok" until my data grew and in AWS compute is expensive compared

Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread William Oberman
Same region, cross zone transfer is $0.01 / GB (see http://aws.amazon.com/ec2/pricing/, Data Transfer section). On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry wrote: > Cross zone data transfer does not cost any extra money. > > LOCAL_QUORUM = QUORUM if all 6 servers are located in the same

dependencies for cassandra's pig integration?

2013-07-31 Thread William Oberman
I'm using AWS's EMR (hadoop as a service), and one step copies some data from EMR -> my cassandra cluster. I used to patch EMR with pig 0.11, but now AWS officially supports 0.11, so I thought I'd give it a try. I was having issues. The AWS forum on it is here: https://forums.aws.amazon.com/thre

cqlsh + existing cf's + query

2013-07-03 Thread William Oberman
I've been running cassandra a while, and have used the PHP api and cassandra-cli, but never gave cqlsh a shot. I'm not quite getting it. My most simple CF is a dumping ground for testing things created as: create column family stats; I was putting random stats I was computing in it. All keys, co

1.1.9 -> 1.1.11 rpm upgrade issue

2013-05-03 Thread William Oberman
I get this: Running rpm_check_debug ERROR with rpm_check_debug vs depsolve: apache-cassandra11 conflicts with apache-cassandra11-1.1.11-1.noarch I'm using CentOS. Problem with my OS, or problem with the package? (And how can it conflict with itself??) will

Re: normal thread counts?

2013-05-01 Thread William Oberman
ndTcpConnectionPool constructor checking > to see if this could be the source of the leak. > > Cheers > >- > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 1/05/2013, at 2:18 AM,

Re: normal thread counts?

2013-05-01 Thread William Oberman
; The > threads are created in the OutboundTcpConnectionPool constructor checking > to see if this could be the source of the leak. > > Cheers > > ----- > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On

Re: normal thread counts?

2013-04-30 Thread William Oberman
> if your app is leaking connection you should probably deal with that first. > > Cheers > >- > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 30/04/2013, at 3:07 AM, William Ober

normal thread counts?

2013-04-29 Thread William Oberman
Hi, I'm having some issues. I keep getting: ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: unable to create new native thread -- after a day or two of runti

Re: StatusLogger format?

2013-04-15 Thread William Oberman
99% sure it's in bytes. On Mon, Apr 15, 2013 at 11:25 AM, William Oberman wrote: > Mainly the: > "ColumnFamilyMemtable ops,data" > section. > > Is data in bytes/kb/mb/etc? > > Example line: > StatusLogger.java (line 116) civicscience.sessions4963,1799916 > > Thanks! > > >

StatusLogger format?

2013-04-15 Thread William Oberman
Mainly the: "ColumnFamilyMemtable ops,data" section. Is data in bytes/kb/mb/etc? Example line: StatusLogger.java (line 116) civicscience.sessions4963,1799916 Thanks!

Re: how to stop out of control compactions?

2013-04-04 Thread William Oberman
pears I can't set min > 32 > > Why did you want to set it so high ? > If you want to disable compaction set it to 0. > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thel

Re: how to stop out of control compactions?

2013-04-02 Thread William Oberman
just leave my compaction killers running instead (not that killing compactions constantly isn't messing with things as well). will On Tue, Apr 2, 2013 at 10:43 AM, William Oberman wrote: > Edward, you make a good point, and I do think I am getting closer to having > to increase my clust

Re: how to stop out of control compactions?

2013-04-02 Thread William Oberman
ions are "out of > control" it usually means one of these things, > 1) you have a corrupt table that the compaction never finishes on, > sstables count keep growing > 2) you do not have enough hardware to handle your write load > > > On Tue, Apr 2, 2013 at 7:50 AM, Wil

Re: how to stop out of control compactions?

2013-04-02 Thread William Oberman
tcompactionthreshold > - Set the min and max > compaction thresholds for a given column family > > > > On Mon, Apr 1, 2013 at 12:38 PM, William Oberman <ober...@civicscience.com> > wrote: > >> I'll skip the prelude, but I worked myse

how to stop out of control compactions?

2013-04-01 Thread William Oberman
I'll skip the prelude, but I worked myself into a bit of a jam. I'm recovering now, but I want to double check if I'm thinking about things correctly. Basically, I was in a state where a majority of my servers wanted to do compactions, and rather large ones. This was impacting my site performance.

odd timestamps

2013-03-05 Thread William Oberman
I happened to notice some bizarre timestamps coming out of the cassandra-cli. Example: [default@XXX] get CF['e2b753aa33b13e74e5e803d787b06000']; => (column=c35ef420-c37a-11e0-ac88-09b2f4397c6a, value=XXX, timestamp=2013042719) => (column=c3845ea0-c37a-11e0-8f6f-09b2f4397c6a, value=XXX, timestamp=2

Re: sstable2json had random behavior

2013-01-22 Thread William Oberman
d Index files. The problem goes away when I have all another > files (Compression, Filter...) > > > On Mon, Jan 21, 2013 at 11:36 AM, William Oberman < > ober...@civicscience.com> wrote: > >> I'm running 1.1.6 from the datastax repo. >> >> I

sstable2json had random behavior

2013-01-21 Thread William Oberman
I'm running 1.1.6 from the datastax repo. I ran sstable2json and got the following error: Exception in thread "main" java.io.IOError: java.io.IOException: dataSize of 7020023552240793698 starting at 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db length 7502161255

Re: Cassandra at Amazon AWS

2013-01-17 Thread William Oberman
I have a "peer EBS disk" to the ephemeral disk. Then I do nodetool snapshot -> rsync from ephemeral to EBS -> take snapshot of EBS. Syncing nodetool snapshot directly to S3 would involve fewer steps and be cheaper (EBS costs more than S3), but I do post processing on the snapshot for EMR, and it

Re: AWS EMR <-> Cassandra

2013-01-16 Thread William Oberman
n > the cluster. This way, you would be able to correctly set the vars you need. > Out of curiosity, could you share what are you using for cassandra > storage? I am currently using EC2 local disks, but I am looking for an > alternative. > > Best regards, > Marcelo. > >

Re: AWS EMR <-> Cassandra

2013-01-04 Thread William Oberman
oner). I still want to know why the old easy way (of setting the 3 system variables on the pig starter box, and having the config flow into the task trackers) doesn't work! will On Fri, Jan 4, 2013 at 9:04 AM, William Oberman wrote: > On all tasktrackers, I see: > java.io.IOException

Re: AWS EMR <-> Cassandra

2013-01-04 Thread William Oberman
ess > isn't set (on the slave, the master is ok). > > Can you post the full error ? > > Cheers >- > Aaron Morton > Freelance Cassandra Developer > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 4/01/2013, at 11:15 AM

AWS EMR <-> Cassandra

2013-01-03 Thread William Oberman
Anyone ever try to read or write directly between EMR <-> Cassandra? I'm running various Cassandra resources in Ec2, so the "physical connection" part is pretty easy using security groups. But, I'm having some configuration issues. I have managed to get Cassandra + Hadoop working in the past usi

Re: remove DC

2012-11-13 Thread William Oberman
rote data directly to DC2, then you are correct you > don't need to run repair. > > You should just need to update the schema, and then decommission the node. > > -Jeremiah > > On Nov 12, 2012, at 2:25 PM, William Oberman > wrote: > > There is a great

remove DC

2012-11-12 Thread William Oberman
There is a great guide here on how to add resources: http://www.datastax.com/docs/1.1/operations/cluster_management#adding-capacity What about deleting resources? I'm thinking of removing a data center. Clearly I'd need to change strategy options, which is currently something like this: {DC1:3,DC

Re: hadoop consistency level

2012-10-18 Thread William Oberman
A recent thread made it sound like Brisk was no longer a datastax supported thing (it's DataStax Enterprise, or DSE, now): http://www.mail-archive.com/user@cassandra.apache.org/msg24921.html In particular this response: http://www.mail-archive.com/user@cassandra.apache.org/msg25061.html On Thu, Oc

Re: cassandra + pig

2012-10-11 Thread William Oberman
through them. I > don't recall if we did paging in pig or mapreduce but you should be able to > do that in both since pig allows you to specify the slice start. > > On Oct 11, 2012, at 11:28 AM, William Oberman > wrote: > > > If you don't mind me asking, how are you

Re: cassandra + pig

2012-10-11 Thread William Oberman
ort, it sounds like there are > some rough edges like you say. But issues that are reproducible on tickets > for any problems are much appreciated and they will get addressed. > > On Oct 11, 2012, at 10:43 AM, William Oberman > wrote: > > > I'm wondering how many peop

cassandra + pig

2012-10-11 Thread William Oberman
I'm wondering how many people are using cassandra + pig out there? I recently went through the effort of validating things at a much higher level than I previously did(*), and found a few issues: https://issues.apache.org/jira/browse/CASSANDRA-4748 https://issues.apache.org/jira/browse/CASSANDRA-4

Re: pig and widerows

2012-09-27 Thread William Oberman
going on in terms of the integration between cassandra/pig/hadoop. will On Thu, Sep 27, 2012 at 3:26 PM, William Oberman wrote: > The next painful lesson for me was figuring out how to get logging working > for a distributed hadoop process. In my test environment, I have a single >

Re: pig and widerows

2012-09-27 Thread William Oberman
oop cluster I'm going to try to undo all of my other hacks to get logging/printing working to confirm if those were actually the only two changes I had to make. will On Thu, Sep 27, 2012 at 1:43 PM, William Oberman wrote: > Ok, this is painful. The first problem I found is in sto

Re: pig and widerows

2012-09-27 Thread William Oberman
ging messages), make sure it appears first on the pig classpath (use pig -secretDebugCmd to see the fully qualified command line). The next thing I'm trying to figure out is why when widerows == true I'm STILL not seeing more than 1024 columns :-( will On Wed, Sep 26, 2012 at 3:42 PM,

pig and widerows

2012-09-26 Thread William Oberman
Hi, I'm trying to figure out what's going on with my cassandra/hadoop/pig system. I created a "mini" copy of my main cassandra data by randomly subsampling to get ~50,000 keys. I was then writing pig scripts but also the equivalent operation using simple single threaded code to double check pig.

Re: new "nodetool ring" output and unbalanced ring?

2012-09-06 Thread William Oberman
from the range being > considered, not the last node that was chosen as a replica). > > To fix this, you'll either need to make the 1d node a 1c node, or make > 42535295865117307932921825928971026432 a 1d node so that you're alternating > racks within that DC. > >

new "nodetool ring" output and unbalanced ring?

2012-09-06 Thread William Oberman
Hi, I recently upgraded from 0.8.x to 1.1.x (through 1.0 briefly) and nodetool -ring seems to have changed from "owns" to "effectively owns". "Effectively owns" seems to account for replication factor (RF). I'm ok with all of this, yet I still can't figure out what's up with my cluster. I have

Re: Professional Support

2011-09-06 Thread William Oberman
I also have used datastax with great success (same disclaimer). A specific example: -I setup a one-on-one call to talk through an issue, in my case a server reconfiguration. It took 2 days to find a time to meet, though that was my fault as I believe they could have worked me in within a day. I

Re: cassandra 0.8.4 + pig (using cloudera rpms)

2011-09-05 Thread William Oberman
cluster and we're in the process of moving > to production. We're currently using pig from cdhu0. All we did was > replace the 0.8.4 jars after installing the debian packages for 0.8.4. > > Not sure if that helps anyone, but thought I would share what we've seen. > > Btw,

cassandra 0.8.4 + pig (using cloudera rpms)

2011-09-04 Thread William Oberman
I've had some troubles, so I thought I'd pass on my various bug fixes: -Cass 0.8.4 has troubles with pig/hadoop (you get NPE's when trying to connect to cassandra in the pig logs). You need this patch: http://svn.apache.org/viewvc?revision=1158940&view=revision And maybe this: http://svn.apache.

Re: how to migrate?

2011-08-25 Thread William Oberman
> > > create keyspace civicscience with replication_factor=3 and > strategy_options = [{us-east:3}] and > placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'; > > FYI the replication_factor property with the NTS is incorrect, the next(?) > revision of 0.8 will raise an error

how to migrate?

2011-08-24 Thread William Oberman
I was hoping to transition my "simple" cassandra cluster (where each node is a cassandra + hadoop tasktracker) to a cluster with two virtual datacenters (vanilla cassandra vs. cassandra + hadoop tasktracker), based on this: http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig The problem

Re: Survey: Cassandra/JVM Resident Set Size increase

2011-07-14 Thread William Oberman
I finally upgraded from 0.7.4 -> 0.8.0 (using riptano packages) 2 days ago. Before, my resident memory (for the java process) would slowly grow without bound and the OS would kill the process. But, over the last 2 days, I _think_ it's been stable. I'll let you know in a week :-) My other stats: AW

Re: What does a write lock ?

2011-07-08 Thread William Oberman
a Java class. > * > * > On Fri, Jul 8, 2011 at 11:13 AM, William Oberman > wrote: > >> I use a language specific wrapper around thrift as my "client", but yes, I >> guess I fundamentally mean thrift == client, and the cassandra server == >> server. >>

Re: What does a write lock ?

2011-07-08 Thread William Oberman
I use a language specific wrapper around thrift as my "client", but yes, I guess I fundamentally mean thrift == client, and the cassandra server == server. will On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote: > I am confused by what you mean by "Cassandra client code." Is this part o

Re: What does a write lock ?

2011-07-08 Thread William Oberman
dra is using the database definition. will On Fri, Jul 8, 2011 at 10:35 AM, William Oberman wrote: > I think you need to look into Zookeeper, or other distributed coordinator, > as you have little/no guarantees from cassandra between 1-3 (in terms of the > guarantees you want and need).

Re: What does a write lock ?

2011-07-08 Thread William Oberman
lidation check, see? > > If Cassandra does not guard against this then one possible > solution would be to make my own key-to-mutex map in memory, lock the mutex > for A's key as a precursor to (1) and release it in a post-update function. > But I am always very nervous

Re: What does a write lock ?

2011-07-08 Thread William Oberman
else will confirm if I'm wrong yet again. For me, if I need two pieces of data to be consistently related to each other and stored in cassandra, I encode them (usually JSON) and store them in one column. will On Fri, Jul 8, 2011 at 8:30 AM, William Oberman wrote: > Questions like this see
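
A minimal sketch of the "encode them and store them in one column" idea from the post above, assuming JSON as the encoding. The helper names `encode_pair` and `decode_pair` and the sample values are hypothetical, and the actual write to Cassandra through whatever client is in use is left out.

```python
import json


def encode_pair(first, second) -> str:
    """Pack two related values into one string suitable for a single column."""
    return json.dumps({"first": first, "second": second}, sort_keys=True)


def decode_pair(value: str):
    """Unpack the two values written by encode_pair."""
    obj = json.loads(value)
    return obj["first"], obj["second"]


# Both fields travel together in one column value, so a reader gets the
# pair as it was written rather than a mix of old and new pieces.
column_value = encode_pair("alice@example.com", 42)
print(column_value)
print(decode_pair(column_value))
```

The trade-off is that the two values can no longer be read or updated independently; every change rewrites the whole encoded column.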

Re: What does a write lock ?

2011-07-08 Thread William Oberman
Questions like this seem to come up a lot: http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily http://www.mail-archive.com/use

Re: Cassandra memory problem

2011-07-07 Thread William Oberman
I think I had (and have) a similar problem: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html My memory usage grew slowly until I ran out of mem and the OS killed my process (due to no swap). I'm still on 0.7.4, but I'm rolling

Re: cassandra/hadoop/pig

2011-07-06 Thread William Oberman
> > > On Wed, Jul 6, 2011 at 2:48 PM, William Oberman > wrote: > >> I have a few cassandra/hadoop/pig questions. I currently have things set >> up in a test environment, and for the most part everything works. But, >> before I start to roll things out to produc

cassandra/hadoop/pig

2011-07-06 Thread William Oberman
I have a few cassandra/hadoop/pig questions. I currently have things set up in a test environment, and for the most part everything works. But, before I start to roll things out to production, I wanted to check on/confirm some things. When I originally set things up, I used: http://wiki.apache.o

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread William Oberman
tally works. Sounds like you are hacking (or at least looking) at the source, so all the power to you if/when you try these kind of changes. will On Sun, Jul 3, 2011 at 8:45 PM, AJ wrote: > On 7/3/2011 6:32 PM, William Oberman wrote: > > Was just going off of: " Send the

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread William Oberman
Was just going off of: "Send the value to the primary replica and send placeholder values to the other replicas". Sounded like you wanted to write the value to one, and write the placeholder to N-1 to me. But, C* will propagate the value to N-1 eventually anyways, 'cause that's just what it does

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread William Oberman
Ok, I see the "you happen to choose the 'right' node" idea, but it sounds like you want to solve "C* problems" in the client, and they already wrote that complicated code to make clients simple. You're talking about reimplementing key<->node mappings, network topology (with failures), etc... Plu

Re: hadoop results

2011-06-30 Thread William Oberman
that is the current metric to use. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 30 Jun 2011, at 06:35, William Oberman wrote: > > > I'll start with my question: given

hadoop results

2011-06-29 Thread William Oberman
I'll start with my question: given a CF with comparator TimeUUIDType, what is the most efficient way to get the greatest column's value? Context: I've been running cassandra for a couple of months now, so obviously it's time to start layering more on top :-) In my test environment, I managed to g

Re: how to remove a "null" column

2011-06-24 Thread William Oberman
I think you have to do: assume counters comparator as bytes; del counters['EU'][0]; will On Fri, Jun 24, 2011 at 6:51 AM, Sasha Dolgy wrote: > I have implemented counters in a limited capacity to record the number > of 'hits' that are received from a given ISO country code. CH for > example,

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread William Oberman
I've been doing EBS snapshots for mysql for some time now, and was using a similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra complication that I was actually using 8 EBS's in RAID-0 (and the extra extra complication that I had to lock the MyISAM tables... glad to be moving

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
> Doesn't matter. auto_bootstrap only applies to first start ever. > > On Wed, Jun 22, 2011 at 10:48 AM, William Oberman > wrote: > > I have a question about auto_bootstrap. When I originally brought up the > > cluster, I did: > > -seed with auto_boot = false >

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
so that new clusters don't bootstrap immediately. You should turn this on when you start adding new nodes to a cluster that already has data on it. I'm not adding new nodes, but the cluster does have data on it... will On Wed, Jun 22, 2011 at 11:39 AM, William Oberman wrote: > I jus

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
the version of nodetool will On Wed, Jun 22, 2011 at 10:15 AM, William Oberman wrote: > I'm running 0.7.4 from rpm (riptano). If I do a yum upgrade, it's trying > to do 0.7.6. To get 0.8.x I have to do "install apache-cassandra08". But > that is going to insta

rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
I'm running 0.7.4 from rpm (riptano). If I do a yum upgrade, it's trying to do 0.7.6. To get 0.8.x I have to do "install apache-cassandra08". But that is going to install two copies. Is there a semi-official way of properly upgrading to 0.8 via rpm? -- Will Oberman Civic Science, Inc. 3030 Pe

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
unning on the > boxes? > > > > On Wed, Jun 22, 2011 at 9:06 AM, William Oberman > wrote: > >> I was wondering/I figured that /var/log/kern indicated the OS was killing >> java (versus an internal OOM). >> >> The nodetool repair is interesting. My applica

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
100% for days and days > before it was finally killed because 'apt' was fighting for resource. > At least, that's as far as I got in my investigation before giving up, > moving to 0.8.0 and implementing 24hr nodetool repair on each node via > cronjob... so far ... no p

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
isn't super obvious to me at the moment... > > > > On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis > wrote: > >> The place to start is with the statistics Cassandra logs after each GC. > > look for GCInspector > > I found this in the logs on all my se

OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
I woke up this morning to all 4 of 4 of my cassandra instances reporting they were down in my cluster. I quickly started them all, and everything seems fine. I'm doing a postmortem now, but it appears they all OOM'd at roughly the same time, which was not reported in any cassandra log, but I disc

Re: Docs: Token Selection

2011-06-17 Thread William Oberman
I haven't done it yet, but when I researched how to make geo-diverse/failover DCs, I figured I'd have to do something like RF=6, strategy = {DC1=3, DC2=3}, and LOCAL_QUORUM for reads/writes. This gives you an "ack" after 2 local nodes do the read/write, but the data eventually gets distributed to
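
The arithmetic behind "an ack after 2 local nodes" can be checked with a short sketch. It assumes the usual majority rule, quorum = floor(n / 2) + 1; the function name `quorum` and the dictionary layout are illustrative only, not Cassandra API.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas` nodes: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Per-DC replica counts from the post: {DC1=3, DC2=3}, so total RF = 6.
strategy = {"DC1": 3, "DC2": 3}

# LOCAL_QUORUM only counts replicas in the coordinator's own DC.
local_quorum = {dc: quorum(n) for dc, n in strategy.items()}
# A plain QUORUM counts replicas across both DCs.
global_quorum = quorum(sum(strategy.values()))

print(local_quorum)
print(global_quorum)
```

Under these assumptions LOCAL_QUORUM waits for 2 of the 3 local replicas, while a cluster-wide QUORUM would need 4 of 6 and so would have to wait on the remote DC.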

Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
I'll do a reply all, to keep this more consistent (sorry!). Rather than staying stuck, I wrote a custom function: TupleToBagOfTuple. I'm curious if I could have avoided it with proper pig scripting though. On Wed, Jun 15, 2011 at 3:08 PM, William Oberman wrote: > My problem is the

Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
ndraBag from > pygmalion - it does the work for you to get it back into a form that > cassandra understands. > > Others may know better how to massage the data into that form using just > pig, but if all else fails, you could write a udf to do that. > > Jeremy > > On Jun 1

prep for cassandra storage from pig

2011-06-15 Thread William Oberman
I think I'm stuck on typing issues trying to store data in cassandra. To verify, cassandra wants (key, {tuples}) My pig script is fairly brief: raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)}); --columns == timeUUID -> J

hadoop/pig notes

2011-06-08 Thread William Oberman
I decided to try out hadoop/pig + cassandra. I had my ups and downs to get the script I wanted to run to work. I'm sure everyone who tries will have their own experiences/problems, but mine were: -Everything I need to know was in http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html an

Re: best way to backup

2011-04-30 Thread William Oberman
out the BigTable and original Facebook papers, > linked from the wiki > > <http://wiki.apache.org/cassandra/ArchitectureOverview>Aaron > > On 29 Apr 2011, at 23:43, William Oberman wrote: > > Dumb question, but referenced twice now: which files are the SSTables and >

Re: best way to backup

2011-04-29 Thread William Oberman
system from there without impacting the main data raid. > > But the main reason to do this is to have an 'omg we screwed up big time > and deleted / corrupted data' recovery. > > On Apr 28, 2011, at 9:53 PM, William Oberman wrote: > > Even with N-nodes for redundancy, I

Re: best way to backup

2011-04-28 Thread William Oberman
so copies a > json file with the current files in the directory, so you can know what to > restore in that event (as far as I understand). > > On Apr 28, 2011, at 2:53 PM, William Oberman wrote: > > > Even with N-nodes for redundancy, I still want to have backups. I'm an

Re: best way to backup

2011-04-28 Thread William Oberman
at seems pointless anyways. will On Thu, Apr 28, 2011 at 3:57 PM, Sasha Dolgy wrote: > You could take a snapshot to an EBS volume. then, take a snapshot of that > via AWS. of course, this is ok... when they -aren't- having outages and issues > ... > On Apr 28, 2011 9:54 PM, "William Ob

best way to backup

2011-04-28 Thread William Oberman
Even with N-nodes for redundancy, I still want to have backups. I'm an amazon person, so naturally I'm thinking S3. Reading over the docs, and messing with nodetool, it looks like each new snapshot contains the previous snapshot as a subset (and I've read how cassandra uses hard links to avoid ex

nodetool hanging

2011-04-27 Thread William Oberman
I've figured this out, but to help those out there who don't want to waste an hour like me debugging a hung "nodetool ring" command: JMX opens a second random port, so you either have to disable any firewalls between the machine running nodetool and the cassandra instance (or there are complicated

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
> vaguely remember Ellis saying it's not a good idea to switch > NetworkTopologyStrategy ... > > On Wed, Apr 27, 2011 at 3:29 PM, William Oberman > wrote: > > Thanks Sasha. Fortunately/unfortunately I did realize the default & > current > > behavior of th

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
Route53 is already in the works (to route EC2 traffic to the closest region). will On Wed, Apr 27, 2011 at 9:33 AM, William Oberman wrote: > I don't think of it as migrating an instance, it's more of a destroy/start > with EC2. But, I still think it would be very useful to spin up

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
the ring, do your work, bootstrap > it back to the ring .. i think this could be avoided if cassandra > maintained hostname references and not just IP references for nodes. > > -sasha > > On Wed, Apr 27, 2011 at 2:56 PM, William Oberman > wrote: > > While I haven't

Re: advice for EC2 deployment

2011-04-27 Thread William Oberman
> We leverage cassandra instances in APAC, US & Europe ... so it's > important for us to know that we have one data center in each 'region' > and multiple racks per DC ... > > -sasha > > On Wed, Apr 27, 2011 at 3:06 PM, William Oberman > wrote: >
