Hi Bigtopers, I am over here asking questions as I am a bit lost right now. Over in Gora we have a really nice test suite called Goraci [0]. Basically the test runs many ingest clients that continually create linked lists containing 25 million nodes. At some point the clients are stopped and a MapReduce job is run to ensure no linked list has a hole. A hole indicates data was lost. Generally speaking, the more nodes in the cluster, the better the chance of us finding that data has been lost.
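For anyone unfamiliar with the hole check, here is a toy sketch of the invariant the verify job enforces. This is plain single-process Python, not Goraci's actual MapReduce code, and the dict-based representation is purely illustrative: each node stores a pointer to the previous node in its list, and a "hole" is a node that is pointed to but was never persisted.

```python
def find_holes(nodes):
    """Given a dict mapping node key -> previous-node key (None for a list
    head), return the keys that are referenced but never defined, i.e. holes.
    A non-empty result means the store lost data that a surviving node
    still points at."""
    defined = set(nodes)
    referenced = {prev for prev in nodes.values() if prev is not None}
    return referenced - defined

# Intact list: 3 -> 2 -> 1 -> (head), no holes.
intact = {1: None, 2: 1, 3: 2}
print(find_holes(intact))   # set()

# Broken list: node 2 was lost, but node 3 still points to it.
broken = {1: None, 3: 2}
print(find_holes(broken))   # {2}
```

In Goraci proper this check is distributed: the mapper emits each node's own key and the key it references, and the reducer flags any key that was referenced but never defined.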
Now for part two... in Gora we currently have datastore implementations for Accumulo, Avro, Cassandra, HBase and Amazon DynamoDB... what we do not have is a mechanism to run the ingestion test against each datastore as a controlled job, meaning that we could subsequently gather metrics and infer behaviour across Gora datastores. I have not gone to our friends @Infra yet as I would rather do my homework first and exhaust the avenues where I could contribute to getting this off the ground. My questions are therefore very, very simple... does anyone have an idea about how we can get this working in tandem? Is this prime territory for Bigtop? Thanks very much in advance.

Best,
Lewis

[0] https://github.com/keith-turner/goraci

--
*Lewis*
