I don't have directly relevant advice, especially with respect to getting a
meaningful and coherent subset of your production data - that's probably too
closely coupled with your business logic.  Perhaps you can run a testing
cluster with a default TTL of ~2 weeks on all your tables, feeding it with
real production data so that you have a rolling snapshot of current
production.
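
In CQL that TTL is just a table property.  A minimal sketch of setting it
from Scala with the DataStax Java driver (the contact point, keyspace and
table names here are made up):

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder().addContactPoint("test-cassandra").build()
    val session = cluster.connect("events_ks")
    // 14 days in seconds; Cassandra expires rows older than this on its own,
    // so continuously fed production data becomes a rolling two-week window.
    session.execute("ALTER TABLE events WITH default_time_to_live = 1209600")
    session.close()
    cluster.close()

(default_time_to_live is a CQL3 table property, available as of 2.0, so it
fits your upcoming 2.0.11 migration.)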

We use this basic strategy to support integration tests with the rest of our
platform.  We have a data access service, with other internal teams acting
as customers of that data.  But it's hard to write strong tests against it,
because it becomes challenging to predict the values you should expect to
get back without rewriting the business logic directly into your tests (and
then what exactly are you testing - your tests?).

Our data interaction layer tests all center on inserting the data under
test immediately before the assertion portion of the given test.  We use
Specs2 as a testing framework, which gives us access to a very nice
"eventually { ... }" syntax that retries the assertion block several times
with a backoff.  That accounts for the eventually consistent nature of
Cassandra and reduces the number of false failures, without resorting to
operations like sleep-before-assert that slow down test execution.
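
A minimal sketch of that read-after-write pattern, using specs2's
EventuallyResults (the EventDao and its methods are made-up placeholders
for your own data access layer):

    import org.specs2.matcher.EventuallyResults
    import org.specs2.mutable.Specification
    import scala.concurrent.duration._

    class ReadAfterWriteSpec extends Specification {

      // Stand-in for a real Cassandra-backed DAO; insert/fetch are assumed.
      trait EventDao {
        def insert(id: String, payload: String): Unit
        def fetch(id: String): Option[String]
      }
      def dao: EventDao = ???  // wire in your real data access layer here

      "the data access layer" should {
        "read back a row written just before the assertion" in {
          dao.insert("test-id", "payload")
          // Retry the assertion block, sleeping between attempts, so an
          // eventually consistent read doesn't fail the test spuriously.
          EventuallyResults.eventually(retries = 10, sleep = 200.millis) {
            dao.fetch("test-id") must beSome("payload")
          }
        }
      }
    }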

Basically, our data access layer unit tests are strict and rely only on
synthetic data (assert that the response is exact for every value), while
integration tests from other systems run much softer checks against real
data (more like: is there data, and does that data seem to be in the right
format and for the right time range?).
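
Roughly, the two flavors look like this - fragments that would sit inside a
Specification like the one above (recentEventTimestamps is a hypothetical
helper returning write timestamps from live data):

    // Strict unit-level test: synthetic fixture, exact match on every value.
    "return exactly the synthetic row we inserted" in {
      dao.insert("fixture-id", "payload")
      dao.fetch("fixture-id") must beSome("payload") // full equality
    }

    // Soft integration-level test: live data, only presence and recency.
    "have data covering the last week" in {
      val timestamps = recentEventTimestamps() // hypothetical helper
      timestamps must not(beEmpty)
      forall(timestamps) { ts =>
        ts must beGreaterThan(System.currentTimeMillis() - 7.days.toMillis)
      }
    }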

On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi guys,
>
> We currently use a CI with tests based on docker containers.
>
> We have a "dockerized" C* service. Yet we have an issue, since we would
> like 2 things that are hard to achieve:
>
> - A fixed data set, to have predictable and deterministic tests (that we
> can repeat at any time with the same result)
> - A recent data set, to perform smoke testing on services that need
> "recent data" (at most 1 week old)
>
> As our dataset is very big and data is not sorted by date in SSTables, it
> is hard to get a coherent extract of the production data. Have any of you
> achieved something like this?
>
> For "static" data, we could write queries by hand but I find it more
> relevant to have a real production extract. Regarding dynamic data we need
> a process that we could repeat every day / week to update data and have
> something light enough to keep fastness in containers start.
>
> How do you guys do this kind of thing?
>
> FWIW we are migrating to 2.0.11 very soon, so solutions may use 2.0
> features.
>
> Any idea is welcome and if you need more info, please ask.
>
> C*heers,
>
> Alain
>
