This does look like a very viable solution. Thanks.

Could you give us some pointers/documentation on:
 - how can we build such SSTables using Spark jobs? Maybe
https://github.com/Netflix/sstable-adaptor ?
 - how do we send these SSTables to Cassandra? Does a simple SCP work?
 - what is the recommended SSTable size when the data does not fit in a
single executor?
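
For context, a minimal sketch of what offline SSTable generation can look
like with Cassandra's bundled CQLSSTableWriter (from the cassandra-all
artifact) — the keyspace, table, schema, and paths below are made-up
examples, not a recommendation from this thread:

```java
// Sketch: write SSTables locally, outside the cluster, using Cassandra's
// CQLSSTableWriter. In a Spark job this would typically run once per
// partition/executor, each writing to its own output directory.
// All names (ks.events, paths) are illustrative assumptions.
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class SSTableBuildSketch {
    public static void main(String[] args) throws Exception {
        String schema = "CREATE TABLE ks.events (id text PRIMARY KEY, payload text)";
        String insert = "INSERT INTO ks.events (id, payload) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/sstables/ks/events") // local output directory
                .forTable(schema)
                .using(insert)
                .build();

        writer.addRow("row-1", "hello");
        writer.addRow("row-2", "world");
        writer.close();
        // The resulting files are then streamed into the ring with
        // sstableloader -d <node> /tmp/sstables/ks/events
        // (sstableloader handles token ranges and replication, which a
        // plain SCP of the files would not).
    }
}
```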

On 5 February 2018 at 18:40, Romain Hardouin <romainh...@yahoo.fr.invalid>
wrote:

> Hi Julien,
>
> We have such a use case on some clusters. If you want to insert big
> batches at fast pace the only viable solution is to generate SSTables on
> Spark side and stream them to C*. The last time we benchmarked such a job,
> we achieved 1.3 million partitions inserted per second on a 3-node C* test
> cluster, which is impossible with regular inserts.
>
> Best,
>
> Romain
>
> On Monday 5 February 2018 at 03:54:09 UTC+1, kurt greaves <
> k...@instaclustr.com> wrote:
>
>
> Would you know if there is evidence that inserting skinny rows in sorted
> order (no batching) helps C*?
>
> This won't have any effect as each insert will be handled separately by
> the coordinator (or a different coordinator, even). Sorting is also very
> unlikely to help even if you did batch.
>
>  Also, in the case of wide rows, is there evidence that sorting clustering
> keys within partition batches helps ease C*'s job?
>
> No evidence, seems very unlikely.
>



-- 
Julien MOUMNÉ
Software Engineering - Data Science
Mail: jmou...@deezer.com
12 rue d'Athènes 75009 Paris - France
