This does look like a very viable solution, thanks. Could you give us some pointers/documentation on:

- how we can build such SSTables using Spark jobs (maybe https://github.com/Netflix/sstable-adaptor ?)
- how we send these tables to Cassandra: does a simple SCP work?
- what the recommended SSTable size is when the data does not fit in a single executor?
On 5 February 2018 at 18:40, Romain Hardouin <romainh...@yahoo.fr.invalid> wrote:

> Hi Julien,
>
> We have such a use case on some clusters. If you want to insert big
> batches at a fast pace, the only viable solution is to generate SSTables
> on the Spark side and stream them to C*. The last time we benchmarked
> such a job we achieved 1.3 million partitions inserted per second on a
> 3-node C* test cluster, which is impossible with regular inserts.
>
> Best,
>
> Romain
>
> Le lundi 5 février 2018 à 03:54:09 UTC+1, kurt greaves <k...@instaclustr.com> a écrit :
>
> > Would you know if there is evidence that inserting skinny rows in
> > sorted order (no batching) helps C*?
>
> This won't have any effect, as each insert will be handled separately by
> the coordinator (or a different coordinator, even). Sorting is also very
> unlikely to help even if you did batch.
>
> > Also, in the case of wide rows, is there evidence that sorting
> > clustering keys within partition batches helps ease C*'s job?
>
> No evidence; seems very unlikely.

--
Julien MOUMNÉ
Software Engineering - Data Science
Mail: jmou...@deezer.com
12 rue d'Athènes 75009 Paris - France
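For what it's worth, the workflow Romain describes usually boils down to two steps: each Spark task writes SSTables locally (typically via Cassandra's CQLSSTableWriter API, which libraries such as sstable-adaptor build on), and the resulting directories are then streamed into the cluster with the sstableloader tool that ships with Cassandra. A plain SCP is not sufficient, because the data has to be streamed to the replicas that own each token range. A minimal sketch of the two steps; the host names, paths, and Spark job class here are hypothetical:

```shell
# Sketch only: com.example.BulkSSTableJob, the jar name, hosts and paths
# are illustrative, not a real artifact.
# 1) A Spark job writes SSTables per task (e.g. via CQLSSTableWriter).
#    The output directory must follow the <keyspace>/<table> layout.
spark-submit --class com.example.BulkSSTableJob bulkload-assembly.jar \
  --output /data/bulk/my_keyspace/my_table

# 2) Stream the generated SSTables into the cluster with sstableloader
#    (bundled with Cassandra); -d takes one or more initial contact points.
sstableloader -d cassandra-1,cassandra-2 /data/bulk/my_keyspace/my_table
```

Since sstableloader resolves replica placement itself, the load can be run from any machine that can reach the cluster on the storage port.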