You might find this interesting: https://medium.com/@foundev/synthetic-sharding-in-cassandra-to-deal-with-large-partitions-2124b2fd788b
Cheers,
Stefano

On Mon, Sep 18, 2017 at 5:07 AM, Adam Smith <adamsmith8...@gmail.com> wrote:
> Dear community,
>
> I have a table with inlinks to URLs, i.e. many URLs point to
> http://google.com, while fewer URLs point to http://somesmallweb.page.
>
> It has very wide and very skinny rows - the distribution follows a
> power law. I do not know a priori how many columns a row has, and I can't
> identify a schema that would give good partitioning.
>
> Currently, I am thinking about introducing splits: the pk would be (URL,
> splitnumber), where splitnumber is initially 1, and hash mod splitnumber
> would determine the split on insert. I would need a separate table to
> maintain the splitnumber, and a spark-cassandra-connector job would count
> the columns and increase/double the number of splits on demand. This means
> I would then have to move e.g. (URL1, 0) -> (URL1, 1) when splitnumber
> becomes 2.
>
> Would you do the same? Is there a better way?
>
> Thanks!
> Adam
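As a rough sketch of the split-selection idea quoted above (the function and variable names are mine, not from Adam's mail, and I'm assuming the hashed value is the inlink rather than the partition URL itself - hashing only the URL would send every row of a partition to the same split):

```python
import hashlib

def stable_hash(s: str) -> int:
    # Python's built-in hash() is salted per process; use md5 so the
    # same input always maps to the same split across runs/nodes.
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def partition_key(target_url: str, inlink_url: str, n_splits: int) -> tuple:
    # Pick a split by hashing the value that varies within a partition
    # (the inlink), spreading a wide partition over n_splits shards.
    split = stable_hash(inlink_url) % n_splits
    return (target_url, split)

def all_partitions(target_url: str, n_splits: int) -> list:
    # On read, fan out a query over every (URL, split) pair.
    return [(target_url, s) for s in range(n_splits)]
```

The current n_splits per URL would live in the separate lookup table Adam mentions; when a counting job doubles it, roughly half the rows hash to new splits and must be rewritten, which is the (URL1, 0) -> (URL1, 1) move he describes.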