You might find this interesting: https://medium.com/@foundev/synthetic-sharding-in-cassandra-to-deal-with-large-partitions-2124b2fd788b
Cheers,
Stefano

On Mon, Sep 18, 2017 at 5:07 AM, Adam Smith <adamsmith8...@gmail.com> wrote:
> Dear community,
>
> I have a table with inlinks to URLs, i.e. many URLs point to
> http://google.com, while fewer URLs point to http://somesmallweb.page.
>
> It has very wide and very skinny rows - the distribution follows a
> power law. I do not know a priori how many columns a row has, and I can't
> identify a schema that would give good partitioning.
>
> Currently, I am thinking about introducing splits: the pk would be (URL,
> splitnumber), where splitnumber is initially 1, and hash mod splitnumber
> would determine the split on insert. I would need a separate table to
> maintain the splitnumber, and a spark-cassandra-connector job would count
> the columns and increase/double the number of splits on demand. This means
> I would then have to move e.g. (URL1, 0) -> (URL1, 1) when splitnumber
> becomes 2.
>
> Would you do the same? Is there a better way?
>
> Thanks!
> Adam
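As a rough sketch of the split-selection idea quoted above (the function and variable names are mine, not from Adam's mail, and I'm assuming the hashed value is the inlink rather than the partition URL itself - hashing only the URL would send every row of a partition to the same split):

```python
import hashlib

def stable_hash(s: str) -> int:
    # Python's built-in hash() is salted per process; use md5 so the
    # same input always maps to the same split across runs/nodes.
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def partition_key(target_url: str, inlink_url: str, n_splits: int) -> tuple:
    # Pick a split by hashing the value that varies within a partition
    # (the inlink), spreading a wide partition over n_splits shards.
    split = stable_hash(inlink_url) % n_splits
    return (target_url, split)

def all_partitions(target_url: str, n_splits: int) -> list:
    # On read, fan out a query over every (URL, split) pair.
    return [(target_url, s) for s in range(n_splits)]
```

The current n_splits per URL would live in the separate lookup table Adam mentions; when a counting job doubles it, roughly half the rows hash to new splits and must be rewritten, which is the (URL1, 0) -> (URL1, 1) move he describes.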