Hi,
I know that a single partition can hold up to 2 billion cells, and that even
so it is an anti-pattern to store such a huge data set under one partition
key. In our case it is only 300k rows per key, but when we try to query for
one particular key we get a timeout exception.
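
For reference, this is roughly the kind of single-partition read I mean, as
a minimal sketch with the DataStax Python driver; the keyspace, table, and
column names are placeholders, and fetch_size enables driver-side paging:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")  # placeholder keyspace

    # fetch_size makes the driver pull the partition in pages of 1000 rows,
    # requesting the next page transparently as the loop advances, instead
    # of materializing all 300k rows in a single response.
    query = SimpleStatement(
        "SELECT * FROM my_table WHERE my_key = %s",  # placeholder names
        fetch_size=1000,
    )
    for row in session.execute(query, ("some-key",)):
        print(row)  # application logic per row goes here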
If I use Spark to fetch the 300k rows for a particular key, will that solve
the timeout problem and distribute the data across the Spark nodes, or will
it still throw timeout exceptions?
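
To make the Spark option concrete, this is the kind of read I have in mind,
sketched with the Spark Cassandra Connector in PySpark; the connector
coordinates, keyspace, table, and key value are all placeholders:

    from pyspark.sql import SparkSession

    # Assumes the Spark Cassandra Connector is on the classpath, e.g. via
    # spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:<version>
    spark = (
        SparkSession.builder
        .appName("read-one-partition")
        .config("spark.cassandra.connection.host", "127.0.0.1")
        .getOrCreate()
    )

    # The equality filter on the partition key is pushed down to Cassandra,
    # so Spark reads only that partition instead of scanning the whole table.
    df = (
        spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="my_keyspace", table="my_table")  # placeholders
        .load()
        .filter("my_key = 'some-key'")  # placeholder key value
    )

    print(df.count())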
Can you please help me with the best practice for retrieving the data for a
key with 300k rows? Any help is highly appreciated.

Regards
Goutham.
