Hello everyone,

I'm trying to read data from a Phoenix table using Apache Spark. I'm using the suggested method, sc.phoenixTableAsRDD, without issuing any query (i.e. reading the whole table), and I noticed that the number of partitions Spark creates is equal to the number of region servers. Is there a way to use a custom number of regions?
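For reference, this is roughly how we read the table (a minimal sketch; the table name, column list and ZooKeeper quorum below are placeholders, not our real values):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.phoenix.spark._   // adds phoenixTableAsRDD to SparkContext

    object PhoenixReadExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("phoenix-read"))

        // Full-table read, no predicate; placeholder table/columns/zkUrl.
        val rdd = sc.phoenixTableAsRDD(
          "MY_TABLE",
          Seq("ID", "COL1"),
          zkUrl = Some("zk-host:2181")
        )

        // This is where we see the partition count follow the Phoenix/HBase splits.
        println(s"partitions = ${rdd.partitions.length}")
      }
    }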

The problem we actually face is that if a region is bigger than the available memory of the Spark executor, the executor goes OOM. If we could tune the number of regions, we could use a higher number of partitions, reducing the memory footprint of the processing (and also slowing it down, I know :( ).
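To make the idea concrete, the only workaround I can think of on the Spark side is something like the snippet below (continuing from the rdd in the example above, with an arbitrary partition count). As far as I understand, repartition() only redistributes rows after the initial scan, so each region is still read as a single task and it probably doesn't avoid the OOM during the read itself:

    // Spread rows over more, smaller partitions for downstream stages.
    // 200 is just an example value, not a recommendation.
    val repartitioned = rdd.repartition(200)

    // Subsequent processing then works on the smaller partitions.
    repartitioned.map(row => row("ID")).count()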

Thank you in advance

#A.M.