Hello everyone,

I'm trying to read data from a Phoenix table using Apache Spark. I'm using the suggested method, sc.phoenixTableAsRDD, without issuing any query (i.e. reading the whole table), and I noticed that the number of partitions Spark creates is equal to the number of region servers. Is there a way to use a custom number of regions?
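For reference, this is roughly how we read the table (a minimal sketch; the table name, column list and ZooKeeper quorum below are placeholders, not our real values):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.phoenix.spark._   // adds phoenixTableAsRDD to SparkContext

    object PhoenixReadExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("phoenix-read"))

        // Full-table read, no predicate; placeholder table/columns/zkUrl.
        val rdd = sc.phoenixTableAsRDD(
          "MY_TABLE",
          Seq("ID", "COL1"),
          zkUrl = Some("zk-host:2181")
        )

        // This is where we see the partition count follow the Phoenix/HBase splits.
        println(s"partitions = ${rdd.partitions.length}")
      }
    }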

The problem we actually face is that if a region is bigger than the available memory of the Spark executor, the executor goes OOM. If we could tune the number of regions, we could use a higher number of partitions, reducing the memory footprint of the processing (and also slowing it down, I know :( ).
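To make the idea concrete, the only workaround I can think of on the Spark side is something like the snippet below (continuing from the rdd in the example above, with an arbitrary partition count). As far as I understand, repartition() only redistributes rows after the initial scan, so each region is still read as a single task and it probably doesn't avoid the OOM during the read itself:

    // Spread rows over more, smaller partitions for downstream stages.
    // 200 is just an example value, not a recommendation.
    val repartitioned = rdd.repartition(200)

    // Subsequent processing then works on the smaller partitions.
    repartitioned.map(row => row("ID")).count()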

Thank you in advance

#A.M.