Any Other Options other than Spark IN Query

2021-06-22 Thread ranju goel
Hi All, Please suggest what options Spark offers, other than IN queries, for fetching data from a db. When I execute an IN query, all the data is fetched by a single executor into a single partition, and the load is not distributed to the other executors. Please suggest are there other possibilit
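One way the single-partition behaviour described above is sometimes avoided can be sketched as follows. This is only an illustration under stated assumptions: the thread does not say which database is used, so the sketch assumes a JDBC source, where `DataFrameReader.jdbc` accepts a `predicates` list and reads one partition per predicate. The helper name `build_in_predicates`, the column name `id`, and the key values are hypothetical.

```python
from typing import List, Sequence


def build_in_predicates(column: str, keys: Sequence[int], num_partitions: int) -> List[str]:
    """Split `keys` into at most `num_partitions` chunks and return one
    SQL `IN` predicate per chunk (hypothetical helper, not a Spark API)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if not keys:
        return []
    # Ceiling division so that no more than num_partitions chunks are produced.
    chunk_size = -(-len(keys) // num_partitions)
    predicates = []
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        # Keys are forced to int here to keep the generated SQL simple and safe.
        values = ", ".join(str(int(k)) for k in chunk)
        predicates.append(f"{column} IN ({values})")
    return predicates


# Assumed usage with a JDBC source (url, table and props are placeholders):
#   predicates = build_in_predicates("id", keys, num_partitions=8)
#   df = spark.read.jdbc(url, table, predicates=predicates, properties=props)
# Each predicate becomes its own partition, so the smaller IN queries can be
# executed by different executors instead of one.
```

Whether this helps depends on the data source; for a Cassandra source the connector has its own partitioning mechanisms, which this sketch does not cover.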

RepartitionByCassandraReplica API Support on K8s

2021-06-04 Thread ranju goel
Hi All, I am running Spark 3.0.1 on Kubernetes, where Spark fetches data from Cassandra and stores it in a JavaRDD. My question is: does the RDD JavaFunctions method *repartitionByCassandraReplica* work in a Kubernetes environment? I can get the result when I use it with Spark Stand Alone on Virtuali

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-10 Thread ranju goel
helping and the jobs will be finished faster.
>
> Best Regards,
> Attila
>
> On Sat, Apr 10, 2021 at 7:01 PM ranju goel wrote:
>
>> Hi Attila,
>>
>> I understood what you mean that Use the extra resources if available for
>> running spark job,

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-10 Thread ranju goel
Hi Attila, I understand your point: use the extra resources, if available, for running the Spark job, using schedulerBacklogTimeout (dynamic allocation). This will speed up the job. But if no extra resources are available, then go for static allocation rather than dynamic. Is that correct? P
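The two allocation modes contrasted above could be expressed as Spark configuration roughly as follows. This is a minimal sketch, not the configuration used in the thread: the property names are standard Spark settings, but every value (timeout, executor counts) is an assumed placeholder, and `to_submit_args` is a hypothetical helper.

```python
from typing import Dict, List

# Dynamic allocation: request more executors once tasks have been backlogged
# for longer than schedulerBacklogTimeout. All values are illustrative.
DYNAMIC_ALLOCATION_CONF: Dict[str, str] = {
    "spark.dynamicAllocation.enabled": "true",
    # Kubernetes has no external shuffle service, so shuffle tracking
    # (available since Spark 3.0) is assumed here.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.schedulerBacklogTimeout": "1s",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "10",
}

# Static allocation: a fixed number of executors for the whole job.
STATIC_ALLOCATION_CONF: Dict[str, str] = {
    "spark.dynamicAllocation.enabled": "false",
    "spark.executor.instances": "4",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render a config dict as `--conf key=value` arguments for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `to_submit_args(STATIC_ALLOCATION_CONF)` yields the argument list for a fixed-size job, while the dynamic variant lets Spark grow toward `maxExecutors` only when the cluster has capacity to grant the requests.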

Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread ranju goel
Hi Attila, I will check why INVALID is being appended to the mailing address. What is your use case here? The client driver application does not use collect but internally calls a Python script, which is reading the part-file records [comma-separated strings] of each cluster separately and copying record