Hi All,
Please suggest what options Spark offers, other than IN queries, for
fetching data from a database.
When I execute an IN query, all the data is fetched by a single executor
into a single partition, and the load is not distributed to the other
executors.
Please suggest are there other possibilit
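
One workaround that is often tried for this kind of problem, sketched here only as an illustration, is to split the large key list into smaller groups and issue one IN query per group, so that the groups can be handed to different tasks (for example by parallelizing the list of groups). The question does not say which database or connector is in use, so the class name InQueryChunker, the table name my_table and the column name id are all hypothetical, and the sketch uses plain Java without any Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not part of Spark or any connector): splits a key list
// into chunks and builds one parameterized IN query per chunk.
class InQueryChunker {

    // Split keys into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> keys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, keys.size());
            chunks.add(new ArrayList<>(keys.subList(i, end)));
        }
        return chunks;
    }

    // Build a query with one "?" placeholder per key in the chunk.
    // Table and column names are assumptions made for this example.
    static String buildQuery(String table, String column, int keyCount) {
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < keyCount; i++) {
            placeholders.add("?");
        }
        return "SELECT * FROM " + table + " WHERE " + column + " IN " + placeholders;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Each chunk could become one task/partition instead of one big query.
        for (List<Integer> c : chunk(ids, 3)) {
            System.out.println(buildQuery("my_table", "id", c.size()) + " <- " + c);
        }
    }
}
```

Whether this actually spreads the load depends on how the chunks are distributed to executors and on how the database serves each smaller query, so it should be treated as a starting point rather than a confirmed fix.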
Hi All,
I am running Spark 3.0.1 on Kubernetes, where Spark fetches data from
Cassandra and stores it in a JavaRDD.
My question is: does the RDD JavaFunctions method
*repartitionByCassandraReplica* work in a Kubernetes environment? I get the
expected result when I use it with Spark Standalone on Virtuali
helping and the jobs will be finished faster.
>
> Best Regards,
> Attila
>
>
> On Sat, Apr 10, 2021 at 7:01 PM ranju goel wrote:
>
>> Hi Attila,
>>
>>
>> I understood what you mean that Use the extra resources if available for
>> running spark job,
Hi Attila,
I understand your point: use the extra resources, if they are available,
for running the Spark job, via
spark.dynamicAllocation.schedulerBacklogTimeout (dynamic allocation).
This will speed up the job. But if no extra resources are available, then
go for static allocation rather than dynamic. Is that correct?
P
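
To make the two options in this question concrete, here is a minimal sketch of the Spark properties involved. The property names are standard Spark configuration keys. The class AllocationConfigSketch, its methods and the executor counts are hypothetical and only for illustration. The shuffle-tracking setting is included on the assumption that no external shuffle service is available, which is typically the case on Kubernetes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that only assembles property maps; it does not start Spark.
class AllocationConfigSketch {

    // Dynamic allocation: executors are requested when tasks stay backlogged
    // longer than schedulerBacklogTimeout (Spark's default is 1s).
    static Map<String, String> dynamicAllocation(int minExecutors, int maxExecutors) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.dynamicAllocation.enabled", "true");
        // Assumption: no external shuffle service, so shuffle tracking is enabled.
        conf.put("spark.dynamicAllocation.shuffleTracking.enabled", "true");
        conf.put("spark.dynamicAllocation.schedulerBacklogTimeout", "1s");
        conf.put("spark.dynamicAllocation.minExecutors", String.valueOf(minExecutors));
        conf.put("spark.dynamicAllocation.maxExecutors", String.valueOf(maxExecutors));
        return conf;
    }

    // Static allocation: a fixed number of executors for the whole job.
    static Map<String, String> staticAllocation(int executors) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.dynamicAllocation.enabled", "false");
        conf.put("spark.executor.instances", String.valueOf(executors));
        return conf;
    }

    // Render the map as spark-submit "--conf key=value" arguments.
    static String toSubmitArgs(Map<String, String> conf) {
        StringJoiner args = new StringJoiner(" ");
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf " + e.getKey() + "=" + e.getValue());
        }
        return args.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSubmitArgs(dynamicAllocation(2, 10)));
        System.out.println(toSubmitArgs(staticAllocation(4)));
    }
}
```

With dynamic allocation the upper bound only helps when the cluster really has spare capacity. With a fixed resource budget, the static variant gives the same executors without the request and release overhead.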
Hi Attila,
I will check why INVALID is getting appended to the mailing address.
What is your use case here?
The client driver application is not using collect, but is internally
calling a Python script which is reading the part-file records
[comma-separated strings] of each cluster separately and copying record