Well, your mileage varies, so to speak.
- Spark itself is written in Scala. However, that does not imply you
should stick with Scala.
- I have used both for Spark Streaming and Spark Structured Streaming;
they both work fine.
- PySpark has become popular with the widespread use of iDa
Shay, you may find this video helpful (it has some of the API code samples
you are looking for):
https://www.youtube.com/watch?v=JNQu-226wUc&t=171s. The issue here
isn't how to limit the number of executors but how to request the right
GPU-enabled executors dynamically. Those executors used in p
Thanks again Artemis, I really appreciate it. I have watched the video but did
not find an answer.
Please bear with me just one more iteration 🙂
Maybe I'll be more specific:
Suppose I start the application with maxExecutors=500, executors.cores=2,
because that's the amount of resources needed f
Interesting discussion here; it looks like Spark does not support configuring
a different number of executors in different stages. Would love to see the
community come out with such a feature.
On Thu, Nov 3, 2022 at 9:10 AM Shay Elbaz wrote:
> Thanks again Artemis, I really appreciate it. I have watched t
Er, wait, this is what stage-level scheduling is, right? This has existed
since 3.1:
https://issues.apache.org/jira/browse/SPARK-27495
On Thu, Nov 3, 2022 at 12:10 PM bo yang wrote:
> Interesting discussion here, looks like Spark does not support configuring
> different number of executors in diff
Stage-level scheduling does not allow you to change configs right now. This is
something we thought about as a follow-on but have never implemented. How many
tasks on the DL stage are you running? The typical case is: run some ETL with
lots of tasks... do mapPartitions and then run your DL stuff, bef
Now I see what you want to do. If you have access to the cluster
configuration files, you can modify the spark-env.sh file on the worker
nodes to specify exactly which nodes you'd like to link with GPU cores
and which ones not. This would allow only those nodes configured with
GPU resources to get
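A sketch of what that spark-env.sh change can look like in standalone mode. The GPU amount and discovery-script path here are assumptions; Spark ships an example getGpusResources.sh under examples/src/main/scripts/.

```shell
# spark-env.sh on the GPU worker nodes only (Spark standalone mode).
# Workers on non-GPU nodes simply omit these lines, so only the
# GPU-configured workers advertise the "gpu" resource.
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
  -Dspark.worker.resource.gpu.amount=4 \
  -Dspark.worker.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh"
```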
This is exactly what we ended up doing! The only drawback I saw with this
approach is that the GPU tasks get pretty big (in terms of data and compute
time), and task failures become expensive. That's why I reached out to the
mailing list in the first place 🙂
Normally I try to aim for anything be
In addition to that:
For now, some stateful operations in Structured Streaming don't have an
equivalent Python API, e.g. flatMapGroupsWithState. However, Spark engineers
are making it possible in the upcoming version. See more:
https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-pro
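For completeness, a sketch of the Python counterpart described in that post: `applyInPandasWithState`, which plays the role of flatMapGroupsWithState for PySpark (added in Spark 3.4). The running-count logic, column names, and schemas below are illustrative.

```python
# Sketch of arbitrary stateful processing in PySpark, assuming Spark >= 3.4,
# where applyInPandasWithState is the Python analogue of
# flatMapGroupsWithState. Names and schemas are illustrative.
from typing import Iterator, Tuple

import pandas as pd


def count_events(
    key: Tuple[str], pdfs: Iterator[pd.DataFrame], state
) -> Iterator[pd.DataFrame]:
    # state is a pyspark.sql.streaming.state.GroupState holding one row
    # of the declared state schema ("count long").
    running = state.get[0] if state.exists else 0
    for pdf in pdfs:
        running += len(pdf)
    state.update((running,))
    yield pd.DataFrame({"user": [key[0]], "count": [running]})


# Hypothetical usage on a streaming DataFrame `events`:
# events.groupBy("user").applyInPandasWithState(
#     count_events,
#     outputStructType="user string, count long",
#     stateStructType="count long",
#     outputMode="update",
#     timeoutConf="NoTimeout",
# )
```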