Hi Team,
I need help with this question:
https://stackoverflow.com/questions/78547676/tox-with-pyspark
Thanks, Mich, for the detailed explanation.
On Tue, May 28, 2024 at 9:53 PM Mich Talebzadeh
wrote:
> Russell mentioned some of these issues before. So in short your mileage
> varies. For a 100 GB data transfer, the speed difference between Glue and
> EMR might not be significant, especially consid
I agree with the previous answers that (if requirements allow it) it is
much easier to orchestrate the copy either within the same app or as an
external sync.
A long time ago, and not for a Spark app, we solved a similar use case via
https://hadoop.apache.org/docs/r3.2.3/hadoop-project-dist/hadoop-h
If Glue lets you take a configuration-based approach and you don't have to
operate any servers as you would with EMR, use Glue. Try EMR if that proves
troublesome.
Russ
On Tue, May 28, 2024 at 9:23 AM Mich Talebzadeh
wrote:
> Russell mentioned some of these issues before. So in short your mileage
> varies
Russell mentioned some of these issues before. In short, your mileage may
vary. For a 100 GB data transfer, the speed difference between Glue and
EMR might not be significant, especially considering the benefits of Glue as
a managed service. However, for much larger datasets or scenarios where
Thanks Mich.
Yes, I agree on the cost aspect, but how would data transfer speed be
impacted? Is it because Glue takes some time to initialize the underlying
resources before it processes the data?
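One way to reason about whether initialization time matters is to model a job's wall-clock time as a fixed startup overhead plus data size divided by sustained throughput. This is a minimal sketch under that simplified assumption; `TransferProfile` and every number in it are hypothetical placeholders, not measured Glue or EMR figures.

```python
from dataclasses import dataclass


@dataclass
class TransferProfile:
    """Hypothetical copy job: a fixed startup cost plus a steady throughput."""
    name: str
    startup_s: float        # assumed time spent provisioning before data moves
    throughput_gb_s: float  # assumed sustained copy rate once running

    def total_seconds(self, size_gb: float) -> float:
        # Wall-clock time = fixed startup overhead + size / sustained rate
        return self.startup_s + size_gb / self.throughput_gb_s

    def overhead_share(self, size_gb: float) -> float:
        # Fraction of the total time spent on startup rather than moving data
        return self.startup_s / self.total_seconds(size_gb)


if __name__ == "__main__":
    # Illustrative numbers only; measure your own jobs before drawing conclusions.
    job = TransferProfile(name="example", startup_s=60.0, throughput_gb_s=0.5)
    for size in (100, 1_000, 10_000):
        print(f"{size:>6} GB: {job.total_seconds(size):8.0f} s total, "
              f"{job.overhead_share(size):.1%} spent on startup")
```

Under this model a fixed startup cost weighs proportionally more on smaller transfers and is amortized on larger ones, so any remaining speed difference at scale would come from sustained throughput rather than initialization.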
On Tue, May 28, 2024 at 2:23 PM Mich Talebzadeh
wrote:
> Your mileage varies as usual
>
> Glue with D
Your mileage may vary, as usual.
Glue with DPUs seems like a strong contender for your data transfer needs,
given its simplicity, scalability, and managed-service model. However,
if data transfer speed is critical or costs become a concern after testing,
consider EMR as an alternative.
HTH
Mich
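Since Glue job compute is billed per DPU-hour, a rough cost check after a test run is simple arithmetic: DPUs times runtime in hours times the price per DPU-hour. This is a minimal sketch; `glue_job_cost` is a hypothetical helper, the price is a placeholder you would replace with the current rate for your region, and billing minimums and other charges are deliberately ignored.

```python
def glue_job_cost(dpus: float, runtime_s: float, price_per_dpu_hour: float) -> float:
    """Rough compute cost of a Glue job: DPUs x hours x price per DPU-hour.

    price_per_dpu_hour is a placeholder; look up the current rate for your region.
    Billing minimums and non-compute charges are not modeled here.
    """
    if dpus <= 0 or runtime_s < 0 or price_per_dpu_hour < 0:
        raise ValueError("dpus must be positive; runtime and price must be non-negative")
    hours = runtime_s / 3600.0  # convert measured runtime to hours
    return dpus * hours * price_per_dpu_hour


if __name__ == "__main__":
    # Hypothetical inputs: 10 DPUs for a 30-minute test run at a made-up rate of 1.0 per DPU-hour.
    print(glue_job_cost(dpus=10, runtime_s=1800, price_per_dpu_hour=1.0))
```

Plugging in the runtime measured from a real test transfer, and comparing it with the equivalent EMR cluster cost, is one way to decide whether cost has become the concern that would justify switching.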