Hi all,
I've noticed that Spark's xxhash64 output doesn't match other tools' because
it uses seed=42 as the default. I've looked at a few libraries, and they all
use 0 as the default seed (a small repro sketch follows the list):
- python https://github.com/ifduyue/python-xxhash
- java https://github.com/OpenHFT/Zero-Allocation-Hashing/
- java (slic
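
Here's a minimal PySpark repro sketch of the mismatch. Assumptions: a running
SparkSession, the python-xxhash package, and a single string column (the only
case I checked); the signed/unsigned conversion is needed because Spark
returns a signed 64-bit long while python-xxhash returns an unsigned digest.

import xxhash
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

value = "hello"
# Spark's xxhash64 -- the seed is fixed at 42 internally
spark_hash = spark.range(1).select(F.xxhash64(F.lit(value)).alias("h")).first()["h"]

def to_signed64(u):
    # map python-xxhash's unsigned digest onto Spark's signed long range
    return u - (1 << 64) if u >= (1 << 63) else u

py_seed0 = to_signed64(xxhash.xxh64(value.encode("utf-8"), seed=0).intdigest())
py_seed42 = to_signed64(xxhash.xxh64(value.encode("utf-8"), seed=42).intdigest())

print(spark_hash == py_seed0)   # False: the default seeds differ
print(spark_hash == py_seed42)  # True, once the seed matches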
You might be affected by this issue:
https://github.com/apache/iceberg/issues/8601
It was already patched, but the fix hasn't been released yet.
On Thu, Oct 5, 2023 at 7:47 PM Prashant Sharma wrote:
> Hi Sanket, more details might help here.
>
> What does your Spark configuration look like?
>
> What exactly
You can tag the last entry for each key using the same window you're using
for your rolling sum. Something like this: "LEAD(1) OVER your_window IS
NULL AS last_record". Then you just UNION ALL the last entry of each
key (which you tagged) with the new data and run the same query over the
windowed
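
To make that concrete, here's a hedged sketch of the idea. The table and
column names (old_results, new_events, key, ts, amount) are made up, and I'm
assuming the carried row contributes its accumulated rolling_sum as its
amount, so the new run's sum continues where the old one left off:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

incremental = spark.sql("""
    SELECT key, ts, amount,
           SUM(amount) OVER w AS rolling_sum,
           LEAD(1) OVER w IS NULL AS last_record
    FROM (
        -- seed rows: the tagged last entry per key from the previous run,
        -- carrying its accumulated sum forward as the starting amount
        SELECT key, ts, rolling_sum AS amount
        FROM old_results
        WHERE last_record
        UNION ALL
        SELECT key, ts, amount
        FROM new_events
    ) AS unioned
    WINDOW w AS (PARTITION BY key ORDER BY ts)
""")

Each run's output is again tagged with last_record, so the next batch can
pick up its seed rows the same way.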
> needs, given
> the fact that you only have 128GB RAM.
>
> Hope this helps...
>
> On 9/29/22 2:12 PM, Igor Calabria wrote:
>
> Hi Everyone,
>
> I'm running Spark 3.2 on Kubernetes and have a job with a decently sized
> shuffle of almost 4TB. The relevant cluster
> instance storage, your 30x30 exchange can run into EBS IOPS limits. You can
> investigate that by going to an instance, then to its volume, and checking
> the monitoring charts.
>
> Another thought is that you're essentially giving 4GB per core. That
> sounds pretty low, in my experience.
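>
> As a hedged illustration (all values made up, not a tuned recommendation),
> one way to get more memory per task slot on the same 128 GB nodes is to
> stop overcommitting cores, e.g. when building the session:
>
> from pyspark.sql import SparkSession
>
> spark = (
>     SparkSession.builder
>     # match the 16 physical cores instead of declaring 32
>     .config("spark.executor.cores", "16")
>     # roughly 6-7 GB per core instead of ~4
>     .config("spark.executor.memory", "100g")
>     .config("spark.executor.memoryOverhead", "16g")
>     .getOrCreate()
> )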
>
Hi Everyone,
I'm running Spark 3.2 on Kubernetes and have a job with a decently sized
shuffle of almost 4TB. The relevant cluster config is as follows:
- 30 executors: 16 physical cores each, configured as 32 cores for Spark
- 128 GB RAM
- shuffle.partitions is 18k, which gives me tasks of around 15
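
For rough scale (my arithmetic, assuming the ~4 TB shuffle spreads evenly;
real partition sizes depend on skew and compression):

shuffle_bytes = 4 * 1024**4          # ~4 TiB of shuffle data
partitions = 18_000                  # shuffle.partitions
slots = 30 * 32                      # executors x configured cores

print(shuffle_bytes / partitions / 1024**2)  # ~233 MiB per task
print(partitions / slots)                    # ~18.75 task waves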