Re: Running Flink on Yarn

Andrey Zagrebin Mon, 24 Dec 2018 09:10:11 -0800

I think the data buffered for the join will be distributed among the threads
by order_id (a1 and a2 will be keyed internally on the join key).
Each thread will have its own, non-shared window state (covering the 2 hours)
for the order_ids assigned to it.
Slots will share some common JVM resources mentioned in the docs, and also
access to the state DB, but not the bulk of the storage occupied by state.
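
To illustrate the partitioning described above, here is a minimal sketch in plain Java (no Flink dependency). It is an assumption-laden simplification: the class and method names are hypothetical, and the plain hash-modulo in subtaskFor only stands in for Flink's real assignment, which as far as I know goes through key groups derived from a hash of the key and the max parallelism. The point is only that each order_id is owned by exactly one parallel instance, so its buffered rows live in that instance's state and nowhere else.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Flink.
class KeyedJoinSketch {

    // Assumption: plain hash modulo as a stand-in for Flink's key-group based assignment.
    static int subtaskFor(String orderId, int parallelism) {
        return Math.floorMod(orderId.hashCode(), parallelism);
    }

    // One private buffer per parallel subtask: order_id -> rows buffered for the join.
    static List<Map<String, List<String>>> newSubtaskStates(int parallelism) {
        List<Map<String, List<String>>> states = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            states.add(new HashMap<>());
        }
        return states;
    }

    // Routes a row to the single subtask that owns its order_id and buffers it there.
    static void buffer(List<Map<String, List<String>>> states, String orderId, String row) {
        int subtask = subtaskFor(orderId, states.size());
        states.get(subtask).computeIfAbsent(orderId, k -> new ArrayList<>()).add(row);
    }

    public static void main(String[] args) {
        // 4 subtasks, mirroring parallelism=4 in one TM with 4 slots.
        List<Map<String, List<String>>> states = newSubtaskStates(4);
        buffer(states, "1001", "a1: to_state=PLACED");
        buffer(states, "1001", "a2: restaurant_id=7");
        buffer(states, "1002", "a1: to_state=PLACED");
        for (int i = 0; i < states.size(); i++) {
            System.out.println("subtask " + i + " buffers order_ids " + states.get(i).keySet());
        }
    }
}
```

Rows from both inputs with the same order_id end up in the same subtask's map, which is what lets the join match them locally without any sharing between slots.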


cc Timo, Piotr

On Mon, Dec 24, 2018 at 7:46 PM Anil <anilsingh....@gmail.com> wrote:

> I am using a time-windowed join only. Here's a sample query:
>
> SELECT a1.order_id, a2.order.restaurant_id
> FROM awz_s3_stream1 a1
> INNER JOIN awz_s3_stream2 a2
>   ON CAST(a1.order_id AS VARCHAR) = a2.order_id
>   AND a1.to_state = 'PLACED'
>   AND a1.proctime BETWEEN a2.proctime - INTERVAL '2' HOUR
>                       AND a2.proctime + INTERVAL '2' HOUR
> GROUP BY HOP(a2.proctime, INTERVAL '2' MINUTE, INTERVAL '1' HOUR),
>   a2.`order`.restaurant_id
>
> Just to simplify my question -
>
> Suppose I have a TM with 4 slots and I deploy a Flink job with
> parallelism=4 in 2 containers: 1 JM and 1 TM. Each parallel instance will
> be deployed in its own task slot in the TM (the entire job pipeline running
> per slot). My job does a join (a SQL time-windowed join on a non-keyed
> stream) and buffers the last few hours of data. My question is: will the
> threads running in different task slots share this data buffered for the
> join? What data is shared across these threads?
