There are a few separate issues in here that we should
tackle/investigate in parallel.
Improve storage of latency metrics
Given how absurdly the number of latency metrics scales with the number of
operators / parallelism, it makes sense to introduce a special case here.
I'm not quite sure yet how …
With `latencyTrackingInterval` set to `0`, the cluster runs fine.
So, is this something that makes sense to improve? Is there a JIRA I can
track, or should I file one?
On Fri, Aug 24, 2018 at 11:50 AM Chesnay Schepler wrote:
I believe the only thing you can do is disable latency tracking, by
setting the `latencyTrackingInterval` in `env.getExecutionConfig()` to 0
or a negative value.
The update frequency is not configurable and currently set to 10 seconds.
Latency metrics are tracked as the cross-product of all su…
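As a toy illustration of the semantics described in this thread (a hypothetical stand-in class, not Flink's actual `ExecutionConfig`): an interval of 0 or a negative value acts as a disable switch, and the reported update frequency is 10 seconds.

```java
public class LatencyTrackingConfig {
    // Hypothetical stand-in for the setting discussed above; per this
    // thread, an interval of 0 or a negative value disables tracking.
    private long latencyTrackingInterval = 10_000L; // 10s, the frequency reported above

    public void setLatencyTrackingInterval(long millis) {
        this.latencyTrackingInterval = millis;
    }

    public boolean isLatencyTrackingEnabled() {
        return latencyTrackingInterval > 0;
    }

    public static void main(String[] args) {
        LatencyTrackingConfig cfg = new LatencyTrackingConfig();
        cfg.setLatencyTrackingInterval(0L);
        System.out.println(cfg.isLatencyTrackingEnabled()); // false
    }
}
```

In the real job this corresponds to setting `latencyTrackingInterval` via `env.getExecutionConfig()` as described above.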
For my small job, I see ~24k of those latency metrics at
'/jobs/.../metrics'. That job is much smaller than production in terms of
parallelism.
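To make the scaling concrete, here is a back-of-the-envelope sketch. The per-pair model and the example operator counts are assumptions made for illustration; only the ~24k total comes from this thread.

```java
public class LatencyMetricCount {
    // Assumed model: one latency metric per (source, operator-subtask) pair,
    // i.e. the cross-product mentioned in this thread.
    static long metricCount(int sources, int operators, int parallelism) {
        return (long) sources * operators * parallelism;
    }

    public static void main(String[] args) {
        // Hypothetical job shape: 2 sources, 120 operators at parallelism 100
        System.out.println(metricCount(2, 120, 100)); // 24000
    }
}
```

Doubling the parallelism doubles the count, which is why a "small" job can already report tens of thousands of latency metrics.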
Are there any options here? Can it be turned off, can histogram metrics be
reduced, can the update frequency be lowered, ...?
Also, keeping it flat seems to use quite some memory o…
In 1.5 the latency metric was changed to be reported on the job-level,
that's why you see it under /jobs/.../metrics now, but not in 1.4.
In 1.4 you would see something similar under
/jobs/.../vertices/.../metrics, for each vertex.
Additionally it is now a proper histogram, which significantly …
… parallelism is 100. I tried clusters with 1 and 2 slots per TM, yielding
100 or 50 TMs in the cluster.
I did notice that the URL http://jobmanager:port/jobs/job_id/metrics in 1.5.x
returns a huge list of "latency.source_id. " IDs. A heap dump shows that the
hash map takes 1.6GB for me. I am guessing that is …
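Combining the two figures quoted in this thread (~24k latency metrics and a ~1.6GB hash map) gives a rough per-metric footprint. The division below is just that arithmetic, not a measured value.

```java
public class MetricMemoryEstimate {
    public static void main(String[] args) {
        long heapBytes = 1_600_000_000L; // ~1.6GB hash map seen in the heap dump
        long metricCount = 24_000L;      // ~24k latency metrics reported in this thread
        // Rough footprint: about 66KB per metric, which would be plausible
        // if each metric is a histogram keeping a reservoir of samples.
        System.out.println(heapBytes / metricCount + " bytes per metric");
    }
}
```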
Hi,
How many task slots do you have in the cluster and per machine, and what
parallelism are you using?
Piotrek
> On 23 Aug 2018, at 16:21, Jozef Vilcek wrote:
>
> Yes, on smaller data and therefore smaller resources and parallelism
> exactly same job runs fine
>
> On Thu, Aug 23, 2018, 16:1
Yes, on smaller data, and therefore smaller resources and parallelism,
exactly the same job runs fine.
On Thu, Aug 23, 2018, 16:11 Aljoscha Krettek wrote:
Hi,
So with Flink 1.5.3 but a smaller parallelism the job works fine?
Best,
Aljoscha
> On 23. Aug 2018, at 15:25, Jozef Vilcek wrote:
Hello,
I am trying to get my Beam application to run on a newer version of Flink
(1.5.3) but am having trouble with that. When I submit the application,
everything works fine, but after a few minutes (as soon as 2 minutes after
job start) the cluster just goes bad. Logs are full of timeouts for
heartbeats; the JobManage…