Re: Correlation between data streams/operators and threads

Nico Kruber Wed, 22 Nov 2017 07:00:06 -0800

Hi Shailesh,
your JobManager log suggests that this same JVM instance also contains a 
TaskManager (sorry for not noticing earlier). This time, there is nothing 
regarding the BlobServer/BlobCache, but it looks like the TaskManager may 
think the JobManager is down.
Can you try with "start-cluster.sh" instead?


Nico

On Tuesday, 21 November 2017 07:26:09 CET Shailesh Jain wrote:
> a) Nope, there are no taskmanager logs, the job never switches to RUNNING
> state.
> 
> b) I think so, because even when I start the job with 4 devices, only 1
> slot is used, and 3 are free.
> 
> c) Attached
> 
> d) Attached
> 
> e) I'll try the debug mode in Eclipse.
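On b): a single used slot is what Flink's default slot sharing would predict. Below is a minimal sketch of the arithmetic, assuming all operators stay in the default slot sharing group and run at parallelism.default = 1. Sources are not counted, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

/** Back-of-the-envelope slot and subtask estimate for the setup in this thread. */
public class SlotEstimate {

    /**
     * Assumption: default slot sharing, i.e. all operators are in one sharing
     * group. The job then needs as many slots as its highest operator parallelism.
     */
    public static int requiredSlots(int[] operatorParallelisms) {
        int max = 0;
        for (int p : operatorParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    /**
     * Upper bound on the number of parallel subtasks (each runs in its own
     * thread). Operator chaining can only lower this number.
     */
    public static int maxSubtasks(int[] operatorParallelisms) {
        int sum = 0;
        for (int p : operatorParallelisms) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        int devices = 5;            // scenario from the thread
        int operatorsPerDevice = 4; // 3 CEP operators + 1 sink, from the thread
        int parallelism = 1;        // parallelism.default in the posted config

        int[] parallelisms = new int[devices * operatorsPerDevice];
        Arrays.fill(parallelisms, parallelism);

        System.out.println("slots required: " + requiredSlots(parallelisms));
        System.out.println("subtasks (upper bound): " + maxSubtasks(parallelisms));
    }
}
```

For 5 devices this reports 1 required slot and at most 20 subtasks. Under these assumptions the 4 available slots are not the limiting factor, while the number of tasks in that one slot grows with every added device.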
> 
> Thanks,
> Shailesh
> 
> On Fri, Nov 17, 2017 at 1:52 PM, Nico Kruber <n...@data-artisans.com> wrote:
> > regarding 3.
> > a) The taskmanager logs are missing, are there any?
> > b) Also, the JobManager logs say you have 4 slots available in total - is
> > this enough for your 5 devices scenario?
> > c) The JobManager log, however, does not really reveal what it is currently
> > doing, can you set the log level to DEBUG to see more?
> > d) Also, do you still observe CPU load during the 15min as an indication
> > that it is actually doing something?
> > e) During this 15min period where apparently nothing happens, can you
> > provide the output of "jstack <jobmanager_pid>" (with the PID of your
> > JobManager)?
> > f) You may further be able to debug into what is happening by running this
> > in your IDE in debug mode and pause the execution when you suspect it to
> > hang.
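If running jstack against the process is awkward (for example when the job runs inside the IDE, as in f), a comparable dump can be taken from inside the JVM. This is a rough sketch using the standard java.lang.management API. The class name ThreadDump is hypothetical, and this is not how jstack itself works; it only gives a similar view of thread states and stack frames.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Collects thread names, states and stack frames of the current JVM. */
public class ThreadDump {

    public static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false/false: skip lock details, which not every JVM supports
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling ThreadDump.dump() a few times while the deployment appears stuck and comparing the results should show which threads stay blocked or waiting in the same frames.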
> > 
> > 
> > Nico
> > 
> > On Tuesday, 14 November 2017 14:27:36 CET Piotr Nowojski wrote:
> > > 3. Nico, can you take a look at this one? Isn’t this a blob server
> > > issue?
> > > 
> > > Piotrek
> > > 
> > > > On 14 Nov 2017, at 11:35, Shailesh Jain <shailesh.j...@stellapps.com>
> > > > wrote:
> > > > 
> > > > 3. Have attached the logs and exception raised (15min - configured akka
> > > > timeout) after submitting the job.
> > > > 
> > > > Thanks,
> > > > Shailesh
> > > > 
> > > > 
> > > > On Tue, Nov 14, 2017 at 2:46 PM, Piotr Nowojski
> > > > <pi...@data-artisans.com> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > 3. Can you show the logs from job manager and task manager?
> > > > 
> > > >> On 14 Nov 2017, at 07:26, Shailesh Jain <shailesh.j...@stellapps.com>
> > > >> wrote:
> > > >> 
> > > >> Hi Piotrek,
> > > >> 
> > > >> I tried out option 'a' mentioned above, but instead of separate jobs, I'm
> > > >> creating separate streams per device. The following is the test
> > > >> deployment configuration as a local cluster (8 GB RAM, 2.5 GHz i5,
> > > >> Ubuntu machine):
> > > >> akka.client.timeout 15 min
> > > >> jobmanager.heap.mb 1024
> > > >> jobmanager.rpc.address localhost
> > > >> jobmanager.rpc.port 6123
> > > >> jobmanager.web.port 8081
> > > >> metrics.reporter.jmx.class org.apache.flink.metrics.jmx.JMXReporter
> > > >> metrics.reporter.jmx.port 8789
> > > >> metrics.reporters jmx
> > > >> parallelism.default 1
> > > >> taskmanager.heap.mb 1024
> > > >> taskmanager.memory.preallocate false
> > > >> taskmanager.numberOfTaskSlots 4
> > > >> 
> > > >> The number of operators per device stream is 4 (one sink function, 3 CEP
> > > >> operators).
> > > >> 
> > > >> Observations (and questions):
> > > >> 
> > > >> 3. Job deployment hangs (never switches to RUNNING) when the number of
> > > >> devices is greater than 5. Increasing the akka client timeout does not
> > > >> help. Would deploying separate jobs per device instead of separate
> > > >> streams help here?
> > > >> 
> > > >> Thanks,
> > > >> Shailesh
