Hi Shailesh, your JobManager log suggests that this same JVM instance also runs a TaskManager (sorry for not noticing earlier). Also, this time there is nothing regarding the BlobServer/BlobCache, but it looks like the TaskManager may think the JobManager is down. Can you try starting the cluster with "start-cluster.sh" instead, so that the JobManager and TaskManager run as separate processes?
Nico

On Tuesday, 21 November 2017 07:26:09 CET Shailesh Jain wrote:
> a) Nope, there are no taskmanager logs, the job never switches to RUNNING state.
>
> b) I think so, because even when I start the job with 4 devices, only 1 slot is used, and 3 are free.
>
> c) Attached
>
> d) Attached
>
> e) I'll try the debug mode in Eclipse.
>
> Thanks,
> Shailesh
>
> On Fri, Nov 17, 2017 at 1:52 PM, Nico Kruber <n...@data-artisans.com> wrote:
> > regarding 3.
> > a) The taskmanager logs are missing, are there any?
> > b) Also, the JobManager logs say you have 4 slots available in total - is this enough for your 5 devices scenario?
> > c) The JobManager log, however, does not really reveal what it is currently doing, can you set the log level to DEBUG to see more?
> > d) Also, do you still observe CPU load during the 15min as an indication that it is actually doing something?
> > e) During this 15min period where apparently nothing happens, can you provide the output of "jstack <jobmanager_pid>" (with the PID of your JobManager)?
> > f) You may further be able to debug into what is happening by running this in your IDE in debug mode and pause the execution when you suspect it to hang.
> >
> > Nico
> >
> > On Tuesday, 14 November 2017 14:27:36 CET Piotr Nowojski wrote:
> > > 3. Nico, can you take a look at this one? Isn’t this a blob server issue?
> > >
> > > Piotrek
> > >
> > > > On 14 Nov 2017, at 11:35, Shailesh Jain <shailesh.j...@stellapps.com> wrote:
> > > >
> > > > 3. Have attached the logs and exception raised (15min - configured akka timeout) after submitting the job.
> > > >
> > > > Thanks,
> > > > Shailesh
> > > >
> > > > On Tue, Nov 14, 2017 at 2:46 PM, Piotr Nowojski <pi...@data-artisans.com <mailto:pi...@data-artisans.com>> wrote:
> > > > Hi,
> > > >
> > > > 3. Can you show the logs from job manager and task manager?
> > > >
> > > >> On 14 Nov 2017, at 07:26, Shailesh Jain <shailesh.j...@stellapps.com <mailto:shailesh.j...@stellapps.com>> wrote:
> > > >>
> > > >> Hi Piotrek,
> > > >>
> > > >> I tried out option 'a' mentioned above, but instead of separate jobs, I'm creating separate streams per device. Following is the test deployment configuration as a local cluster (8GB ram, 2.5 GHz i5, ubuntu machine):
> > > >>
> > > >> akka.client.timeout 15 min
> > > >> jobmanager.heap.mb 1024
> > > >> jobmanager.rpc.address localhost
> > > >> jobmanager.rpc.port 6123
> > > >> jobmanager.web.port 8081
> > > >> metrics.reporter.jmx.class org.apache.flink.metrics.jmx.JMXReporter
> > > >> metrics.reporter.jmx.port 8789
> > > >> metrics.reporters jmx
> > > >> parallelism.default 1
> > > >> taskmanager.heap.mb 1024
> > > >> taskmanager.memory.preallocate false
> > > >> taskmanager.numberOfTaskSlots 4
> > > >>
> > > >> The number of Operators per device stream is 4 (one sink function, 3 CEP operators).
> > > >>
> > > >> Observations (and questions):
> > > >>
> > > >> 3. Job deployment hangs (never switches to RUNNING) when the number of devices is greater than 5. Even on increasing the akka client timeout, it does not help. Will separate jobs being deployed per device instead of separate streams help here?
> > > >>
> > > >> Thanks,
> > > >> Shailesh
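On question (b) about whether 4 slots are enough: a rough way to reason about it, assuming Flink's documented slot sharing behaviour (subtasks in the same slot sharing group may share a slot, so each group needs as many slots as its highest operator parallelism, and separate groups add up). The helper `required_slots` and the group names are hypothetical; the device count, operators per stream, and `parallelism.default 1` come from the thread, and sources are ignored for simplicity.

```python
def required_slots(groups):
    """Estimate the task slots a Flink job needs (hypothetical helper).

    `groups` maps a slot sharing group name to the parallelism of each
    operator in that group. Subtasks in the same group can share a slot,
    so each group needs as many slots as its highest parallelism; the
    groups themselves do not share slots, so their needs add up.
    """
    return sum(max(parallelisms) for parallelisms in groups.values())


devices = 5
operators_per_device = 4  # one sink + 3 CEP operators per device stream
parallelism = 1           # parallelism.default from the posted config

# Default case: every operator stays in the single "default" sharing group.
shared = {"default": [parallelism] * (devices * operators_per_device)}
print(required_slots(shared))    # 1

# Counter-example: each operator placed in its own (hypothetical) group.
isolated = {f"op{i}": [parallelism] for i in range(devices * operators_per_device)}
print(required_slots(isolated))  # 20
```

Under these assumptions the default setup needs a single slot, which matches the observation that only 1 of the 4 slots is in use, so slot availability alone probably does not explain the hang.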