That piece of code is in place to let the worker be terminated by a SIGTERM (much like a Ctrl+C, which sends SIGINT), which is useful for development purposes. It *should* have nothing to do with long-running tasks, but honestly I have never had a single task alive for more than an hour. Frankly, I don't know how to test it: sitting in front of a terminal for 4 days is not really feasible.
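To make the traceback in the quoted message easier to read, here is a minimal standalone sketch (not web2py code) of what happens when a SIGTERM arrives while a task is sleeping. It installs the same handler as the quoted line 702 of scheduler.py. The function name `run_sleeping_task` and the timer that sends the signal are made up for the demo; the timer only stands in for whatever actually sent the SIGTERM to the worker, which the logs don't reveal. POSIX only.

```python
import os
import signal
import sys
import threading
import time


def run_sleeping_task(sleep_seconds=5.0, kill_after=0.2):
    """Sleep like a long test step and report how the sleep ended.

    Hypothetical helper for illustration; must be called from the main thread.
    """
    # Same handler as the line quoted from gluon/scheduler.py.
    previous = signal.signal(
        signal.SIGTERM, lambda signum, stack_frame: sys.exit(1)
    )
    # Stand-in for the unknown external sender of SIGTERM (assumption).
    timer = threading.Timer(
        kill_after, os.kill, args=(os.getpid(), signal.SIGTERM)
    )
    timer.start()
    try:
        time.sleep(sleep_seconds)
        return "sleep finished normally"
    except SystemExit as exc:
        # The handler ran in the main thread while it was inside sleep(),
        # so the exception surfaces from the sleep() call itself.
        return "SystemExit(%s) raised inside sleep" % exc.code
    finally:
        timer.cancel()
        # Restore the earlier handler if Python knows what it was.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    print(run_sleeping_task())
```

On Linux this should report the SystemExit shortly after the signal is sent rather than after the full sleep. That is the same shape as the quoted traceback: the frame in `sleep(M10)` is simply where the main thread happened to be when the signal arrived, so the long sleep is the victim, not the cause.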
On Monday, December 12, 2016 at 2:58:47 PM UTC+1, Zbigniew Pomianowski wrote:
>
> First of all: I decided to use web2py for my purposes because it is awesome ;)
> I believe this is not a web2py bug or anything like that. It may be more of
> an OS- and systemd-related issue.
>
> Let me explain what I do and what the environment is. I work in a lab
> where we try to automate many tests on physical devices (like STBs and
> phones).
> I have a single source for the master (Ubuntu server) and the slave servers
> (Ubuntu server/desktop). The master is configured with uwsgi+nginx+mysql+web2py
> services. Then I have slaves that use the same source, but can spawn
> tests within scheduler processes.
>
> I need to connect many physical devices to the slaves (climate chambers,
> Arduino for IR control, v4l2 capture cards, Ethernet-controlled power
> sources, power supply instruments, measurement instruments... bla bla bla).
> I decided to make a GUI using qooxdoo where the user can write Python code
> that allocates physical devices and runs specific test scenarios to examine
> the condition of the DUT (Device Under Test).
> These tests sometimes need to run for tens of hours.
> So the workflow can be described as:
>
>    - the user writes a script
>    - the test is enqueued as a task in the db (JobGraph works perfectly
>    for me because I need to control the execution sequence, mainly because
>    of physical devices like climate chambers etc.; an allocated lab
>    instrument cannot be used by two tests at the same time, and JobGraph
>    can ensure that)
>    - every slave has its own unique group name
>    - DUTs and lab instruments are bound to a specific slave, i.e. to its
>    scheduler group name
>    - the slave executes the test scenario programmed by the user
>    - the test is nothing more than an overridden TestUnit
>    - every lab instrument has a child process which logs parameters
>    (temperature, humidity, voltage bla bla bla)
>    - for each DUT an instance of a class is also created that spawns child
>    processes (video freeze detection based on gstreamer, udp/tcp/telnet
>    interface to interact with the STB)
>    - in the test scenario I have plenty of sleeps: the scenario demands,
>    for example, that the STB stays in a climate chamber for 20h at a
>    specific temperature and humidity
>
> My systemd service file looks like this:
> [Unit]
> Description=ATMS workers
> After=network-online.target
> Wants=network-online.target
>
> [Service]
> User=<USER>
> Restart=on-failure
> RestartSec=120
> Environment=DISPLAY=:<DISPLAY_NB> # usually 0
> Environment=XAUTHORITY=/home/<USER>/.Xauthority
> EnvironmentFile={{INSTALL}}/web2py_venv/web2py/applications/atms/private/atms.env
> ExecStartPre=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_start.py -P"
> ExecStart=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -K atms:%H,atms:%H"
> ExecStop=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_stop.py -P"
>
> [Install]
> # graphical because I had to make some kind of preview with ximagesink for fast lookup if video is ok on STB
> WantedBy=graphical.target
> Alias=atms.service
>
> I realized that for a very long test (the last one was planned to be
> longer than 100h) I got something like this in the logs:
> gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/adapters/base.py", line 1435, in
> gru 11 12:01:52 slaveX sh[2184]: return str(long(obj))
> gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/objects.py", line 82, in <lambda
> gru 11 12:01:52 slaveX sh[2184]: __long__ = lambda self: long(self.get('id'))
> gru 11 12:01:52 slaveX sh[2184]: TypeError: long() argument must be a string or a number, not 'NoneType'
>
> The test was stopped 20h before it was supposed to finish :/
> After some digging I found that before these errors I got this one:
> gru 11 12:01:34 slaveX sh[2184]: ERROR:web2py.app.atms:[(</tmp/taskId10672_caseId852_duts32/test_script.py.TestCase testMethod=test_example>, 'Traceback (most recent call last):\n File "/tmp/taskId10672_caseId852_duts32/test_script.py", line 90, in test_example\n sleep(M10)\n File "/atms/web2py_venv/web2py/gluon/scheduler.py", line 702, in <lambda>\n signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))\nSystemExit: 1\n')]
> gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: new task report : FAILED
> gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: traceback: Traceback (most recent call last):
> ... and many, many tracebacks with errors after that
>
> Line 702 in scheduler.py is:
> signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
> ...in the scheduler's loop function. What does it mean? Was the process
> stopped because the kernel, systemd, or something else decided to do so?
> Could the long sleep calls have something to do with it?
> Has anyone encountered similar problems? Do you have any idea how to
> prevent such behavior?
>
> Thank you in advance for any response :)