That piece of code is there to let the worker be terminated by a SIGTERM, 
e.g. a plain `kill <pid>` (Ctrl+C actually sends SIGINT), which is useful 
for development purposes. It *should* have nothing to do with long-running 
tasks, but to be honest I have never had a single task alive for more than 
an hour. Frankly, I don't know how to test it: sitting in front of a 
terminal for 4 days is not that feasible.
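
What can be checked quickly is how that handler interacts with a sleeping 
task. Below is a minimal sketch, assuming Python 3 on a POSIX system; the 
names CHILD and run_demo are made up for illustration, and only the 
signal.signal(...) line is taken from scheduler.py as quoted in the message 
below. It starts a child process that installs the same handler and then 
sleeps, sends it a SIGTERM, and reports how the child ended.

```python
import signal
import subprocess
import sys
import textwrap
import time

# Hypothetical stand-in for a worker running a long test: it installs the
# same SIGTERM handler as the scheduler.py line quoted in the thread,
# then sleeps.
CHILD = textwrap.dedent("""
    import signal, sys, time
    signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
    print("ready", flush=True)
    time.sleep(3600)  # stands in for a multi-hour sleep in a test scenario
    print("sleep finished", flush=True)
""")


def run_demo():
    """Start the child, send it SIGTERM, return (exit code, seconds, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()            # block until the child printed "ready"
    started = time.time()
    proc.send_signal(signal.SIGTERM)  # what `kill <pid>` does by default
    out, _ = proc.communicate(timeout=30)
    return proc.returncode, time.time() - started, out


if __name__ == "__main__":
    code, elapsed, out = run_demo()
    print("exit code:", code)
    print("seconds until exit: %.2f" % elapsed)
    print("sleep ran to completion:", "sleep finished" in out)
```

If the child exits with status 1 long before the hour is over, the sleep 
was cut short by SystemExit raised from the handler, which is the same 
shape as the traceback in the quoted log. This only shows what a SIGTERM 
does once it arrives; it says nothing about who sent it.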

On Monday, December 12, 2016 at 2:58:47 PM UTC+1, Zbigniew Pomianowski 
wrote:
>
> First of all: I decided to use web2py for my purposes because it is awesome 
> ;)
> I believe this is not a web2py bug or anything of the sort. It is more 
> likely an OS- or systemd-related issue.
>
> Let me explain what I do and what is the environment. I work in a lab 
> where we try to automate many tests on physical devices (like STBs and 
> phones).
> I have a single source for master (ubuntu server) and slave servers 
> (ubuntu server/desktop). Master is configured with uwsgi+nginx+mysql+web2py 
> services. Then I do have slaves that use the same source, but can spawn 
> tests within scheduler processes.
>
> I need to connect many physical devices to the slaves (climate chambers, 
> Arduino for IR control, v4l2 capture cards, Ethernet-controlled power 
> sources, power supply instruments, measurement instruments and so on).
> I decided to make a GUI using qooxdoo where the user can write Python code 
> that allocates physical devices and runs specific test scenarios to examine 
> the condition of the DUT (Device Under Test).
> These tests sometimes need to be run for tens of hours. So the workflow 
> can be described as:
>
>    - user writes a script
>    - the test is enqueued as a task in the db (JobGraph does a perfect job 
>    for me because I need to control the execution sequence, mainly because 
>    of the physical devices such as climate chambers; an allocated lab 
>    instrument cannot be used by two tests at the same time, and JobGraph 
>    can enforce that) 
>    - every slave has its unique group-name
>       - DUTs and lab instruments are bound to the specific slave - 
>       scheduler group-name
>    - slave executes the test scenario programmed by user
>       - a test is nothing more than an overridden TestUnit
>       - every lab instrument has a child process which logs parameters 
>       (temperature, humidity, voltage and so on)
>       - for each DUT, an instance of a class is also created that spawns 
>       child processes (video freeze detection based on gstreamer, 
>       udp/tcp/telnet interface to interact with the STB)
>       - in the test scenario I have plenty of sleeps: a test scenario may 
>       demand, for example, that the STB stays in a climate chamber for 20h 
>       at a specific temperature and humidity
>    
> My systemd service file looks like this:
> [Unit]
> Description=ATMS workers
> After=network-online.target
> Wants=network-online.target
>
> [Service]
> User=<USER>
> Restart=on-failure
> RestartSec=120
> # DISPLAY_NB is usually 0
> Environment=DISPLAY=:<DISPLAY_NB>
> Environment=XAUTHORITY=/home/<USER>/.Xauthority
> EnvironmentFile={{INSTALL}}/web2py_venv/web2py/applications/atms/private/atms.env
> ExecStartPre=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_start.py -P"
> ExecStart=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -K atms:%H,atms:%H"
> ExecStop=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_stop.py -P"
>
> [Install]
> # graphical because I had to make some kind of preview with ximagesink
> # for a fast check that the video is OK on the STB
> WantedBy=graphical.target
> Alias=atms.service
>
>
> I noticed that for a very long test (the last one was planned to run 
> longer than 100h) I got something like this in the logs:
> gru 11 12:01:52 slaveX sh[2184]:   File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/adapters/base.py", line 1435, in
> gru 11 12:01:52 slaveX sh[2184]:     return str(long(obj))
> gru 11 12:01:52 slaveX sh[2184]:   File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/objects.py", line 82, in <lambda
> gru 11 12:01:52 slaveX sh[2184]:     __long__ = lambda self: long(self.get('id'))
> gru 11 12:01:52 slaveX sh[2184]: TypeError: long() argument must be a string or a number, not 'NoneType'
>
> The test was stopped 20h before it was supposed to finish :/
> After some digging I found that before these errors I got this one:
> gru 11 12:01:34 slaveX sh[2184]: ERROR:web2py.app.atms:[(</tmp/taskId10672_caseId852_duts32/test_script.py.TestCase testMethod=test_example>, 'Traceback (most recent call last):\n  File "/tmp/taskId10672_caseId852_duts32/test_script.py", line 90, in test_example\n    sleep(M10)\n  File "/atms/web2py_venv/web2py/gluon/scheduler.py", line 702, in <lambda>\n    signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))\nSystemExit: 1\n')]
> gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms:    new task report: FAILED
> gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms:   traceback: Traceback (most recent call last):
> .. and many many many tracebacks with errors after that
>
> Line 702 in scheduler.py is:
> signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
> ...in the scheduler's loop function. What does it mean? Was the process 
> stopped because the kernel, systemd or something else decided to do so?
> Could the long sleep calls have something to do with it?
> Has anyone encountered similar problems? Do you have any idea how to 
> prevent such behavior?
>
> Thank you in advance for any response :)
>
>
