For Test/Main, I have the user's ~/.bash_profile set $PYTHON_EGG_CACHE on a
per-node basis. This could also be done per node and per job to ensure
uniqueness per job.
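A sketch of what that bash_profile line can look like (the cache layout here
is illustrative, not the actual Test/Main setup):

# one egg cache per node; $$ (the shell's PID) additionally makes it
# unique per login/job
export PYTHON_EGG_CACHE="$HOME/.python-eggs/$(hostname)/$$"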
--nate
On Sep 18, 2012, at 11:24 AM, James Taylor wrote:
Interesting. If I'm reading this correctly, the problem is happening
For completeness, here are two tracebacks (there were more similar ones)
from the same job:
/mnt/galaxyData/tmp/job_working_directory/000/75/task_4:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 25, in <module>
    import galaxy.model.mapping # need to load this
I added this snippet to the top of my extract_dataset_part.py:
pkg_resources.require("simplejson")
# wait until this process' PID is the first PID of all processes with
# the same name, then import
while True:
    with os.popen("ps ax | grep extract_dataset_part.py | grep -v grep | awk '{print $1}'")
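Filled out along the lines of that comment, the whole wait loop might look
like this (a sketch: the one-second poll and the assumption that ps lists
PIDs in ascending order are mine; the final import matches the traceback
above):

import os
import time

import pkg_resources
pkg_resources.require("simplejson")

# wait until this process' PID is the first PID of all processes with
# the same name, then import
while True:
    pids = os.popen(
        "ps ax | grep extract_dataset_part.py | grep -v grep | awk '{print $1}'"
    ).read().split()
    if pids and pids[0] == str(os.getpid()):
        break
    time.sleep(1)

import galaxy.model.mapping  # need to load this

The effect is that tasks on one node perform the import one at a time, in
PID order, rather than all at once.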
Hi again,
I have looked into this matter a little bit more, and it looks like this
is happening:
- tasked job is split
- task commands are sent to workers (I am running 8-core High-CPU Extra
Large workers on EC2)
- per task, worker runs env.sh for the respective tool
- per task, worker
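So many tasks on one node hit the egg import at the same moment. A minimal
sketch of that situation (the cache path and process count are hypothetical;
it assumes a zipped simplejson egg on sys.path, so extraction is triggered):

import multiprocessing
import os

def task(n):
    # every task on the node shares one egg cache, as described above
    os.environ["PYTHON_EGG_CACHE"] = "/mnt/galaxyData/tmp/eggs"  # hypothetical path
    import pkg_resources
    # the first caller extracts the zipped egg into the cache; the others
    # race that extraction and can see half-written files
    pkg_resources.require("simplejson")

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=task, args=(i,)) for i in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()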
Interesting. If I'm reading this correctly, the problem is happening
inside pkg_resources? (galaxy.eggs unzips eggs, but I think it does so
at install [fetch_eggs] time, not run time, which would avoid this.) If
so, this would seem to be a locking bug in pkg_resources. Dannon, we
could put a guard
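One sketch of what such a guard could look like, serializing egg
activation with an fcntl.flock file lock (the lock path and helper name
are assumptions, not Galaxy API):

import fcntl

import pkg_resources

LOCK_PATH = "/tmp/galaxy_egg_guard.lock"  # hypothetical; any node-local path works

def guarded_require(requirement):
    # hold an exclusive file lock while pkg_resources resolves the
    # requirement, so only one task at a time can extract a zipped egg
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            pkg_resources.require(requirement)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)

guarded_require("simplejson")

Note that flock only serializes tasks within one node, which is also the
scope of the per-node egg cache discussed above.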
Dear list,
I am running galaxy-dist on Amazon EC2 through Cloudman, and am using
the enable_tasked_jobs option to run jobs in parallel. Yes, I know it's
not recommended in production. My jobs usually get split into 72 parts,
and sometimes (but not always, maybe in 30-50% of cases), errors are