Hi all,
I am trying to run a Kallisto command on an Apache Beam worker.
The table below describes the steps I ran in the Apache Beam pipeline
code and on a local Debian compute machine (a new machine). I used both
for debugging and comparison.
On the local machine, the execution completes with no issues. On Apache
Beam, the command fails without reporting any error, which makes it very
challenging to debug.
The only Kallisto issue I am familiar with occurs when there is not
enough disk space for the input and the output. I have included the
resource settings for the local and remote machines in the table. Please
let me know if there is another way to manage the resources.
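
One way to rule out disk space on the worker is to log the free space in
the working directory right before calling kallisto. This is a minimal
sketch using only the Python standard library; the helper name
log_free_disk and the 10 GB threshold are hypothetical and not part of
Kallisto or Beam:

```python
import logging
import os
import shutil


def log_free_disk(path=".", min_free_gb=10.0):
    """Log free disk space for `path`; return True if enough is free.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions only.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gb = usage.free / 1024 ** 3
    total_gb = usage.total / 1024 ** 3
    logging.info("disk at %s: %.1f GB free of %.1f GB",
                 os.path.abspath(path), free_gb, total_gb)
    return free_gb >= min_free_gb
```

Calling log_free_disk() both locally and in the DoFn would show whether
the 60 GB disk is really what the command sees on each machine.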
Thank you,
Eila
resources
  Local:
    n1-standard-8 (8 vCPUs, 30 GB memory)
    60 GB persistent disk
  Apache worker:
    GoogleCloudOptions.disk_size_gb = 60
    GoogleCloudOptions.worker_machine_type = 'n1-standard-4'

anaconda
  Local:
    Created a base environment with the Kallisto package
  Apache worker:
    Created a base environment with the Kallisto package

command
  Local:
    from subprocess import Popen, PIPE, STDOUT
    import logging

    # Source conda's shell hook, then dump the environment for debugging.
    script = "/home/eila_orielresearch_org/etc/profile.d/conda.sh"
    cmd1 = ". {}; env".format(script)
    cmd2 = "echo finished kallisto"
    cmd3 = "echo before init"
    cmd4 = "conda init --all"
    cmd5 = "conda activate"
    cmd6 = ("kallisto quant -t 2 -i release-99_transcripts.idx --single "
            "-l 200 -s 20 -o srr SRR2144345.fastq")
    cmd7 = "conda deactivate"

    # Run all commands in one shell; stderr is merged into stdout.
    command = "{}; {}; {}; {}; {}; {}; {}".format(
        cmd1, cmd2, cmd3, cmd4, cmd5, cmd6, cmd7)
    final = Popen(command, shell=True, stdin=PIPE, stdout=PIPE,
                  stderr=STDOUT, close_fds=True)
    stdout, nothing = final.communicate()
    stdout
  Apache worker:
    from subprocess import Popen, PIPE, STDOUT
    import logging

    # Source conda's shell hook, then dump the environment for debugging.
    script = "/opt/userowned/etc/profile.d/conda.sh"
    cmd1 = ". {}; env".format(script)
    cmd2 = "echo finished kallisto"
    cmd3 = "echo before init"
    cmd4 = "conda init --all"
    cmd5 = "conda activate"
    cmd6 = ("kallisto quant -t 2 -i release-99_transcripts.idx --single "
            "-l 200 -s 20 -o srr SRR2144345.fastq")
    cmd7 = "conda deactivate"

    # Run all commands in one shell; stderr is merged into stdout.
    command = "{}; {}; {}; {}; {}; {}; {}".format(
        cmd1, cmd2, cmd3, cmd4, cmd5, cmd6, cmd7)
    final = Popen(command, shell=True, stdin=PIPE, stdout=PIPE,
                  stderr=STDOUT, close_fds=True)
    stdout, nothing = final.communicate()
    stdout

output
  Local:
    eila_orielresearch_org@instance-1:~/srr$ ls -lt
    total 8548
    -rw-r--r-- 1 eila_orielresearch_org eila_orielresearch_org 2174869 May 11 16:19 abundance.h5
    -rw-r--r-- 1 eila_orielresearch_org eila_orielresearch_org 6570911 May 11 16:19 abundance.tsv
    -rw-r--r-- 1 eila_orielresearch_org eila_orielresearch_org     371 May 11 16:19 run_info.json
  Apache worker:
    No output. The worker hangs on the command highlighted in yellow,
    with no error, and then restarts the DoFn execution.
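
Since the worker gives no output at all, it may help to run the same
shell command with a timeout and send whatever it prints to the log, so
that a hang shows up as an explicit error instead of a silent DoFn
restart. The following is only a sketch using the Python standard
library on a POSIX system; the helper name run_logged and the one-hour
default timeout are assumptions, and inside the pipeline it would be
called from the DoFn's process method:

```python
import logging
import os
import signal
import subprocess


def run_logged(command, timeout_s=3600):
    """Run a shell command, log its combined output, return its exit code.

    Hypothetical helper (POSIX only). Raises subprocess.TimeoutExpired if
    the command is still running after timeout_s seconds.
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        stdin=subprocess.DEVNULL,    # the command never waits for input
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,    # merge stderr into stdout
        universal_newlines=True,     # decode output to str
        start_new_session=True,      # give the shell its own process group
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the shell and everything it started, then collect the output.
        os.killpg(proc.pid, signal.SIGKILL)
        output, _ = proc.communicate()
        logging.error("timed out after %s s; output so far:\n%s",
                      timeout_s, output)
        raise
    for line in output.splitlines():
        logging.info("cmd output: %s", line)
    logging.info("exit code: %d", proc.returncode)
    return proc.returncode
```

Passing it the same command string that is given to Popen in the table,
a non-zero exit code or a TimeoutExpired in the worker log would at
least show at which step the worker stops.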