Hi all,

I am trying to run a Kallisto command on an Apache Beam worker.
Below is a comparison of my steps in the Apache Beam pipeline code
and on a local Debian compute machine (a new machine). I used both
for debugging and comparison.
On the local machine, the execution completes with no issues. On the Apache Beam
worker, the run does not complete and no error is reported, which makes it very
challenging to debug.

The only Kallisto issue that I am familiar with occurs when there
is not enough disk space for the input and the output. I have included the
resource settings for the local and remote machines below. Please let me know
if there is another way to manage the resources.
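
Since insufficient disk space is the one known failure mode mentioned above, a quick check of free space before launching kallisto can help rule it in or out. The following is a minimal sketch using only the Python standard library; the helper names `free_gb` and `has_room` are illustrative assumptions and are not part of Kallisto or Apache Beam.

```python
import logging
import shutil


def free_gb(path="."):
    """Return the free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb):
    """Log the free space under `path` and report whether it meets `required_gb`."""
    available = free_gb(path)
    logging.info("free space under %s: %.1f GB (need %.1f GB)",
                 path, available, required_gb)
    return available >= required_gb
```

Calling `has_room` on the output directory at the start of the DoFn, with a threshold chosen for the input and output sizes, would record in the worker logs how much disk is actually available. Both the path and the threshold are for the caller to choose; none is implied by the original setup.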

Thank you,
Eila


Task: resources

  Local:
    n1-standard-8 (8 vCPUs, 30 GB memory)
    60 GB persistent disk

  Apache Beam worker:
    GoogleCloudOptions.disk_size_gb = 60
    GoogleCloudOptions.worker_machine_type = 'n1-standard-4'

Task: anaconda

  Local:
    Created a base environment with the Kallisto package

  Apache Beam worker:
    Created a base environment with the Kallisto package

Task: command

  Local:

    from subprocess import Popen, PIPE, STDOUT
    import logging

    script = "/home/eila_orielresearch_org/etc/profile.d/conda.sh"
    cmd1 = ". {}; env".format(script)
    cmd2 = "echo finished kallisto"
    cmd3 = "echo before init"
    cmd4 = "conda init --all"
    cmd5 = "conda activate"
    cmd6 = "kallisto quant -t 2 -i release-99_transcripts.idx --single -l 200 -s 20 -o srr SRR2144345.fastq"
    cmd7 = "conda deactivate"

    # Run all steps in one shell; stderr is merged into stdout.
    final = Popen("{}; {}; {}; {}; {}; {}; {}".format(cmd1, cmd2, cmd3, cmd4, cmd5, cmd6, cmd7),
                  shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
    stdout, nothing = final.communicate()
    stdout

  Apache Beam worker:

    from subprocess import Popen, PIPE, STDOUT
    import logging

    script = "/opt/userowned/etc/profile.d/conda.sh"
    cmd1 = ". {}; env".format(script)
    cmd2 = "echo finished kallisto"
    cmd3 = "echo before init"
    cmd4 = "conda init --all"
    cmd5 = "conda activate"
    cmd6 = "kallisto quant -t 2 -i release-99_transcripts.idx --single -l 200 -s 20 -o srr SRR2144345.fastq"
    cmd7 = "conda deactivate"

    # Run all steps in one shell; stderr is merged into stdout.
    final = Popen("{}; {}; {}; {}; {}; {}; {}".format(cmd1, cmd2, cmd3, cmd4, cmd5, cmd6, cmd7),
                  shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
    stdout, nothing = final.communicate()
    stdout

Task: output

  Local:

    eila_orielresearch_org@instance-1:~/srr$ ls -lt
    total 8548
    -rw-r--r-- 1 eila_orielresearch_org eila_orielresearch_org 2174869 May 11 16:19 abundance.h5
    -rw-r--r-- 1 eila_orielresearch_org eila_orielresearch_org 6570911 May 11 16:19 abundance.tsv
    -rw-r--r-- 1 eila_orielresearch_org eila_orielresearch_org     371 May 11 16:19 run_info.json

  Apache Beam worker:
    No output.
    Hangs on the yellow (highlighted) command. No error. The DoFn execution restarts.
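
Because the worker hangs without reporting an error, one way to narrow the problem down is to run each shell step with a timeout and log its exit code and output, so that a stuck step shows up in the worker logs instead of blocking indefinitely. The sketch below assumes a POSIX worker (it relies on `start_new_session` and `os.killpg`); the function name `run_shell` and any timeout value passed to it are hypothetical and not taken from the original pipeline.

```python
import logging
import os
import signal
import subprocess


def run_shell(cmd, timeout_s):
    """Run a shell command string with a timeout.

    Returns (returncode, output). returncode is None if the command
    was killed because it exceeded timeout_s seconds.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        text=True,
        start_new_session=True,     # own process group, so children can be killed too
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the shell and its children
        except ProcessLookupError:
            pass  # the process group already exited
        output, _ = proc.communicate()  # collect whatever was printed before the kill
        logging.error("timed out after %ss: %s\n%s", timeout_s, cmd, output)
        return None, output
    logging.info("exit code %s: %s\n%s", proc.returncode, cmd, output)
    return proc.returncode, output
```

Running the conda and kallisto steps through such a helper one at a time would show which step never returns. Joining the steps with `&&` instead of `;` would additionally stop at the first failing step.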


-- 
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
