Alexey Strokach created BEAM-7750:
-------------------------------------

             Summary: Pipeline instances are not garbage collected
                 Key: BEAM-7750
                 URL: https://issues.apache.org/jira/browse/BEAM-7750
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.14.0
         Environment: OS: Debian rodete.

Tested using: 
Beam versions: 2.13.0, 2.15.0.dev
Python versions: Python 2.7, Python 3.7.
Runners:  DirectRunner, DataflowRunner.
            Reporter: Alexey Strokach


Apache Beam's Pipeline instances do not appear to be garbage collected, even
after the pipelines have finished or been cancelled and no references to them
remain in the Python interpreter.

For pipelines executed in a script, this is not a problem. However, for 
interactive pipelines executed inside a Jupyter notebook, this limits how well 
we can track and remove the dependencies of those pipelines. For example, if a 
pipeline reads from some cache, it would be nice to be able to delete that 
cache once there are no references to it from a pipeline or the global 
namespace.
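
To illustrate the kind of cleanup described above, here is a minimal sketch 
of a cache file that is deleted when its owner is garbage collected. 
CachedSource is a hypothetical class, not part of Beam, and weakref.finalize 
requires Python 3.4 or later. If a pipeline that holds such an object is never 
collected, the finalizer never runs and the cache is never deleted.

```python
import gc
import os
import tempfile
import weakref


class CachedSource(object):
    """Hypothetical owner of an on-disk cache file (not a Beam class)."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(suffix=".cache")
        os.close(fd)
        # Remove the file once this object is garbage collected.
        # The callback must not reference self, or the object never dies.
        self._finalizer = weakref.finalize(self, os.remove, self.path)


source = CachedSource()
path = source.path
print(os.path.exists(path))  # True

del source      # drop the last strong reference
gc.collect()    # not strictly needed on CPython, but breaks any cycles
print(os.path.exists(path))  # False
```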

The issue can be reproduced using the following script: 
https://github.com/ostrokach/beam-notebooks/blob/48718038e63342a5f3acc31352a6326fffd34888/scripts/error_pipeline_gc.py
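
Independently of the linked script, one generic way to check whether an object 
is garbage collected is to hold only a weak reference to it, drop the strong 
reference, and force a collection. The sketch below is an assumption about how 
such a check could look, not the contents of the script above. FakePipeline 
and leaky_factory are hypothetical stand-ins used so that the example runs 
without Beam installed.

```python
import gc
import weakref


def is_collected_after(factory):
    """Create an object via factory(), drop it, and report whether it was freed."""
    obj = factory()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # drop the only strong reference held here
    gc.collect()            # also collect objects caught in reference cycles
    return ref() is None


class FakePipeline(object):
    """Hypothetical stand-in for a pipeline class."""


# Simulates a leak: a module-level registry keeps a strong reference.
_registry = []


def leaky_factory():
    p = FakePipeline()
    _registry.append(p)
    return p


print(is_collected_after(FakePipeline))   # True
print(is_collected_after(leaky_factory))  # False
```

With Beam installed, the same helper could presumably be called with a factory 
that builds and runs a pipeline. The behaviour reported in this issue would 
correspond to a result of False.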



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
