Thanks for your feedback. We expect that this issue will be fixed in
cloudpickle==1.3.0. Per [1], this release may be available next week.

Once it is released, you can install the fixed version of cloudpickle manually
until the AI notebook image picks up the new version.
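
For example (a sketch; it assumes the fix ships in 1.3.0 as planned — use %pip
in a notebook cell):

    pip install --upgrade cloudpickle==1.3.0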

[1] https://github.com/cloudpipe/cloudpickle/pull/337

On Tue, Feb 4, 2020 at 12:44 PM Alan Krumholz <alan.krumh...@betterup.co>
wrote:

> It seems the image we use in KFP to orchestrate the job has
> cloudpickle==0.8.1, and that version doesn't seem to cause issues.
> I think I'm unblocked for now, but I'm sure I won't be the last one to try to
> do this using GCP managed notebooks :(
>
> Thanks for all the help!
>
>
> On Tue, Feb 4, 2020 at 12:24 PM Alan Krumholz <alan.krumh...@betterup.co>
> wrote:
>
>> I'm using a managed notebook instance from GCP. It seems those already come
>> with cloudpickle==1.2.2 as soon as you provision them; apache-beam[gcp] will
>> then install dill==0.3.1.1. I'm going to try uninstalling cloudpickle before
>> installing apache-beam and see if this fixes the problem.
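>>
>> Something along these lines (a sketch of that order of operations, untested;
>> I'd restart the kernel afterwards so the reinstalled packages are picked up):
>>
>>     pip uninstall -y cloudpickle
>>     pip install apache-beam[gcp]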
>>
>> Thank you
>>
>> On Tue, Feb 4, 2020 at 11:54 AM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> The fact that you have cloudpickle==1.2.2 is further evidence that you may
>>> be hitting the same error as
>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>
>>> Could you try to start over with a clean virtual environment?
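>>>
>>> For instance, something like this (a sketch; the environment name and the
>>> gcp extra are just examples):
>>>
>>>     python3 -m venv beam-clean
>>>     source beam-clean/bin/activate
>>>     pip install apache-beam[gcp]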
>>>
>>> On Tue, Feb 4, 2020 at 11:46 AM Alan Krumholz <alan.krumh...@betterup.co>
>>> wrote:
>>>
>>>> Hi Valentyn,
>>>>
>>>> Here is the pip freeze from my machine (note that the error is in
>>>> Dataflow; the job runs fine on my machine):
>>>>
>>>> ansiwrap==0.8.4
>>>> apache-beam==2.19.0
>>>> arrow==0.15.5
>>>> asn1crypto==1.3.0
>>>> astroid==2.3.3
>>>> astropy==3.2.3
>>>> attrs==19.3.0
>>>> avro-python3==1.9.1
>>>> azure-common==1.1.24
>>>> azure-storage-blob==2.1.0
>>>> azure-storage-common==2.1.0
>>>> backcall==0.1.0
>>>> bcolz==1.2.1
>>>> binaryornot==0.4.4
>>>> bleach==3.1.0
>>>> boto3==1.11.9
>>>> botocore==1.14.9
>>>> cachetools==3.1.1
>>>> certifi==2019.11.28
>>>> cffi==1.13.2
>>>> chardet==3.0.4
>>>> Click==7.0
>>>> cloudpickle==1.2.2
>>>> colorama==0.4.3
>>>> configparser==4.0.2
>>>> confuse==1.0.0
>>>> cookiecutter==1.7.0
>>>> crcmod==1.7
>>>> cryptography==2.8
>>>> cycler==0.10.0
>>>> daal==2019.0
>>>> datalab==1.1.5
>>>> decorator==4.4.1
>>>> defusedxml==0.6.0
>>>> dill==0.3.1.1
>>>> distro==1.0.1
>>>> docker==4.1.0
>>>> docopt==0.6.2
>>>> docutils==0.15.2
>>>> entrypoints==0.3
>>>> enum34==1.1.6
>>>> fairing==0.5.3
>>>> fastavro==0.21.24
>>>> fasteners==0.15
>>>> fsspec==0.6.2
>>>> future==0.18.2
>>>> gcsfs==0.6.0
>>>> gitdb2==2.0.6
>>>> GitPython==3.0.5
>>>> google-api-core==1.16.0
>>>> google-api-python-client==1.7.11
>>>> google-apitools==0.5.28
>>>> google-auth==1.11.0
>>>> google-auth-httplib2==0.0.3
>>>> google-auth-oauthlib==0.4.1
>>>> google-cloud-bigquery==1.17.1
>>>> google-cloud-bigtable==1.0.0
>>>> google-cloud-core==1.2.0
>>>> google-cloud-dataproc==0.6.1
>>>> google-cloud-datastore==1.7.4
>>>> google-cloud-language==1.3.0
>>>> google-cloud-logging==1.14.0
>>>> google-cloud-monitoring==0.31.1
>>>> google-cloud-pubsub==1.0.2
>>>> google-cloud-secret-manager==0.1.1
>>>> google-cloud-spanner==1.13.0
>>>> google-cloud-storage==1.25.0
>>>> google-cloud-translate==2.0.0
>>>> google-compute-engine==20191210.0
>>>> google-resumable-media==0.4.1
>>>> googleapis-common-protos==1.51.0
>>>> grpc-google-iam-v1==0.12.3
>>>> grpcio==1.26.0
>>>> h5py==2.10.0
>>>> hdfs==2.5.8
>>>> html5lib==1.0.1
>>>> htmlmin==0.1.12
>>>> httplib2==0.12.0
>>>> icc-rt==2020.0.133
>>>> idna==2.8
>>>> ijson==2.6.1
>>>> imageio==2.6.1
>>>> importlib-metadata==1.4.0
>>>> intel-numpy==1.15.1
>>>> intel-openmp==2020.0.133
>>>> intel-scikit-learn==0.19.2
>>>> intel-scipy==1.1.0
>>>> ipykernel==5.1.4
>>>> ipython==7.9.0
>>>> ipython-genutils==0.2.0
>>>> ipython-sql==0.3.9
>>>> ipywidgets==7.5.1
>>>> isort==4.3.21
>>>> jedi==0.16.0
>>>> Jinja2==2.11.0
>>>> jinja2-time==0.2.0
>>>> jmespath==0.9.4
>>>> joblib==0.14.1
>>>> json5==0.8.5
>>>> jsonschema==3.2.0
>>>> jupyter==1.0.0
>>>> jupyter-aihub-deploy-extension==0.1
>>>> jupyter-client==5.3.4
>>>> jupyter-console==6.1.0
>>>> jupyter-contrib-core==0.3.3
>>>> jupyter-contrib-nbextensions==0.5.1
>>>> jupyter-core==4.6.1
>>>> jupyter-highlight-selected-word==0.2.0
>>>> jupyter-http-over-ws==0.0.7
>>>> jupyter-latex-envs==1.4.6
>>>> jupyter-nbextensions-configurator==0.4.1
>>>> jupyterlab==1.2.6
>>>> jupyterlab-git==0.9.0
>>>> jupyterlab-server==1.0.6
>>>> keyring==10.1
>>>> keyrings.alt==1.3
>>>> kiwisolver==1.1.0
>>>> kubernetes==10.0.1
>>>> lazy-object-proxy==1.4.3
>>>> llvmlite==0.31.0
>>>> lxml==4.4.2
>>>> Markdown==3.1.1
>>>> MarkupSafe==1.1.1
>>>> matplotlib==3.0.3
>>>> mccabe==0.6.1
>>>> missingno==0.4.2
>>>> mistune==0.8.4
>>>> mkl==2019.0
>>>> mkl-fft==1.0.6
>>>> mkl-random==1.0.1.1
>>>> mock==2.0.0
>>>> monotonic==1.5
>>>> more-itertools==8.1.0
>>>> nbconvert==5.6.1
>>>> nbdime==1.1.0
>>>> nbformat==5.0.4
>>>> networkx==2.4
>>>> nltk==3.4.5
>>>> notebook==6.0.3
>>>> numba==0.47.0
>>>> numpy==1.15.1
>>>> oauth2client==3.0.0
>>>> oauthlib==3.1.0
>>>> opencv-python==4.1.2.30
>>>> oscrypto==1.2.0
>>>> packaging==20.1
>>>> pandas==0.25.3
>>>> pandas-profiling==1.4.0
>>>> pandocfilters==1.4.2
>>>> papermill==1.2.1
>>>> parso==0.6.0
>>>> pathlib2==2.3.5
>>>> pbr==5.4.4
>>>> pexpect==4.8.0
>>>> phik==0.9.8
>>>> pickleshare==0.7.5
>>>> Pillow-SIMD==6.2.2.post1
>>>> pipdeptree==0.13.2
>>>> plotly==4.5.0
>>>> pluggy==0.13.1
>>>> poyo==0.5.0
>>>> prettytable==0.7.2
>>>> prometheus-client==0.7.1
>>>> prompt-toolkit==2.0.10
>>>> protobuf==3.11.2
>>>> psutil==5.6.7
>>>> ptyprocess==0.6.0
>>>> py==1.8.1
>>>> pyarrow==0.15.1
>>>> pyasn1==0.4.8
>>>> pyasn1-modules==0.2.8
>>>> pycparser==2.19
>>>> pycrypto==2.6.1
>>>> pycryptodomex==3.9.6
>>>> pycurl==7.43.0
>>>> pydaal==2019.0.0.20180713
>>>> pydot==1.4.1
>>>> Pygments==2.5.2
>>>> pygobject==3.22.0
>>>> PyJWT==1.7.1
>>>> pylint==2.4.4
>>>> pymongo==3.10.1
>>>> pyOpenSSL==19.1.0
>>>> pyparsing==2.4.6
>>>> pyrsistent==0.15.7
>>>> pytest==5.3.4
>>>> pytest-pylint==0.14.1
>>>> python-apt==1.4.1
>>>> python-dateutil==2.8.1
>>>> pytz==2019.3
>>>> PyWavelets==1.1.1
>>>> pyxdg==0.25
>>>> PyYAML==5.3
>>>> pyzmq==18.1.1
>>>> qtconsole==4.6.0
>>>> requests==2.22.0
>>>> requests-oauthlib==1.3.0
>>>> retrying==1.3.3
>>>> rsa==4.0
>>>> s3transfer==0.3.2
>>>> scikit-image==0.15.0
>>>> scikit-learn==0.19.2
>>>> scipy==1.1.0
>>>> seaborn==0.9.1
>>>> SecretStorage==2.3.1
>>>> Send2Trash==1.5.0
>>>> simplegeneric==0.8.1
>>>> six==1.14.0
>>>> smmap2==2.0.5
>>>> snowflake-connector-python==2.2.0
>>>> SQLAlchemy==1.3.13
>>>> sqlparse==0.3.0
>>>> tbb==2019.0
>>>> tbb4py==2019.0
>>>> tenacity==6.0.0
>>>> terminado==0.8.3
>>>> testpath==0.4.4
>>>> textwrap3==0.9.2
>>>> tornado==5.1.1
>>>> tqdm==4.42.0
>>>> traitlets==4.3.3
>>>> typed-ast==1.4.1
>>>> typing==3.7.4.1
>>>> typing-extensions==3.7.4.1
>>>> unattended-upgrades==0.1
>>>> uritemplate==3.0.1
>>>> urllib3==1.24.2
>>>> virtualenv==16.7.9
>>>> wcwidth==0.1.8
>>>> webencodings==0.5.1
>>>> websocket-client==0.57.0
>>>> Werkzeug==0.16.1
>>>> whichcraft==0.6.1
>>>> widgetsnbextension==3.5.1
>>>> wrapt==1.11.2
>>>> zipp==1.1.0
>>>>
>>>>
>>>> On Tue, Feb 4, 2020 at 11:33 AM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>>
>>>>> I don't think there is a mismatch between dill versions here, but
>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>> mentions a similar error and may be related. What is the output of pip
>>>>> freeze on your machine (or better: pip install pipdeptree; pipdeptree)?
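>>>>>
>>>>> For example, to look at just the Beam subtree (a sketch; pipdeptree's -p
>>>>> flag limits the output to the listed packages):
>>>>>
>>>>>     pip install pipdeptree
>>>>>     pipdeptree -p apache-beam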
>>>>>
>>>>>
>>>>> On Tue, Feb 4, 2020 at 11:22 AM Alan Krumholz <
>>>>> alan.krumh...@betterup.co> wrote:
>>>>>
>>>>>> Here is a test job that sometimes fails and sometimes doesn't (but
>>>>>> most of the time it fails).
>>>>>> There seems to be something stochastic going on: across several test
>>>>>> runs, a couple did succeed.
>>>>>>
>>>>>>
>>>>>> def test_error(
>>>>>>     bq_table: str) -> bool:
>>>>>>
>>>>>>     import apache_beam as beam
>>>>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>>>>
>>>>>>     # Emits 20,000 identical rows from a single seed element.
>>>>>>     class GenData(beam.DoFn):
>>>>>>         def process(self, _):
>>>>>>             for _ in range(20000):
>>>>>>                 yield {'a': 1, 'b': 2}
>>>>>>
>>>>>>     # Builds the BigQuery schema for the two integer columns above.
>>>>>>     def get_bigquery_schema():
>>>>>>         from apache_beam.io.gcp.internal.clients import bigquery
>>>>>>
>>>>>>         table_schema = bigquery.TableSchema()
>>>>>>         columns = [
>>>>>>             ['a', 'integer', 'nullable'],
>>>>>>             ['b', 'integer', 'nullable']
>>>>>>         ]
>>>>>>
>>>>>>         for column in columns:
>>>>>>             column_schema = bigquery.TableFieldSchema()
>>>>>>             column_schema.name = column[0]
>>>>>>             column_schema.type = column[1]
>>>>>>             column_schema.mode = column[2]
>>>>>>             table_schema.fields.append(column_schema)
>>>>>>
>>>>>>         return table_schema
>>>>>>
>>>>>>     pipeline = beam.Pipeline(options=PipelineOptions(
>>>>>>         project='my-project',
>>>>>>         temp_location='gs://my-bucket/temp',
>>>>>>         staging_location='gs://my-bucket/staging',
>>>>>>         runner='DataflowRunner'
>>>>>>     ))
>>>>>>     #pipeline = beam.Pipeline()
>>>>>>
>>>>>>     # bq_table has the form 'project:dataset.table'.
>>>>>>     (
>>>>>>         pipeline
>>>>>>         | 'Empty start' >> beam.Create([''])
>>>>>>         | 'Generate Data' >> beam.ParDo(GenData())
>>>>>>         #| 'print' >> beam.Map(print)
>>>>>>         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
>>>>>>             project=bq_table.split(':')[0],
>>>>>>             dataset=bq_table.split(':')[1].split('.')[0],
>>>>>>             table=bq_table.split(':')[1].split('.')[1],
>>>>>>             schema=get_bigquery_schema(),
>>>>>>             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>>>>>>             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
>>>>>>     )
>>>>>>
>>>>>>     result = pipeline.run()
>>>>>>     result.wait_until_finish()
>>>>>>
>>>>>>     return True
>>>>>>
>>>>>> test_error(
>>>>>>     bq_table='my-project:my_dataset.my_table'
>>>>>> )
>>>>>>
>>>>>> On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz <
>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>
>>>>>>> I tried breaking apart my pipeline. It seems the step that breaks it is
>>>>>>> beam.io.WriteToBigQuery.
>>>>>>>
>>>>>>> Let me see if I can create a self-contained example that breaks to
>>>>>>> share with you.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada <pabl...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hm that's odd. No changes to the pipeline? Are you able to share
>>>>>>>> some of the code?
>>>>>>>>
>>>>>>>> +Udi Meiri <eh...@google.com> do you have any idea what could be
>>>>>>>> going on here?
>>>>>>>>
>>>>>>>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz <
>>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>>
>>>>>>>>> Hi Pablo,
>>>>>>>>> This is strange... it doesn't seem to be the new Beam release, as
>>>>>>>>> last night's (successful) job was already using 2.19.0. I wonder if it
>>>>>>>>> was some release from the Dataflow team (not Beam related):
>>>>>>>>>
>>>>>>>>> Job type: Batch
>>>>>>>>> Job status: Succeeded
>>>>>>>>> SDK version: Apache Beam Python 3.5 SDK 2.19.0
>>>>>>>>> Region: us-central1
>>>>>>>>> Start time: February 3, 2020 at 9:28:35 PM GMT-8
>>>>>>>>> Elapsed time: 5 min 11 sec
>>>>>>>>>
>>>>>>>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada <pabl...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alan,
>>>>>>>>>> could it be that you're picking up the new Apache Beam 2.19.0
>>>>>>>>>> release? Could you try depending on Beam 2.18.0 to check whether the
>>>>>>>>>> issue only surfaces with the new release?
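>>>>>>>>>>
>>>>>>>>>> For example (a sketch; use %pip in a notebook cell):
>>>>>>>>>>
>>>>>>>>>>     pip install apache-beam[gcp]==2.18.0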
>>>>>>>>>>
>>>>>>>>>> If something was working and no longer works, it sounds like a
>>>>>>>>>> bug. This may have to do with how we pickle (dill / cloudpickle); see
>>>>>>>>>> this question:
>>>>>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
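>>>>>>>>>>
>>>>>>>>>> As a quick local check, a round trip through Beam's internal pickler
>>>>>>>>>> can surface serialization problems before a job is submitted (a
>>>>>>>>>> minimal sketch; it won't reproduce a cross-environment version
>>>>>>>>>> mismatch by itself, since it pickles and unpickles in the same
>>>>>>>>>> environment):
>>>>>>>>>>
>>>>>>>>>>     # Beam serializes DoFns with dill via apache_beam.internal.pickler;
>>>>>>>>>>     # this mirrors the loads() call in the worker traceback.
>>>>>>>>>>     import apache_beam as beam
>>>>>>>>>>     from apache_beam.internal import pickler
>>>>>>>>>>
>>>>>>>>>>     class Probe(beam.DoFn):
>>>>>>>>>>         def process(self, element):
>>>>>>>>>>             yield element
>>>>>>>>>>
>>>>>>>>>>     payload = pickler.dumps(Probe())   # what job submission does
>>>>>>>>>>     restored = pickler.loads(payload)  # what the worker does
>>>>>>>>>>     print(type(restored))              # expect <class '...Probe'>
>>>>>>>>>>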
>>>>>>>>>> Best
>>>>>>>>>> -P.
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz <
>>>>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I was running a Dataflow job in GCP last night and it was
>>>>>>>>>>> running fine.
>>>>>>>>>>> This morning the exact same job is failing with the following
>>>>>>>>>>> error:
>>>>>>>>>>>
>>>>>>>>>>> Error message from worker: Traceback (most recent call last):
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 286, in loads
>>>>>>>>>>>     return dill.loads(s)
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>>>>     return _reverse_typemap[name]
>>>>>>>>>>> KeyError: 'ClassType'
>>>>>>>>>>>
>>>>>>>>>>> During handling of the above exception, another exception occurred:
>>>>>>>>>>>
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
>>>>>>>>>>>     work_executor.execute()
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", line 176, in execute
>>>>>>>>>>>     op.start()
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 602, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 290, in loads
>>>>>>>>>>>     return dill.loads(s)
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>>>>     return _reverse_typemap[name]
>>>>>>>>>>> KeyError: 'ClassType'
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If I use a local runner it still runs fine.
>>>>>>>>>>> Anyone else experiencing something similar today? (or know how
>>>>>>>>>>> to fix this?)
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>
