perfect! thank you!
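
For readers who find this thread later: a minimal sketch, assuming plain dotted numeric version strings, of checking whether an environment already has cloudpickle at or above 1.3.0, the release that the quoted message below expects to carry the fix. The helper names are hypothetical, and reading `cloudpickle.__version__` assumes the installed package exposes that attribute. The thread also reports the older cloudpickle==0.8.1 as not causing issues, so a False result here does not by itself mean the environment is affected.

```python
def parse_version(version):
    # Assumes a plain dotted numeric release such as "1.2.2" with the same
    # number of components on both sides of a comparison; suffixes like
    # ".dev0" or "rc1" are not handled in this sketch.
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, minimum="1.3.0"):
    # Compare numerically so that "1.10.0" sorts above "1.3.0".
    return parse_version(installed) >= parse_version(minimum)


def installed_cloudpickle_version():
    # Returns None when cloudpickle is missing or exposes no __version__.
    try:
        import cloudpickle
    except ImportError:
        return None
    return getattr(cloudpickle, "__version__", None)
```

Calling `is_at_least(installed_cloudpickle_version())` in the notebook, after checking that the version is not None, tells you whether the release at or above 1.3.0 is already in place.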

On Fri, Feb 7, 2020 at 10:54 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

> Thanks for your feedback. We expect that this issue will be fixed in
> cloudpickle==1.3.0. Per [1], this release may be available next week.
>
> After that you can install the fixed version of cloudpickle until the AI
> notebook image picks up the new version.
>
> [1] https://github.com/cloudpipe/cloudpickle/pull/337
>
> On Tue, Feb 4, 2020 at 12:44 PM Alan Krumholz <alan.krumh...@betterup.co>
> wrote:
>
>> Seems like the image we use in KFP to orchestrate the job has
>> cloudpickle==0.8.1, and that one doesn't seem to cause issues.
>> I think I'm unblocked for now, but I'm sure I won't be the last one to try
>> to do this using GCP managed notebooks :(
>>
>> Thanks for all the help!
>>
>>
>> On Tue, Feb 4, 2020 at 12:24 PM Alan Krumholz <alan.krumh...@betterup.co>
>> wrote:
>>
>>> I'm using a managed notebook instance from GCP.
>>> It seems those already come with cloudpickle==1.2.2 as soon as you
>>> provision them. apache-beam[gcp] will then install dill==0.3.1.1. I'm
>>> going to try to uninstall cloudpickle before installing apache-beam and see
>>> if this fixes the problem.
>>>
>>> Thank you
>>>
>>> On Tue, Feb 4, 2020 at 11:54 AM Valentyn Tymofieiev <valen...@google.com>
>>> wrote:
>>>
>>>> The fact that you have cloudpickle==1.2.2 further confirms that you
>>>> may be hitting the same error as
>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>  .
>>>>
>>>> Could you try to start over with a clean virtual environment?
>>>>
>>>> On Tue, Feb 4, 2020 at 11:46 AM Alan Krumholz <
>>>> alan.krumh...@betterup.co> wrote:
>>>>
>>>>> Hi Valentyn,
>>>>>
>>>>> Here is my pip freeze on my machine (note that the error is in
>>>>> Dataflow; the job runs fine on my machine)
>>>>>
>>>>> ansiwrap==0.8.4
>>>>> apache-beam==2.19.0
>>>>> arrow==0.15.5
>>>>> asn1crypto==1.3.0
>>>>> astroid==2.3.3
>>>>> astropy==3.2.3
>>>>> attrs==19.3.0
>>>>> avro-python3==1.9.1
>>>>> azure-common==1.1.24
>>>>> azure-storage-blob==2.1.0
>>>>> azure-storage-common==2.1.0
>>>>> backcall==0.1.0
>>>>> bcolz==1.2.1
>>>>> binaryornot==0.4.4
>>>>> bleach==3.1.0
>>>>> boto3==1.11.9
>>>>> botocore==1.14.9
>>>>> cachetools==3.1.1
>>>>> certifi==2019.11.28
>>>>> cffi==1.13.2
>>>>> chardet==3.0.4
>>>>> Click==7.0
>>>>> cloudpickle==1.2.2
>>>>> colorama==0.4.3
>>>>> configparser==4.0.2
>>>>> confuse==1.0.0
>>>>> cookiecutter==1.7.0
>>>>> crcmod==1.7
>>>>> cryptography==2.8
>>>>> cycler==0.10.0
>>>>> daal==2019.0
>>>>> datalab==1.1.5
>>>>> decorator==4.4.1
>>>>> defusedxml==0.6.0
>>>>> dill==0.3.1.1
>>>>> distro==1.0.1
>>>>> docker==4.1.0
>>>>> docopt==0.6.2
>>>>> docutils==0.15.2
>>>>> entrypoints==0.3
>>>>> enum34==1.1.6
>>>>> fairing==0.5.3
>>>>> fastavro==0.21.24
>>>>> fasteners==0.15
>>>>> fsspec==0.6.2
>>>>> future==0.18.2
>>>>> gcsfs==0.6.0
>>>>> gitdb2==2.0.6
>>>>> GitPython==3.0.5
>>>>> google-api-core==1.16.0
>>>>> google-api-python-client==1.7.11
>>>>> google-apitools==0.5.28
>>>>> google-auth==1.11.0
>>>>> google-auth-httplib2==0.0.3
>>>>> google-auth-oauthlib==0.4.1
>>>>> google-cloud-bigquery==1.17.1
>>>>> google-cloud-bigtable==1.0.0
>>>>> google-cloud-core==1.2.0
>>>>> google-cloud-dataproc==0.6.1
>>>>> google-cloud-datastore==1.7.4
>>>>> google-cloud-language==1.3.0
>>>>> google-cloud-logging==1.14.0
>>>>> google-cloud-monitoring==0.31.1
>>>>> google-cloud-pubsub==1.0.2
>>>>> google-cloud-secret-manager==0.1.1
>>>>> google-cloud-spanner==1.13.0
>>>>> google-cloud-storage==1.25.0
>>>>> google-cloud-translate==2.0.0
>>>>> google-compute-engine==20191210.0
>>>>> google-resumable-media==0.4.1
>>>>> googleapis-common-protos==1.51.0
>>>>> grpc-google-iam-v1==0.12.3
>>>>> grpcio==1.26.0
>>>>> h5py==2.10.0
>>>>> hdfs==2.5.8
>>>>> html5lib==1.0.1
>>>>> htmlmin==0.1.12
>>>>> httplib2==0.12.0
>>>>> icc-rt==2020.0.133
>>>>> idna==2.8
>>>>> ijson==2.6.1
>>>>> imageio==2.6.1
>>>>> importlib-metadata==1.4.0
>>>>> intel-numpy==1.15.1
>>>>> intel-openmp==2020.0.133
>>>>> intel-scikit-learn==0.19.2
>>>>> intel-scipy==1.1.0
>>>>> ipykernel==5.1.4
>>>>> ipython==7.9.0
>>>>> ipython-genutils==0.2.0
>>>>> ipython-sql==0.3.9
>>>>> ipywidgets==7.5.1
>>>>> isort==4.3.21
>>>>> jedi==0.16.0
>>>>> Jinja2==2.11.0
>>>>> jinja2-time==0.2.0
>>>>> jmespath==0.9.4
>>>>> joblib==0.14.1
>>>>> json5==0.8.5
>>>>> jsonschema==3.2.0
>>>>> jupyter==1.0.0
>>>>> jupyter-aihub-deploy-extension==0.1
>>>>> jupyter-client==5.3.4
>>>>> jupyter-console==6.1.0
>>>>> jupyter-contrib-core==0.3.3
>>>>> jupyter-contrib-nbextensions==0.5.1
>>>>> jupyter-core==4.6.1
>>>>> jupyter-highlight-selected-word==0.2.0
>>>>> jupyter-http-over-ws==0.0.7
>>>>> jupyter-latex-envs==1.4.6
>>>>> jupyter-nbextensions-configurator==0.4.1
>>>>> jupyterlab==1.2.6
>>>>> jupyterlab-git==0.9.0
>>>>> jupyterlab-server==1.0.6
>>>>> keyring==10.1
>>>>> keyrings.alt==1.3
>>>>> kiwisolver==1.1.0
>>>>> kubernetes==10.0.1
>>>>> lazy-object-proxy==1.4.3
>>>>> llvmlite==0.31.0
>>>>> lxml==4.4.2
>>>>> Markdown==3.1.1
>>>>> MarkupSafe==1.1.1
>>>>> matplotlib==3.0.3
>>>>> mccabe==0.6.1
>>>>> missingno==0.4.2
>>>>> mistune==0.8.4
>>>>> mkl==2019.0
>>>>> mkl-fft==1.0.6
>>>>> mkl-random==1.0.1.1
>>>>> mock==2.0.0
>>>>> monotonic==1.5
>>>>> more-itertools==8.1.0
>>>>> nbconvert==5.6.1
>>>>> nbdime==1.1.0
>>>>> nbformat==5.0.4
>>>>> networkx==2.4
>>>>> nltk==3.4.5
>>>>> notebook==6.0.3
>>>>> numba==0.47.0
>>>>> numpy==1.15.1
>>>>> oauth2client==3.0.0
>>>>> oauthlib==3.1.0
>>>>> opencv-python==4.1.2.30
>>>>> oscrypto==1.2.0
>>>>> packaging==20.1
>>>>> pandas==0.25.3
>>>>> pandas-profiling==1.4.0
>>>>> pandocfilters==1.4.2
>>>>> papermill==1.2.1
>>>>> parso==0.6.0
>>>>> pathlib2==2.3.5
>>>>> pbr==5.4.4
>>>>> pexpect==4.8.0
>>>>> phik==0.9.8
>>>>> pickleshare==0.7.5
>>>>> Pillow-SIMD==6.2.2.post1
>>>>> pipdeptree==0.13.2
>>>>> plotly==4.5.0
>>>>> pluggy==0.13.1
>>>>> poyo==0.5.0
>>>>> prettytable==0.7.2
>>>>> prometheus-client==0.7.1
>>>>> prompt-toolkit==2.0.10
>>>>> protobuf==3.11.2
>>>>> psutil==5.6.7
>>>>> ptyprocess==0.6.0
>>>>> py==1.8.1
>>>>> pyarrow==0.15.1
>>>>> pyasn1==0.4.8
>>>>> pyasn1-modules==0.2.8
>>>>> pycparser==2.19
>>>>> pycrypto==2.6.1
>>>>> pycryptodomex==3.9.6
>>>>> pycurl==7.43.0
>>>>> pydaal==2019.0.0.20180713
>>>>> pydot==1.4.1
>>>>> Pygments==2.5.2
>>>>> pygobject==3.22.0
>>>>> PyJWT==1.7.1
>>>>> pylint==2.4.4
>>>>> pymongo==3.10.1
>>>>> pyOpenSSL==19.1.0
>>>>> pyparsing==2.4.6
>>>>> pyrsistent==0.15.7
>>>>> pytest==5.3.4
>>>>> pytest-pylint==0.14.1
>>>>> python-apt==1.4.1
>>>>> python-dateutil==2.8.1
>>>>> pytz==2019.3
>>>>> PyWavelets==1.1.1
>>>>> pyxdg==0.25
>>>>> PyYAML==5.3
>>>>> pyzmq==18.1.1
>>>>> qtconsole==4.6.0
>>>>> requests==2.22.0
>>>>> requests-oauthlib==1.3.0
>>>>> retrying==1.3.3
>>>>> rsa==4.0
>>>>> s3transfer==0.3.2
>>>>> scikit-image==0.15.0
>>>>> scikit-learn==0.19.2
>>>>> scipy==1.1.0
>>>>> seaborn==0.9.1
>>>>> SecretStorage==2.3.1
>>>>> Send2Trash==1.5.0
>>>>> simplegeneric==0.8.1
>>>>> six==1.14.0
>>>>> smmap2==2.0.5
>>>>> snowflake-connector-python==2.2.0
>>>>> SQLAlchemy==1.3.13
>>>>> sqlparse==0.3.0
>>>>> tbb==2019.0
>>>>> tbb4py==2019.0
>>>>> tenacity==6.0.0
>>>>> terminado==0.8.3
>>>>> testpath==0.4.4
>>>>> textwrap3==0.9.2
>>>>> tornado==5.1.1
>>>>> tqdm==4.42.0
>>>>> traitlets==4.3.3
>>>>> typed-ast==1.4.1
>>>>> typing==3.7.4.1
>>>>> typing-extensions==3.7.4.1
>>>>> unattended-upgrades==0.1
>>>>> uritemplate==3.0.1
>>>>> urllib3==1.24.2
>>>>> virtualenv==16.7.9
>>>>> wcwidth==0.1.8
>>>>> webencodings==0.5.1
>>>>> websocket-client==0.57.0
>>>>> Werkzeug==0.16.1
>>>>> whichcraft==0.6.1
>>>>> widgetsnbextension==3.5.1
>>>>> wrapt==1.11.2
>>>>> zipp==1.1.0
>>>>>
>>>>>
>>>>> On Tue, Feb 4, 2020 at 11:33 AM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> I don't think there is a mismatch between dill versions here, but
>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>>> mentions a similar error and may be related. What is the output of
>>>>>> pip freeze on your machine (or better: pip install pipdeptree; pipdeptree)?
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 4, 2020 at 11:22 AM Alan Krumholz <
>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>
>>>>>>> Here is a test job that sometimes fails and sometimes doesn't (but
>>>>>>> most times it does).
>>>>>>> There seems to be something stochastic causing this: after
>>>>>>> several tests, a couple of them did succeed.
>>>>>>>
>>>>>>>
>>>>>>> def test_error(
>>>>>>>     bq_table: str) -> bool:
>>>>>>>
>>>>>>>     import apache_beam as beam
>>>>>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>>>>>
>>>>>>>     class GenData(beam.DoFn):
>>>>>>>         def process(self, _):
>>>>>>>             for _ in range(20000):
>>>>>>>                 yield {'a':1,'b':2}
>>>>>>>
>>>>>>>
>>>>>>>     def get_bigquery_schema():
>>>>>>>         from apache_beam.io.gcp.internal.clients import bigquery
>>>>>>>
>>>>>>>         table_schema = bigquery.TableSchema()
>>>>>>>         columns = [
>>>>>>>             ["a","integer","nullable"],
>>>>>>>             ["b","integer","nullable"]
>>>>>>>         ]
>>>>>>>
>>>>>>>         for column in columns:
>>>>>>>             column_schema = bigquery.TableFieldSchema()
>>>>>>>             column_schema.name = column[0]
>>>>>>>             column_schema.type = column[1]
>>>>>>>             column_schema.mode = column[2]
>>>>>>>             table_schema.fields.append(column_schema)
>>>>>>>
>>>>>>>         return table_schema
>>>>>>>
>>>>>>>     pipeline = beam.Pipeline(options=PipelineOptions(
>>>>>>>         project='my-project',
>>>>>>>         temp_location = 'gs://my-bucket/temp',
>>>>>>>         staging_location = 'gs://my-bucket/staging',
>>>>>>>         runner='DataflowRunner'
>>>>>>>     ))
>>>>>>>     #pipeline = beam.Pipeline()
>>>>>>>
>>>>>>>     (
>>>>>>>         pipeline
>>>>>>>         | 'Empty start' >> beam.Create([''])
>>>>>>>         | 'Generate Data' >> beam.ParDo(GenData())
>>>>>>>         #| 'print' >> beam.Map(print)
>>>>>>>         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
>>>>>>>                     project=bq_table.split(':')[0],
>>>>>>>                     dataset=bq_table.split(':')[1].split('.')[0],
>>>>>>>                     table=bq_table.split(':')[1].split('.')[1],
>>>>>>>                     schema=get_bigquery_schema(),
>>>>>>>                     create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>>>>>>>                     write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
>>>>>>>     )
>>>>>>>
>>>>>>>     result = pipeline.run()
>>>>>>>     result.wait_until_finish()
>>>>>>>
>>>>>>>     return True
>>>>>>>
>>>>>>> test_error(
>>>>>>>     bq_table = 'my-project:my_dataset.my_table'
>>>>>>> )
>>>>>>>
>>>>>>> On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz <
>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>
>>>>>>>> I tried breaking apart my pipeline. Seems the step that breaks it
>>>>>>>> is:
>>>>>>>> beam.io.WriteToBigQuery
>>>>>>>>
>>>>>>>> Let me see if I can create a self contained example that breaks to
>>>>>>>> share with you
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada <pabl...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hm that's odd. No changes to the pipeline? Are you able to share
>>>>>>>>> some of the code?
>>>>>>>>>
>>>>>>>>> +Udi Meiri <eh...@google.com> do you have any idea what could be
>>>>>>>>> going on here?
>>>>>>>>>
>>>>>>>>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz <
>>>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Pablo,
>>>>>>>>>> This is strange... it doesn't seem to be the latest Beam release, as
>>>>>>>>>> last night's job was already using 2.19.0. I wonder if it was some
>>>>>>>>>> release from the Dataflow team (not Beam related):
>>>>>>>>>> Job type: Batch
>>>>>>>>>> Job status: Succeeded
>>>>>>>>>> SDK version: Apache Beam Python 3.5 SDK 2.19.0
>>>>>>>>>> Region: us-central1
>>>>>>>>>> Start time: February 3, 2020 at 9:28:35 PM GMT-8
>>>>>>>>>> Elapsed time: 5 min 11 sec
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada <pabl...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Alan,
>>>>>>>>>>> could it be that you're picking up the new Apache Beam 2.19.0
>>>>>>>>>>> release? Could you try depending on Beam 2.18.0 to see if the issue
>>>>>>>>>>> only surfaces when using the new release?
>>>>>>>>>>>
>>>>>>>>>>> If something was working and no longer works, it sounds like a
>>>>>>>>>>> bug. This may have to do with how we pickle (dill / cloudpickle);
>>>>>>>>>>> see this question:
>>>>>>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>>>>>>>> Best
>>>>>>>>>>> -P.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz <
>>>>>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I was running a dataflow job in GCP last night and it was
>>>>>>>>>>>> running fine.
>>>>>>>>>>>> This morning this same exact job is failing with the following
>>>>>>>>>>>> error:
>>>>>>>>>>>>
>>>>>>>>>>>> Error message from worker:
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 286, in loads
>>>>>>>>>>>>     return dill.loads(s)
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>>>>>     return _reverse_typemap[name]
>>>>>>>>>>>> KeyError: 'ClassType'
>>>>>>>>>>>>
>>>>>>>>>>>> During handling of the above exception, another exception occurred:
>>>>>>>>>>>>
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
>>>>>>>>>>>>     work_executor.execute()
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", line 176, in execute
>>>>>>>>>>>>     op.start()
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 602, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 290, in loads
>>>>>>>>>>>>     return dill.loads(s)
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>>>>>     return _reverse_typemap[name]
>>>>>>>>>>>> KeyError: 'ClassType'
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> If I use a local runner it still runs fine.
>>>>>>>>>>>> Anyone else experiencing something similar today? (or know how
>>>>>>>>>>>> to fix this?)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>
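
The diagnosis in this thread came down to which versions of dill and cloudpickle were present on the machine that submitted the job. A minimal sketch of gathering just those versions from Python, as a narrower alternative to reading a full pip freeze listing: it assumes Python 3.8 or newer for `importlib.metadata` (the workers in this thread ran Python 3.5, where that module is not in the standard library), and `report_versions` is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8 (assumption)


def report_versions(packages=("apache-beam", "dill", "cloudpickle")):
    # Map each distribution name to its installed version string,
    # or to None when the distribution is not installed.
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `report_versions()` both in the notebook that submits the job and in a clean virtual environment makes differences such as cloudpickle==1.2.2 versus 0.8.1 easy to spot.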
