The fact that you have cloudpickle==1.2.2 further confirms that you may be
hitting the same error as
https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
 .

Could you try to start over with a clean virtual environment?

On Tue, Feb 4, 2020 at 11:46 AM Alan Krumholz <alan.krumh...@betterup.co>
wrote:

> Hi Valentyn,
>
> Here is my pip freeze on my machine (note that the error is in dataflow,
> the job runs fine in my machine)
>
> ansiwrap==0.8.4
> apache-beam==2.19.0
> arrow==0.15.5
> asn1crypto==1.3.0
> astroid==2.3.3
> astropy==3.2.3
> attrs==19.3.0
> avro-python3==1.9.1
> azure-common==1.1.24
> azure-storage-blob==2.1.0
> azure-storage-common==2.1.0
> backcall==0.1.0
> bcolz==1.2.1
> binaryornot==0.4.4
> bleach==3.1.0
> boto3==1.11.9
> botocore==1.14.9
> cachetools==3.1.1
> certifi==2019.11.28
> cffi==1.13.2
> chardet==3.0.4
> Click==7.0
> cloudpickle==1.2.2
> colorama==0.4.3
> configparser==4.0.2
> confuse==1.0.0
> cookiecutter==1.7.0
> crcmod==1.7
> cryptography==2.8
> cycler==0.10.0
> daal==2019.0
> datalab==1.1.5
> decorator==4.4.1
> defusedxml==0.6.0
> dill==0.3.1.1
> distro==1.0.1
> docker==4.1.0
> docopt==0.6.2
> docutils==0.15.2
> entrypoints==0.3
> enum34==1.1.6
> fairing==0.5.3
> fastavro==0.21.24
> fasteners==0.15
> fsspec==0.6.2
> future==0.18.2
> gcsfs==0.6.0
> gitdb2==2.0.6
> GitPython==3.0.5
> google-api-core==1.16.0
> google-api-python-client==1.7.11
> google-apitools==0.5.28
> google-auth==1.11.0
> google-auth-httplib2==0.0.3
> google-auth-oauthlib==0.4.1
> google-cloud-bigquery==1.17.1
> google-cloud-bigtable==1.0.0
> google-cloud-core==1.2.0
> google-cloud-dataproc==0.6.1
> google-cloud-datastore==1.7.4
> google-cloud-language==1.3.0
> google-cloud-logging==1.14.0
> google-cloud-monitoring==0.31.1
> google-cloud-pubsub==1.0.2
> google-cloud-secret-manager==0.1.1
> google-cloud-spanner==1.13.0
> google-cloud-storage==1.25.0
> google-cloud-translate==2.0.0
> google-compute-engine==20191210.0
> google-resumable-media==0.4.1
> googleapis-common-protos==1.51.0
> grpc-google-iam-v1==0.12.3
> grpcio==1.26.0
> h5py==2.10.0
> hdfs==2.5.8
> html5lib==1.0.1
> htmlmin==0.1.12
> httplib2==0.12.0
> icc-rt==2020.0.133
> idna==2.8
> ijson==2.6.1
> imageio==2.6.1
> importlib-metadata==1.4.0
> intel-numpy==1.15.1
> intel-openmp==2020.0.133
> intel-scikit-learn==0.19.2
> intel-scipy==1.1.0
> ipykernel==5.1.4
> ipython==7.9.0
> ipython-genutils==0.2.0
> ipython-sql==0.3.9
> ipywidgets==7.5.1
> isort==4.3.21
> jedi==0.16.0
> Jinja2==2.11.0
> jinja2-time==0.2.0
> jmespath==0.9.4
> joblib==0.14.1
> json5==0.8.5
> jsonschema==3.2.0
> jupyter==1.0.0
> jupyter-aihub-deploy-extension==0.1
> jupyter-client==5.3.4
> jupyter-console==6.1.0
> jupyter-contrib-core==0.3.3
> jupyter-contrib-nbextensions==0.5.1
> jupyter-core==4.6.1
> jupyter-highlight-selected-word==0.2.0
> jupyter-http-over-ws==0.0.7
> jupyter-latex-envs==1.4.6
> jupyter-nbextensions-configurator==0.4.1
> jupyterlab==1.2.6
> jupyterlab-git==0.9.0
> jupyterlab-server==1.0.6
> keyring==10.1
> keyrings.alt==1.3
> kiwisolver==1.1.0
> kubernetes==10.0.1
> lazy-object-proxy==1.4.3
> llvmlite==0.31.0
> lxml==4.4.2
> Markdown==3.1.1
> MarkupSafe==1.1.1
> matplotlib==3.0.3
> mccabe==0.6.1
> missingno==0.4.2
> mistune==0.8.4
> mkl==2019.0
> mkl-fft==1.0.6
> mkl-random==1.0.1.1
> mock==2.0.0
> monotonic==1.5
> more-itertools==8.1.0
> nbconvert==5.6.1
> nbdime==1.1.0
> nbformat==5.0.4
> networkx==2.4
> nltk==3.4.5
> notebook==6.0.3
> numba==0.47.0
> numpy==1.15.1
> oauth2client==3.0.0
> oauthlib==3.1.0
> opencv-python==4.1.2.30
> oscrypto==1.2.0
> packaging==20.1
> pandas==0.25.3
> pandas-profiling==1.4.0
> pandocfilters==1.4.2
> papermill==1.2.1
> parso==0.6.0
> pathlib2==2.3.5
> pbr==5.4.4
> pexpect==4.8.0
> phik==0.9.8
> pickleshare==0.7.5
> Pillow-SIMD==6.2.2.post1
> pipdeptree==0.13.2
> plotly==4.5.0
> pluggy==0.13.1
> poyo==0.5.0
> prettytable==0.7.2
> prometheus-client==0.7.1
> prompt-toolkit==2.0.10
> protobuf==3.11.2
> psutil==5.6.7
> ptyprocess==0.6.0
> py==1.8.1
> pyarrow==0.15.1
> pyasn1==0.4.8
> pyasn1-modules==0.2.8
> pycparser==2.19
> pycrypto==2.6.1
> pycryptodomex==3.9.6
> pycurl==7.43.0
> pydaal==2019.0.0.20180713
> pydot==1.4.1
> Pygments==2.5.2
> pygobject==3.22.0
> PyJWT==1.7.1
> pylint==2.4.4
> pymongo==3.10.1
> pyOpenSSL==19.1.0
> pyparsing==2.4.6
> pyrsistent==0.15.7
> pytest==5.3.4
> pytest-pylint==0.14.1
> python-apt==1.4.1
> python-dateutil==2.8.1
> pytz==2019.3
> PyWavelets==1.1.1
> pyxdg==0.25
> PyYAML==5.3
> pyzmq==18.1.1
> qtconsole==4.6.0
> requests==2.22.0
> requests-oauthlib==1.3.0
> retrying==1.3.3
> rsa==4.0
> s3transfer==0.3.2
> scikit-image==0.15.0
> scikit-learn==0.19.2
> scipy==1.1.0
> seaborn==0.9.1
> SecretStorage==2.3.1
> Send2Trash==1.5.0
> simplegeneric==0.8.1
> six==1.14.0
> smmap2==2.0.5
> snowflake-connector-python==2.2.0
> SQLAlchemy==1.3.13
> sqlparse==0.3.0
> tbb==2019.0
> tbb4py==2019.0
> tenacity==6.0.0
> terminado==0.8.3
> testpath==0.4.4
> textwrap3==0.9.2
> tornado==5.1.1
> tqdm==4.42.0
> traitlets==4.3.3
> typed-ast==1.4.1
> typing==3.7.4.1
> typing-extensions==3.7.4.1
> unattended-upgrades==0.1
> uritemplate==3.0.1
> urllib3==1.24.2
> virtualenv==16.7.9
> wcwidth==0.1.8
> webencodings==0.5.1
> websocket-client==0.57.0
> Werkzeug==0.16.1
> whichcraft==0.6.1
> widgetsnbextension==3.5.1
> wrapt==1.11.2
> zipp==1.1.0
>
>
> On Tue, Feb 4, 2020 at 11:33 AM Valentyn Tymofieiev <valen...@google.com>
> wrote:
>
>> It don't think there is a mismatch between dill versions here, but
>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>  mentions
>> a similar error and may be related. What is the output of pip freeze on
>> your machine (or better: pip install pipdeptree; pipdeptree)?
>>
>>
>> On Tue, Feb 4, 2020 at 11:22 AM Alan Krumholz <alan.krumh...@betterup.co>
>> wrote:
>>
>>> Here is a test job that sometimes fails and sometimes doesn't (but most
>>> times do).....
>>> There seems to be something stochastic that causes this as after several
>>> tests a couple of them did succeed....
>>>
>>>
>>> def test_error(
>>>     bq_table: str) -> str:
>>>
>>>     import apache_beam as beam
>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>
>>>     class GenData(beam.DoFn):
>>>         def process(self, _):
>>>             for _ in range (20000):
>>>                 yield {'a':1,'b':2}
>>>
>>>
>>>     def get_bigquery_schema():
>>>         from apache_beam.io.gcp.internal.clients import bigquery
>>>
>>>         table_schema = bigquery.TableSchema()
>>>         columns = [
>>>             ["a","integer","nullable"],
>>>             ["b","integer","nullable"]
>>>         ]
>>>
>>>         for column in columns:
>>>             column_schema = bigquery.TableFieldSchema()
>>>             column_schema.name = column[0]
>>>             column_schema.type = column[1]
>>>             column_schema.mode = column[2]
>>>             table_schema.fields.append(column_schema)
>>>
>>>         return table_schema
>>>
>>>     pipeline = beam.Pipeline(options=PipelineOptions(
>>>         project='my-project',
>>>         temp_location = 'gs://my-bucket/temp',
>>>         staging_location = 'gs://my-bucket/staging',
>>>         runner='DataflowRunner'
>>>     ))
>>>     #pipeline = beam.Pipeline()
>>>
>>>     (
>>>         pipeline
>>>         | 'Empty start' >> beam.Create([''])
>>>         | 'Generate Data' >> beam.ParDo(GenData())
>>>         #| 'print' >> beam.Map(print)
>>>         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
>>>                     project=bq_table.split(':')[0],
>>>                     dataset=bq_table.split(':')[1].split('.')[0],
>>>                     table=bq_table.split(':')[1].split('.')[1],
>>>                     schema=get_bigquery_schema(),
>>>
>>> create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>>>
>>> write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
>>>     )
>>>
>>>     result = pipeline.run()
>>>     result.wait_until_finish()
>>>
>>>     return True
>>>
>>> test_error(
>>>     bq_table = 'my-project:my_dataset.my_table'
>>> )
>>>
>>> On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz <alan.krumh...@betterup.co>
>>> wrote:
>>>
>>>> I tried breaking apart my pipeline. Seems the step that breaks it is:
>>>> beam.io.WriteToBigQuery
>>>>
>>>> Let me see if I can create a self contained example that breaks to
>>>> share with you
>>>>
>>>> Thanks!
>>>>
>>>> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada <pabl...@google.com>
>>>> wrote:
>>>>
>>>>> Hm that's odd. No changes to the pipeline? Are you able to share some
>>>>> of the code?
>>>>>
>>>>> +Udi Meiri <eh...@google.com> do you have any idea what could be
>>>>> going on here?
>>>>>
>>>>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz <
>>>>> alan.krumh...@betterup.co> wrote:
>>>>>
>>>>>> Hi Pablo,
>>>>>> This is strange... it doesn't seem to be the last beam release as
>>>>>> last night it was already using 2.19.0 I wonder if it was some release 
>>>>>> from
>>>>>> the DataFlow team (not beam related):
>>>>>> Job typeBatch
>>>>>> Job status Succeeded
>>>>>> SDK version
>>>>>> Apache Beam Python 3.5 SDK 2.19.0
>>>>>> Region
>>>>>> us-central1
>>>>>> Start timeFebruary 3, 2020 at 9:28:35 PM GMT-8
>>>>>> Elapsed time5 min 11 sec
>>>>>>
>>>>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada <pabl...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Alan,
>>>>>>> could it be that you're picking up the new Apache Beam 2.19.0
>>>>>>> release? Could you try depending on beam 2.18.0 to see if the issue
>>>>>>> surfaces when using the new release?
>>>>>>>
>>>>>>> If something was working and no longer works, it sounds like a bug.
>>>>>>> This may have to do with how we pickle (dill / cloudpickle) - see this
>>>>>>> question
>>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>>>> Best
>>>>>>> -P.
>>>>>>>
>>>>>>> On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz <
>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was running a dataflow job in GCP last night and it was running
>>>>>>>> fine.
>>>>>>>> This morning this same exact job is failing with the following
>>>>>>>> error:
>>>>>>>>
>>>>>>>> Error message from worker: Traceback (most recent call last): File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py",
>>>>>>>> line 286, in loads return dill.loads(s) File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in 
>>>>>>>> loads
>>>>>>>> return load(file, ignore, **kwds) File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in 
>>>>>>>> load
>>>>>>>> return Unpickler(file, ignore=ignore, **kwds).load() File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in 
>>>>>>>> load
>>>>>>>> obj = StockUnpickler.load(self) File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in
>>>>>>>> _load_type return _reverse_typemap[name] KeyError: 'ClassType' During
>>>>>>>> handling of the above exception, another exception occurred: Traceback
>>>>>>>> (most recent call last): File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py",
>>>>>>>> line 648, in do_work work_executor.execute() File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", 
>>>>>>>> line
>>>>>>>> 176, in execute op.start() File 
>>>>>>>> "apache_beam/runners/worker/operations.py",
>>>>>>>> line 649, in apache_beam.runners.worker.operations.DoOperation.start 
>>>>>>>> File
>>>>>>>> "apache_beam/runners/worker/operations.py", line 651, in
>>>>>>>> apache_beam.runners.worker.operations.DoOperation.start File
>>>>>>>> "apache_beam/runners/worker/operations.py", line 652, in
>>>>>>>> apache_beam.runners.worker.operations.DoOperation.start File
>>>>>>>> "apache_beam/runners/worker/operations.py", line 261, in
>>>>>>>> apache_beam.runners.worker.operations.Operation.start File
>>>>>>>> "apache_beam/runners/worker/operations.py", line 266, in
>>>>>>>> apache_beam.runners.worker.operations.Operation.start File
>>>>>>>> "apache_beam/runners/worker/operations.py", line 597, in
>>>>>>>> apache_beam.runners.worker.operations.DoOperation.setup File
>>>>>>>> "apache_beam/runners/worker/operations.py", line 602, in
>>>>>>>> apache_beam.runners.worker.operations.DoOperation.setup File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py",
>>>>>>>> line 290, in loads return dill.loads(s) File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in 
>>>>>>>> loads
>>>>>>>> return load(file, ignore, **kwds) File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in 
>>>>>>>> load
>>>>>>>> return Unpickler(file, ignore=ignore, **kwds).load() File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in 
>>>>>>>> load
>>>>>>>> obj = StockUnpickler.load(self) File
>>>>>>>> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in
>>>>>>>> _load_type return _reverse_typemap[name] KeyError: 'ClassType'
>>>>>>>>
>>>>>>>>
>>>>>>>> If I use a local runner it still runs fine.
>>>>>>>> Anyone else experiencing something similar today? (or know how to
>>>>>>>> fix this?)
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>

Reply via email to