Having cloudpickle==1.2.2 in your environment further suggests that you may be hitting the same error as the one described here:
https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
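For a quick comparison between environments, here is a minimal sketch that prints the versions of the pickling-related packages discussed in this thread. The package list and the helper name `installed_versions` are my own assumptions for illustration, and `importlib.metadata` needs Python 3.8 or newer, so this is for a local check rather than something the Python 3.5 workers could run as is.

```python
from importlib import metadata  # standard library, Python 3.8+

# Assumption: these are the packages relevant to pickling in this thread.
PICKLING_PACKAGES = ("apache-beam", "dill", "cloudpickle")


def installed_versions(packages=PICKLING_PACKAGES):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}=={version}")
```

Running it once in the current environment and once in a fresh virtual environment should show whether the two differ in any of these packages.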
Could you try to start over with a clean virtual environment?

On Tue, Feb 4, 2020 at 11:46 AM Alan Krumholz <alan.krumh...@betterup.co> wrote:

> Hi Valentyn,
>
> Here is my pip freeze on my machine (note that the error is in dataflow,
> the job runs fine on my machine)
>
> ansiwrap==0.8.4
> apache-beam==2.19.0
> arrow==0.15.5
> asn1crypto==1.3.0
> astroid==2.3.3
> astropy==3.2.3
> attrs==19.3.0
> avro-python3==1.9.1
> azure-common==1.1.24
> azure-storage-blob==2.1.0
> azure-storage-common==2.1.0
> backcall==0.1.0
> bcolz==1.2.1
> binaryornot==0.4.4
> bleach==3.1.0
> boto3==1.11.9
> botocore==1.14.9
> cachetools==3.1.1
> certifi==2019.11.28
> cffi==1.13.2
> chardet==3.0.4
> Click==7.0
> cloudpickle==1.2.2
> colorama==0.4.3
> configparser==4.0.2
> confuse==1.0.0
> cookiecutter==1.7.0
> crcmod==1.7
> cryptography==2.8
> cycler==0.10.0
> daal==2019.0
> datalab==1.1.5
> decorator==4.4.1
> defusedxml==0.6.0
> dill==0.3.1.1
> distro==1.0.1
> docker==4.1.0
> docopt==0.6.2
> docutils==0.15.2
> entrypoints==0.3
> enum34==1.1.6
> fairing==0.5.3
> fastavro==0.21.24
> fasteners==0.15
> fsspec==0.6.2
> future==0.18.2
> gcsfs==0.6.0
> gitdb2==2.0.6
> GitPython==3.0.5
> google-api-core==1.16.0
> google-api-python-client==1.7.11
> google-apitools==0.5.28
> google-auth==1.11.0
> google-auth-httplib2==0.0.3
> google-auth-oauthlib==0.4.1
> google-cloud-bigquery==1.17.1
> google-cloud-bigtable==1.0.0
> google-cloud-core==1.2.0
> google-cloud-dataproc==0.6.1
> google-cloud-datastore==1.7.4
> google-cloud-language==1.3.0
> google-cloud-logging==1.14.0
> google-cloud-monitoring==0.31.1
> google-cloud-pubsub==1.0.2
> google-cloud-secret-manager==0.1.1
> google-cloud-spanner==1.13.0
> google-cloud-storage==1.25.0
> google-cloud-translate==2.0.0
> google-compute-engine==20191210.0
> google-resumable-media==0.4.1
> googleapis-common-protos==1.51.0
> grpc-google-iam-v1==0.12.3
> grpcio==1.26.0
> h5py==2.10.0
> hdfs==2.5.8
> html5lib==1.0.1
> htmlmin==0.1.12
> httplib2==0.12.0
> icc-rt==2020.0.133
> idna==2.8
> ijson==2.6.1
> imageio==2.6.1
> importlib-metadata==1.4.0
> intel-numpy==1.15.1
> intel-openmp==2020.0.133
> intel-scikit-learn==0.19.2
> intel-scipy==1.1.0
> ipykernel==5.1.4
> ipython==7.9.0
> ipython-genutils==0.2.0
> ipython-sql==0.3.9
> ipywidgets==7.5.1
> isort==4.3.21
> jedi==0.16.0
> Jinja2==2.11.0
> jinja2-time==0.2.0
> jmespath==0.9.4
> joblib==0.14.1
> json5==0.8.5
> jsonschema==3.2.0
> jupyter==1.0.0
> jupyter-aihub-deploy-extension==0.1
> jupyter-client==5.3.4
> jupyter-console==6.1.0
> jupyter-contrib-core==0.3.3
> jupyter-contrib-nbextensions==0.5.1
> jupyter-core==4.6.1
> jupyter-highlight-selected-word==0.2.0
> jupyter-http-over-ws==0.0.7
> jupyter-latex-envs==1.4.6
> jupyter-nbextensions-configurator==0.4.1
> jupyterlab==1.2.6
> jupyterlab-git==0.9.0
> jupyterlab-server==1.0.6
> keyring==10.1
> keyrings.alt==1.3
> kiwisolver==1.1.0
> kubernetes==10.0.1
> lazy-object-proxy==1.4.3
> llvmlite==0.31.0
> lxml==4.4.2
> Markdown==3.1.1
> MarkupSafe==1.1.1
> matplotlib==3.0.3
> mccabe==0.6.1
> missingno==0.4.2
> mistune==0.8.4
> mkl==2019.0
> mkl-fft==1.0.6
> mkl-random==1.0.1.1
> mock==2.0.0
> monotonic==1.5
> more-itertools==8.1.0
> nbconvert==5.6.1
> nbdime==1.1.0
> nbformat==5.0.4
> networkx==2.4
> nltk==3.4.5
> notebook==6.0.3
> numba==0.47.0
> numpy==1.15.1
> oauth2client==3.0.0
> oauthlib==3.1.0
> opencv-python==4.1.2.30
> oscrypto==1.2.0
> packaging==20.1
> pandas==0.25.3
> pandas-profiling==1.4.0
> pandocfilters==1.4.2
> papermill==1.2.1
> parso==0.6.0
> pathlib2==2.3.5
> pbr==5.4.4
> pexpect==4.8.0
> phik==0.9.8
> pickleshare==0.7.5
> Pillow-SIMD==6.2.2.post1
> pipdeptree==0.13.2
> plotly==4.5.0
> pluggy==0.13.1
> poyo==0.5.0
> prettytable==0.7.2
> prometheus-client==0.7.1
> prompt-toolkit==2.0.10
> protobuf==3.11.2
> psutil==5.6.7
> ptyprocess==0.6.0
> py==1.8.1
> pyarrow==0.15.1
> pyasn1==0.4.8
> pyasn1-modules==0.2.8
> pycparser==2.19
> pycrypto==2.6.1
> pycryptodomex==3.9.6
> pycurl==7.43.0
> pydaal==2019.0.0.20180713
> pydot==1.4.1
> Pygments==2.5.2
> pygobject==3.22.0
> PyJWT==1.7.1
> pylint==2.4.4
> pymongo==3.10.1
> pyOpenSSL==19.1.0
> pyparsing==2.4.6
> pyrsistent==0.15.7
> pytest==5.3.4
> pytest-pylint==0.14.1
> python-apt==1.4.1
> python-dateutil==2.8.1
> pytz==2019.3
> PyWavelets==1.1.1
> pyxdg==0.25
> PyYAML==5.3
> pyzmq==18.1.1
> qtconsole==4.6.0
> requests==2.22.0
> requests-oauthlib==1.3.0
> retrying==1.3.3
> rsa==4.0
> s3transfer==0.3.2
> scikit-image==0.15.0
> scikit-learn==0.19.2
> scipy==1.1.0
> seaborn==0.9.1
> SecretStorage==2.3.1
> Send2Trash==1.5.0
> simplegeneric==0.8.1
> six==1.14.0
> smmap2==2.0.5
> snowflake-connector-python==2.2.0
> SQLAlchemy==1.3.13
> sqlparse==0.3.0
> tbb==2019.0
> tbb4py==2019.0
> tenacity==6.0.0
> terminado==0.8.3
> testpath==0.4.4
> textwrap3==0.9.2
> tornado==5.1.1
> tqdm==4.42.0
> traitlets==4.3.3
> typed-ast==1.4.1
> typing==3.7.4.1
> typing-extensions==3.7.4.1
> unattended-upgrades==0.1
> uritemplate==3.0.1
> urllib3==1.24.2
> virtualenv==16.7.9
> wcwidth==0.1.8
> webencodings==0.5.1
> websocket-client==0.57.0
> Werkzeug==0.16.1
> whichcraft==0.6.1
> widgetsnbextension==3.5.1
> wrapt==1.11.2
> zipp==1.1.0
>
>
> On Tue, Feb 4, 2020 at 11:33 AM Valentyn Tymofieiev <valen...@google.com>
> wrote:
>
>> I don't think there is a mismatch between dill versions here, but
>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>> mentions a similar error and may be related. What is the output of
>> pip freeze on your machine (or better: pip install pipdeptree; pipdeptree)?
>>
>>
>> On Tue, Feb 4, 2020 at 11:22 AM Alan Krumholz <alan.krumh...@betterup.co>
>> wrote:
>>
>>> Here is a test job that sometimes fails and sometimes doesn't (but most
>>> of the time it does)...
>>> There seems to be something stochastic that causes this, as after several
>>> tests a couple of them did succeed...
>>>
>>>
>>> def test_error(
>>>     bq_table: str) -> str:
>>>
>>>     import apache_beam as beam
>>>     from apache_beam.options.pipeline_options import PipelineOptions
>>>
>>>     class GenData(beam.DoFn):
>>>         def process(self, _):
>>>             for _ in range (20000):
>>>                 yield {'a':1,'b':2}
>>>
>>>
>>>     def get_bigquery_schema():
>>>         from apache_beam.io.gcp.internal.clients import bigquery
>>>
>>>         table_schema = bigquery.TableSchema()
>>>         columns = [
>>>             ["a","integer","nullable"],
>>>             ["b","integer","nullable"]
>>>         ]
>>>
>>>         for column in columns:
>>>             column_schema = bigquery.TableFieldSchema()
>>>             column_schema.name = column[0]
>>>             column_schema.type = column[1]
>>>             column_schema.mode = column[2]
>>>             table_schema.fields.append(column_schema)
>>>
>>>         return table_schema
>>>
>>>     pipeline = beam.Pipeline(options=PipelineOptions(
>>>         project='my-project',
>>>         temp_location = 'gs://my-bucket/temp',
>>>         staging_location = 'gs://my-bucket/staging',
>>>         runner='DataflowRunner'
>>>     ))
>>>     #pipeline = beam.Pipeline()
>>>
>>>     (
>>>         pipeline
>>>         | 'Empty start' >> beam.Create([''])
>>>         | 'Generate Data' >> beam.ParDo(GenData())
>>>         #| 'print' >> beam.Map(print)
>>>         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
>>>             project=bq_table.split(':')[0],
>>>             dataset=bq_table.split(':')[1].split('.')[0],
>>>             table=bq_table.split(':')[1].split('.')[1],
>>>             schema=get_bigquery_schema(),
>>>             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>>>             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
>>>     )
>>>
>>>     result = pipeline.run()
>>>     result.wait_until_finish()
>>>
>>>     return True
>>>
>>> test_error(
>>>     bq_table = 'my-project:my_dataset.my_table'
>>> )
>>>
>>> On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz <alan.krumh...@betterup.co>
>>> wrote:
>>>
>>>> I tried breaking apart my pipeline. Seems the step that breaks it is:
>>>> beam.io.WriteToBigQuery
>>>>
>>>> Let me see if I can create a self contained example that breaks to
>>>> share with you
>>>>
>>>> Thanks!
>>>>
>>>> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada <pabl...@google.com>
>>>> wrote:
>>>>
>>>>> Hm that's odd. No changes to the pipeline? Are you able to share some
>>>>> of the code?
>>>>>
>>>>> +Udi Meiri <eh...@google.com> do you have any idea what could be
>>>>> going on here?
>>>>>
>>>>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz <
>>>>> alan.krumh...@betterup.co> wrote:
>>>>>
>>>>>> Hi Pablo,
>>>>>> This is strange... it doesn't seem to be the last beam release, as
>>>>>> last night it was already using 2.19.0. I wonder if it was some
>>>>>> release from the DataFlow team (not beam related):
>>>>>>
>>>>>> Job type: Batch
>>>>>> Job status: Succeeded
>>>>>> SDK version: Apache Beam Python 3.5 SDK 2.19.0
>>>>>> Region: us-central1
>>>>>> Start time: February 3, 2020 at 9:28:35 PM GMT-8
>>>>>> Elapsed time: 5 min 11 sec
>>>>>>
>>>>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada <pabl...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Alan,
>>>>>>> could it be that you're picking up the new Apache Beam 2.19.0
>>>>>>> release? Could you try depending on beam 2.18.0 to see if the issue
>>>>>>> surfaces when using the new release?
>>>>>>>
>>>>>>> If something was working and no longer works, it sounds like a bug.
>>>>>>> This may have to do with how we pickle (dill / cloudpickle) - see this
>>>>>>> question
>>>>>>> https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
>>>>>>> Best
>>>>>>> -P.
>>>>>>>
>>>>>>> On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz <
>>>>>>> alan.krumh...@betterup.co> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was running a dataflow job in GCP last night and it was running
>>>>>>>> fine.
>>>>>>>> This morning this same exact job is failing with the following
>>>>>>>> error:
>>>>>>>>
>>>>>>>> Error message from worker: Traceback (most recent call last):
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 286, in loads
>>>>>>>>     return dill.loads(s)
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>     return _reverse_typemap[name]
>>>>>>>> KeyError: 'ClassType'
>>>>>>>>
>>>>>>>> During handling of the above exception, another exception occurred:
>>>>>>>>
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
>>>>>>>>     work_executor.execute()
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", line 176, in execute
>>>>>>>>     op.start()
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>   File "apache_beam/runners/worker/operations.py", line 602, in apache_beam.runners.worker.operations.DoOperation.setup
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 290, in loads
>>>>>>>>     return dill.loads(s)
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
>>>>>>>>     return load(file, ignore, **kwds)
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
>>>>>>>>     return Unpickler(file, ignore=ignore, **kwds).load()
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
>>>>>>>>     obj = StockUnpickler.load(self)
>>>>>>>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in _load_type
>>>>>>>>     return _reverse_typemap[name]
>>>>>>>> KeyError: 'ClassType'
>>>>>>>>
>>>>>>>>
>>>>>>>> If I use a local runner it still runs fine.
>>>>>>>> Anyone else experiencing something similar today? (or know how to
>>>>>>>> fix this?)
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>