On Mon, Jan 13, 2020 at 12:45 PM Bo Shi <[email protected]> wrote:
> Python 3.7.5
> Beam 2.17
>
> I've used both WriteToBigQuery and BigQueryBatchFileLoads successfully
> using DirectRunner. I've boiled the issue down to a small reproducible
> case (attached). The following works great:
>
> $ pipenv run python repro-direct.py \
>     --project=CHANGEME \
>     --runner=DirectRunner \
>     --temp_location=gs://beam-to-bq--tmp/bshi/tmp \
>     --staging_location=gs://beam-to-bq--tmp/bshi/stg \
>     --experiment=beam_fn_api \  # <<<< NB! (I assume this is a no-op w/ DirectRunner)
>     --save_main_session
>
> $ bq --project=CHANGEME query 'select * from bshi_test.direct'
> Waiting on bqjob_r53488a6b9ed6cb20_0000016fa05fb570_1 ... (0s) Current status: DONE
> +----------+----------+
> | some_str | some_int |
> +----------+----------+
> | hello    |        1 |
> | world    |        2 |
> +----------+----------+
>
> When changing to --runner=DataflowRunner, I get a cryptic Java error
> with no other useful Python feedback (see class_cast_exception.txt):
>
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
> Dataflow pipeline failed. State: FAILED, Error:
> java.util.concurrent.ExecutionException: java.lang.ClassCastException:
> [B cannot be cast to org.apache.beam.sdk.values.KV
>
> On a hunch, I removed --experiment=beam_fn_api and DataflowRunner
> works fine. Am I doing something against the intent of the SDK?
>
> Happy to share access to the GCP project if that would assist in debugging.
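For anyone following along without the attachment, a minimal pipeline along the lines described (the table name and schema below are guesses from the bq output above, not the actual repro-direct.py) looks roughly like this:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # Picks up --runner, --project, --temp_location, --experiment, etc.
        # from the command line.
        options = PipelineOptions()
        with beam.Pipeline(options=options) as p:
            (p
             | 'Create' >> beam.Create([
                 {'some_str': 'hello', 'some_int': 1},
                 {'some_str': 'world', 'some_int': 2},
             ])
             | 'WriteToBQ' >> beam.io.WriteToBigQuery(
                 'bshi_test.direct',  # dataset.table guessed from the bq query
                 schema='some_str:STRING,some_int:INTEGER',
                 create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                 write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

    if __name__ == '__main__':
        run()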
Can you try adding "--experiment=use_beam_bq_sink"? Also, out of
curiosity, why are you setting "--experiment=beam_fn_api"?
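For reference, the suggested experiment can also be set from code rather than the command line; this sketch is just the flag expressed via PipelineOptions, with placeholder values:

    from apache_beam.options.pipeline_options import DebugOptions, PipelineOptions

    # Equivalent to passing --experiment=use_beam_bq_sink on the command line.
    options = PipelineOptions(flags=[
        '--project=CHANGEME',  # placeholder, as in the repro command
        '--runner=DataflowRunner',
        '--experiment=use_beam_bq_sink',
    ])
    print(options.view_as(DebugOptions).experiments)  # ['use_beam_bq_sink']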
