Filed https://issues.apache.org/jira/browse/BEAM-9192 per offline convo w/ Chamikara's.
On Mon, Jan 13, 2020 at 10:47 PM Bo Shi <[email protected]> wrote: > > Hi Chamikara, > > I've tried both with and without "--experiment=use_beam_bq_sink" and > it doesn't seem to affect the outcome. > > Regarding "--experiment=beam_fn_api", we're using the portability > framework [1] so that we can ship our python runtime environment as an > image via "--worker_harness_container_image" instead of maintaining a > setup.py file. It's worked very well for us so far (with the > exception of this latest issue). > > > [1] Am I using the term right? > https://beam.apache.org/documentation/runtime/environments/ > > On Mon, Jan 13, 2020 at 4:40 PM Chamikara Jayalath <[email protected]> > wrote: > > > > > > > > On Mon, Jan 13, 2020 at 12:45 PM Bo Shi <[email protected]> wrote: > >> > >> Python 3.7.5 > >> Beam 2.17 > >> > >> I've used both WriteToBigquery and BigQueryBatchFileLoads successfully > >> using DirectRunner. I've boiled the issue down to a small > >> reproducible case (attached). The following works great: > >> > >> $ pipenv run python repro-direct.py \ > >> --project=CHANGEME \ > >> --runner=DirectRunner \ > >> --temp_location=gs://beam-to-bq--tmp/bshi/tmp \ > >> --staging_location=gs://beam-to-bq--tmp/bshi/stg \ > >> --experiment=beam_fn_api \ # <<<< NB! (I assume this is a no-op w/ > >> Direct > >> --save_main_function > >> > >> $ bq --project=CHANGEME query 'select * from bshi_test.direct' > >> Waiting on bqjob_r53488a6b9ed6cb20_0000016fa05fb570_1 ... (0s) Current > >> status: DONE > >> +----------+----------+ > >> | some_str | some_int | > >> +----------+----------+ > >> | hello | 1 | > >> | world | 2 | > >> +----------+----------+ > >> > >> When changing to --runner=DataflowRunner, I get a cryptic Java error > >> with no other useful Python feedback (see class_cast_exception.txt) > >> > >> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: > >> Dataflow pipeline failed. State: FAILED, Error: > >> java.util.concurrent.ExecutionException: java.lang.ClassCastException: > >> [B cannot be cast to org.apache.beam.sdk.values.KV > >> > >> On a hunch, I removed --experiment=beam_fn_api and DataflowRunner > >> works fine. Am I doing something against the intent of the SDK? > > > > > > Can you try adding "--experiment=use_beam_bq_sink" ? > > Also, out of curiosity, why are you setting "--experiment=beam_fn_api" ? > > > > > >> > >> Happy to share access to the GCP project if that would assist in debugging.
