Re: A problem with calcite sql

2021-05-10 Thread Andrew Pilloud
For the first one you have https://issues.apache.org/jira/browse/BEAM-5251 For the second, I opened a new issue for you: https://issues.apache.org/jira/browse/BEAM-12323 Your second issue is because our Avro conversion library doesn't know how to handle fixed length strings. These normally show

Re: A problem with calcite sql

2021-05-10 Thread Tao Li
Sorry to bug with another question. I was saving a data set with below schema (this dataset comes from sql query). Saw the SqlCharType issue. Did anyone see this issue before? [main] INFO com.zillow.pipeler.core.transform.DatasetFlattenerCore - Fields: Field{name=id, description=,

Re: A problem with calcite sql

2021-05-10 Thread Tao Li
Never mind. Looks like “user” is a reserved name. From: Tao Li Reply-To: "user@beam.apache.org" Date: Monday, May 10, 2021 at 7:10 PM To: "user@beam.apache.org" Cc: Yuan Feng Subject: A problem with calcite sql Hi Beam community, I am seeing a weird issue by using calcite sql. I don’t

A problem with calcite sql

2021-05-10 Thread Tao Li
Hi Beam community, I am seeing a weird issue by using calcite sql. I don’t understand why it’s complaining my query is not valid. Once I removed “user AS user”, it worked fine. Please advise. Thanks. Exception in thread "main" org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to

Re: [EXT] Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-05-10 Thread Wenbing Bai
Hi Robert and Brian, I don't know why I didn't catch your replies. But thank you so much for looking at this. My parquet files will be consumed by downstreaming processes which require data points with the same "key1" that are sorted by "key2". The downstreaming process, for example, will make a

Re: Python SDK Kafka connector issues

2021-05-10 Thread Wouter Zorgdrager
Hi Boyuan, I understand. I basically have to re-open the Kafka connection for every 'bundle', also since my Kafka consumer is not serializable. It indeed might be a bit inefficient, but this is mostly for testing purposes so I guess it's ok for now. I'm using a FlinkRunner and unfortunately,

Re: Extremely Slow DirectRunner

2021-05-10 Thread Boyuan Zhang
Hi Evan, What do you mean startup delay? Is it the time that from you start the pipeline to the time that you notice the first output record from PubSub? On Sat, May 8, 2021 at 12:50 AM Ismaël Mejía wrote: > Can you try running direct runner with the option >

Re: Python SDK Kafka connector issues

2021-05-10 Thread Boyuan Zhang
Hi Wouter, So in short my flow is like this: > - I have a stateful DoFn, which reads from a PCollection with the > configuration of my KafkaConsumer. > - The stateful DoFn reads this message from the PCollection and starts a > KafkaConsumer (in the process method) > - Rather than yielding message

Re: Python SDK Kafka connector issues

2021-05-10 Thread Wouter Zorgdrager
Hi Boyuan, Thanks for your suggestion, that sounds like a good idea. However, where do I open my connection with Kafka and start consuming messages? Is it in the process method (turning it into a blocking method?). I currently try it like this, but timers never seem to be fired. I have the