[Question] Is it possible to have ParquetIO ignore column order?

2021-07-22 Thread Joseph Kesting
I am trying to read a set of Parquet files from GCS, and it fails because the order of the Parquet columns does not match the order of the fields defined with the SchemaBuilder. For example, I am defining my Schema as follows: public static Schema buildSchema() { SchemaBuilder.FieldAssembler builde
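The underlying problem is matching columns by position rather than by name. The following is a minimal, stdlib-only sketch of name-based matching, assuming each record arrives as an ordered list of column names plus values. The class ColumnReorder and the method reorderByName are hypothetical illustrations; they are not part of ParquetIO or Avro, and the thread preview does not show how the question was resolved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rearranges a row read in the file's column order
// into the field order expected by the reader's schema, matching by name.
public class ColumnReorder {

    static List<Object> reorderByName(List<String> fileColumns, List<Object> row,
                                      List<String> expectedFields) {
        if (fileColumns.size() != row.size()) {
            throw new IllegalArgumentException("Column count does not match value count");
        }
        // Index the row's values by the column name they were written under.
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < fileColumns.size(); i++) {
            byName.put(fileColumns.get(i), row.get(i));
        }
        // Emit values in the expected schema order, failing on missing columns.
        List<Object> out = new ArrayList<>(expectedFields.size());
        for (String field : expectedFields) {
            if (!byName.containsKey(field)) {
                throw new IllegalArgumentException("Missing column: " + field);
            }
            out.add(byName.get(field));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fileColumns = List.of("name", "id", "score");
        List<Object> row = List.of("alice", 7, 0.5);
        List<String> expected = List.of("id", "name", "score");
        System.out.println(reorderByName(fileColumns, row, expected)); // prints [7, alice, 0.5]
    }
}
```

Looking values up by name makes the result independent of the physical column order in each file, at the cost of a map lookup per field.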

Re: Using Beam to generate unique ids with unbounded sources

2021-07-22 Thread Jan Lukavský
Hi Cristian, I haven't tried that, so I'm not 100% sure it would work, but you could probably try using a custom timestamp policy for KafkaIO that shifts the timestamp to BoundedWindow.TIMESTAMP_MAX_VALUE once you know you have reached the head of the state topic. That would probably require read

Using Beam to generate unique ids with unbounded sources

2021-07-22 Thread Cristian Constantinescu
Hi All, I would like to know if there's a suggested pattern for the scenario below. TL;DR: reading state from Kafka. I have a scenario where I'm listening to a Kafka topic and generating a unique id based on the properties of each incoming item. Then I output the result to another Kafka topic. The