Hello Brian. Thank you for the clarification request. I meant the first case. I have files that define field names and types.
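
To make the first case concrete, here is a minimal Python sketch, assuming a hypothetical JSON layout for the definition file (a "name" plus a list of "fields", each with a "name", a "type", and an optional "nullable" flag). The layout, the TYPE_NAMES mapping, and the helper named_tuple_from_definition are all made up for illustration and are not part of Beam. The sketch only builds a typing.NamedTuple class, which is one of the forms the Python SDK can infer a schema from; it does not touch Beam itself.

```python
import json
import typing

# Hypothetical mapping from type names used in the definition file to Python
# types. This is an assumption for illustration, not a Beam convention.
TYPE_NAMES = {
    "string": str,
    "int64": int,
    "double": float,
    "boolean": bool,
    "bytes": bytes,
}


def named_tuple_from_definition(text: str) -> type:
    """Build a typing.NamedTuple class from a JSON schema definition string."""
    definition = json.loads(text)
    fields = []
    for field in definition["fields"]:
        py_type = TYPE_NAMES[field["type"]]
        # Nullable fields are expressed as Optional[...] types.
        if field.get("nullable", False):
            py_type = typing.Optional[py_type]
        fields.append((field["name"], py_type))
    return typing.NamedTuple(definition["name"], fields)


# Inline stand-in for the contents of a definition file.
EXAMPLE = """
{"name": "Purchase",
 "fields": [
   {"name": "user_id", "type": "string"},
   {"name": "amount", "type": "double"},
   {"name": "coupon", "type": "string", "nullable": true}
 ]}
"""

Purchase = named_tuple_from_definition(EXAMPLE)
```

In a pipeline, the resulting class would presumably be registered with beam.coders.registry.register_coder(Purchase, beam.coders.RowCoder) and attached to a transform with with_output_types(Purchase). I have not tested that part with a dynamically built class, so treat it as an assumption.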

On Fri, Jun 18, 2021 at 12:12 PM Brian Hulette <[email protected]> wrote:

> Could you clarify what you mean? I could interpret this two different ways:
> 1) Have a separate file that defines the literal schema (field names and
> types).
> 2) Infer a schema from data stored in some file in a structured format
> (e.g. csv or parquet).
>
> For (1) Reuven's suggestion would work. You could also use an Avro avsc
> file here, which we also support.
> For (2) we don't have anything like this in the Java SDK. In the Python
> SDK the DataFrame API can do this though. When you use one of the pandas
> sources with the Beam DataFrame API [1] we peek at the file and infer the
> schema so you don't need to specify it. You'd just need to use
> to_pcollection [2] to convert the dataframe to a schema-aware PCollection.
>
> Brian
>
> [1]
> https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html
> [2]
> https://beam.apache.org/releases/pydoc/2.30.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_pcollection
>
> On Fri, Jun 18, 2021 at 7:50 AM Reuven Lax <[email protected]> wrote:
>
>> There is a proto format for Beam schemas. You could define it as a proto
>> in a file and then parse it.
>>
>> On Fri, Jun 18, 2021 at 7:28 AM Matthew Ouyang <[email protected]>
>> wrote:
>>
>>> I was wondering if there were any tools that would allow me to build a
>>> Beam schema from a file? I looked for it in the SDK but I couldn't find
>>> anything that could do it.
>>>
>>
