Hello Brian.  Thank you for the clarification request.  I meant the first
case.  I have files that define field names and types.
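
For illustration, here is a minimal sketch of that first case on the Python
side. It assumes a hypothetical JSON spec file (a list of objects with "name",
"type" and an optional "nullable" key) and a hypothetical mapping from type
names to Python types; neither comes from Beam. It only builds a
typing.NamedTuple class from the spec, which is the kind of type the Python SDK
can use to describe a schema.

```python
import json
import typing

# Hypothetical mapping from type names used in the spec file to Python types.
# The names on the left are an assumption for this sketch, not a Beam format.
_TYPE_NAMES = {
    "string": str,
    "int64": int,
    "double": float,
    "boolean": bool,
    "bytes": bytes,
}


def named_tuple_from_spec(spec_text, type_name="Record"):
    """Build a NamedTuple class from a JSON list of field definitions."""
    fields = []
    for entry in json.loads(spec_text):
        py_type = _TYPE_NAMES[entry["type"]]
        # Optional "nullable" flag in the spec maps to typing.Optional.
        if entry.get("nullable", False):
            py_type = typing.Optional[py_type]
        fields.append((entry["name"], py_type))
    # Functional form of typing.NamedTuple: a name plus (field, type) pairs.
    return typing.NamedTuple(type_name, fields)


# Inline stand-in for the contents of the spec file.
spec = """
[
  {"name": "id", "type": "int64"},
  {"name": "email", "type": "string", "nullable": true}
]
"""
Record = named_tuple_from_spec(spec)
row = Record(id=1, email=None)
```

In a pipeline, the generated class could then be registered with
beam.coders.registry.register_coder(Record, beam.coders.RowCoder) so that Beam
treats it as a schema type. The Java SDK route would be the proto or Avro
approach described below.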

On Fri, Jun 18, 2021 at 12:12 PM Brian Hulette <[email protected]> wrote:

> Could you clarify what you mean? I could interpret this two different ways:
> 1) Have a separate file that defines the literal schema (field names and
> types).
> 2) Infer a schema from data stored in some file in a structured format
> (e.g. CSV or Parquet).
>
> For (1) Reuven's suggestion would work. You could also use an Avro avsc
> file here, which we also support.
> For (2) we don't have anything like this in the Java SDK. In the Python
> SDK the DataFrame API can do this though. When you use one of the pandas
> sources with the Beam DataFrame API [1] we peek at the file and infer the
> schema so you don't need to specify it. You'd just need to use
> to_pcollection [2] to convert the DataFrame to a schema-aware PCollection.
>
> Brian
>
> [1]
> https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html
> [2]
> https://beam.apache.org/releases/pydoc/2.30.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_pcollection
>
> On Fri, Jun 18, 2021 at 7:50 AM Reuven Lax <[email protected]> wrote:
>
>> There is a proto format for Beam schemas. You could define it as a proto
>> in a file and then parse it.
>>
>> On Fri, Jun 18, 2021 at 7:28 AM Matthew Ouyang <[email protected]>
>> wrote:
>>
>>> I was wondering if there were any tools that would allow me to build a
>>> Beam schema from a file?  I looked for it in the SDK but I couldn't find
>>> anything that could do it.
>>>
>>
