@Reuven Lax: SchemaTranslation.schemaFromProto appears to cover all of the
Beam Schema types, including LogicalType.  Without it I would have tried to
build something around the JSON format I'm currently using, but now I will
talk to my team about switching to Proto.
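
For anyone who finds this thread later, here is a minimal sketch of the
proto route as I understand it (the file path is a placeholder, and I'm
assuming the file holds a serialized SchemaApi.Schema message):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.beam.model.pipeline.v1.SchemaApi;
    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.schemas.SchemaTranslation;

    public class SchemaFromFile {
      public static void main(String[] args) throws Exception {
        // Read a serialized SchemaApi.Schema message from disk...
        byte[] bytes = Files.readAllBytes(Paths.get("/path/to/schema.pb"));
        SchemaApi.Schema proto = SchemaApi.Schema.parseFrom(bytes);
        // ...and translate it into a Beam Schema, logical types included.
        Schema beamSchema = SchemaTranslation.schemaFromProto(proto);
        System.out.println(beamSchema);
      }
    }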

@Christian Battista: Regarding BigQuery arrays not being nullable, that
makes total sense.  I was using the BigQuery schema format along with
BigQueryUtils to build a Beam Schema, but the restriction you mentioned
means I will need to switch to something different.
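
To make that behaviour concrete, here is roughly what I had been doing (a
sketch; the field name is made up):

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableSchema;
    import java.util.Collections;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
    import org.apache.beam.sdk.schemas.Schema;

    public class TableSchemaToBeam {
      public static void main(String[] args) {
        TableSchema tableSchema = new TableSchema().setFields(Collections.singletonList(
            new TableFieldSchema().setName("tags").setType("STRING").setMode("REPEATED")));
        Schema beamSchema = BigQueryUtils.fromTableSchema(tableSchema);
        // REPEATED maps to an ARRAY field that is never nullable, matching
        // BigQuery's rule that the array itself can't be NULL.
        System.out.println(beamSchema.getField("tags").getType().getNullable());  // false
      }
    }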

Thank you, everyone, for your feedback and for pushing me to be clearer in
my request.  Feel free to continue the discussion if you want, but I feel I
got what I needed.

On Wed, Jun 23, 2021 at 9:33 AM Christian Battista <[email protected]>
wrote:

> Hi Matthew, just wanted to point out that in BQ, arrays can't be null
> (this is probably why BigQueryUtils has the behaviour you observed).
>
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
>
> Best,
> -C
>
> On Tue, Jun 22, 2021 at 11:06 PM Matthew Ouyang <[email protected]>
> wrote:
>
>> I am currently using BigQueryUtils to convert a BigQuery TableSchema to a
>> Beam Schema, but I am looking to switch away from that approach because I
>> need nullable arrays (BigQueryUtils always makes arrays non-nullable) and
>> the ability to add my own logical types (one of my fields is unstructured
>> JSON).
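>>
>> On the logical-type side, a minimal sketch of what I have in mind (the
>> class name and identifier are just placeholders):
>>
>>   import org.apache.beam.sdk.schemas.Schema;
>>
>>   // Hypothetical logical type that carries raw JSON text over a STRING
>>   // base type.
>>   public class JsonLogicalType implements Schema.LogicalType<String, String> {
>>     @Override public String getIdentifier() { return "myorg:json"; }
>>     @Override public Schema.FieldType getArgumentType() { return null; } // no argument
>>     @Override public Schema.FieldType getBaseType() { return Schema.FieldType.STRING; }
>>     @Override public String toBaseType(String input) { return input; }
>>     @Override public String toInputType(String base) { return base; }
>>   }
>>
>>   // Usable as: Schema.FieldType.logicalType(new JsonLogicalType())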
>>
>> I'm open to using proto or Avro since I would like to avoid the
>> worst-case scenario of building my own.  However, it doesn't look like
>> either supports adding logical types, and proto appears to be missing
>> support for the Beam Row type.
>>
>> On Fri, Jun 18, 2021 at 1:56 PM Brian Hulette <[email protected]>
>> wrote:
>>
>>> Are the files in some special format that you need to parse and
>>> understand? Or could you opt to store the schemas as proto descriptors or
>>> Avro avsc?
>>>
>>> On Fri, Jun 18, 2021 at 10:40 AM Matthew Ouyang <
>>> [email protected]> wrote:
>>>
>>>> Hello Brian.  Thank you for the clarification request.  I meant the
>>>> first case.  I have files that define field names and types.
>>>>
>>>> On Fri, Jun 18, 2021 at 12:12 PM Brian Hulette <[email protected]>
>>>> wrote:
>>>>
>>>>> Could you clarify what you mean? I could interpret this in two
>>>>> different ways:
>>>>> 1) Have a separate file that defines the literal schema (field names
>>>>> and types).
>>>>> 2) Infer a schema from data stored in some file in a structured
>>>>> format (e.g. CSV or Parquet).
>>>>>
>>>>> For (1) Reuven's suggestion would work. You could also use an Avro
>>>>> avsc file here, which we also support (see the sketch below).
>>>>> For (2) we don't have anything like this in the Java SDK. In the
>>>>> Python SDK the DataFrame API can do this, though. When you use one of
>>>>> the pandas sources with the Beam DataFrame API [1], we peek at the file
>>>>> and infer the schema so you don't need to specify it. You'd just need
>>>>> to use to_pcollection [2] to convert the dataframe to a schema-aware
>>>>> PCollection.
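>>>>>
>>>>> For the avsc route, a minimal sketch (the file name is a placeholder):
>>>>>
>>>>>   import java.io.File;
>>>>>   import org.apache.beam.sdk.schemas.Schema;
>>>>>   import org.apache.beam.sdk.schemas.utils.AvroUtils;
>>>>>
>>>>>   public class SchemaFromAvsc {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>       // Parse the .avsc with Avro itself, then convert to a Beam Schema.
>>>>>       org.apache.avro.Schema avroSchema =
>>>>>           new org.apache.avro.Schema.Parser().parse(new File("schema.avsc"));
>>>>>       Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);
>>>>>       System.out.println(beamSchema);
>>>>>     }
>>>>>   }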
>>>>>
>>>>> Brian
>>>>>
>>>>> [1]
>>>>> https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html
>>>>> [2]
>>>>> https://beam.apache.org/releases/pydoc/2.30.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_pcollection
>>>>>
>>>>> On Fri, Jun 18, 2021 at 7:50 AM Reuven Lax <[email protected]> wrote:
>>>>>
>>>>>> There is a proto format for Beam schemas. You could define the schema
>>>>>> as a proto in a file and then parse it.
>>>>>>
>>>>>> On Fri, Jun 18, 2021 at 7:28 AM Matthew Ouyang <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I was wondering if there are any tools that would allow me to build
>>>>>>> a Beam Schema from a file.  I looked for one in the SDK but I
>>>>>>> couldn't find anything that could do it.
>>>>>>>
>>>>>>
>
> --
> Christian Battista, Ph.D.
> he/him
> Senior Data Engineer
> BenchSci
> www.benchsci.com
> E: [email protected]
>
