Re: H5 potential intermediate solution

2018-04-02 Thread Chamikara Jayalath
On Mon, Apr 2, 2018 at 5:19 PM Eila Arich-Landkof wrote: > Hi Cham, > > Thanks. I have created a PCollection from the dataset that is available in > the H5 file, which is provided as a numpy array. > It is very challenging for my use case to describe the schema. The > original dimensions of the data

Re: H5 potential intermediate solution

2018-04-02 Thread Eila Arich-Landkof
Hi Cham, Thanks. I have created a PCollection from the dataset that is available in the H5 file, which is provided as a numpy array. It is very challenging for my use case to describe the schema. The original dimensions of the dataset are 70K x 30K. Any suggestions on how to work around that? I thi

Re: H5 potential intermediate solution

2018-04-02 Thread Chamikara Jayalath
(moving dev to bcc) Hi Eila, On Mon, Apr 2, 2018 at 3:50 PM OrielResearch Eila Arich-Landkof < e...@orielresearch.org> wrote: > Hi All, > > I was able to make it work by creating the PCollection with the numpy > array. However, writing to BQ was impossible because it required a > schema.

Re: H5 potential intermediate solution

2018-04-02 Thread OrielResearch Eila Arich-Landkof
Hi All, I was able to make it work by creating the PCollection with the numpy array. However, writing to BQ was impossible because it required a schema. The code: (p | "create all" >> beam.Create(expression[1:5,1:5]) | "write all text" >> beam.io.WriteToText('gs://archs4/output/', file_n
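Since the sticking point in this message is that the BigQuery write asks for a schema, here is a minimal sketch of generating one programmatically instead of writing it by hand. It assumes every column holds a float and uses hypothetical column names (`col_0`, `col_1`, ...); neither the names nor the FLOAT type come from the thread. The helpers are plain Python and do not call Beam themselves.

```python
from typing import Dict, Sequence


def make_schema(n_columns: int, prefix: str = "col_", bq_type: str = "FLOAT") -> str:
    """Build a comma-separated 'name:TYPE' schema string for n_columns columns.

    The column names (prefix + index) are placeholders chosen for this sketch.
    """
    return ",".join(f"{prefix}{i}:{bq_type}" for i in range(n_columns))


def row_to_dict(row: Sequence[float], prefix: str = "col_") -> Dict[str, float]:
    """Map one array row to a dict keyed by the same placeholder column names."""
    return {f"{prefix}{i}": float(value) for i, value in enumerate(row)}


# A 4-column example, matching the width of the expression[1:5,1:5] slice above.
schema = make_schema(4)
example_row = row_to_dict([0.5, 1.0, 2.0, 3.5])
print(schema)
print(example_row)
```

The resulting string follows the 'name:TYPE,name:TYPE' form that the Beam Python BigQuery sink accepts for its schema argument, and the dicts are the row shape it expects, so a `beam.Map(row_to_dict)` step between `beam.Create` and the write would be one way to connect the two. Whether this scales to the widths discussed in the thread is not something this sketch addresses.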

Re: BigQuery streaming insert errors

2018-04-02 Thread Carlos Alonso
And... where could I catch that exception? Thanks! On Mon, 2 Apr 2018 at 16:58, Ted Yu wrote: > Wouldn't the following code give you information about failed insertions > (around line 790 in BigQueryServicesImpl) ? > > if (!allErrors.isEmpty()) { > throw new IOException("Insert fai

H5 potential intermediate solution

2018-04-02 Thread OrielResearch Eila Arich-Landkof
Hello all, I would like to try a different way to leverage Apache Beam for H5 => BQ (file to table transfer). For my use case, I would like to read every 10K rows of H5 data (numpy array format), transpose them, and write them to BQ as 10K columns, since 10K is the BQ column limit. My code is below and fires
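The read-10K-rows-then-transpose idea described in this message can be sketched as follows. This is not the poster's code (which is cut off above); it is a small pure-Python illustration in which `chunk_bounds` and `transpose` are hypothetical helper names and nested lists stand in for the numpy array.

```python
from typing import Iterator, List, Sequence, Tuple

CHUNK_ROWS = 10_000  # chunk size taken from the message: 10K rows at a time


def chunk_bounds(n_rows: int, chunk_rows: int = CHUNK_ROWS) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) row ranges that together cover n_rows."""
    for start in range(0, n_rows, chunk_rows):
        yield start, min(start + chunk_rows, n_rows)


def transpose(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap rows and columns, so each input row becomes one output column."""
    return [list(column) for column in zip(*block)]


# Tiny stand-in data: 5 rows x 2 columns, processed 2 rows at a time.
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
for start, stop in chunk_bounds(len(data), chunk_rows=2):
    block = transpose(data[start:stop])
    print(start, stop, block)
```

With an actual numpy array, the slice-and-transpose step would typically be written as `dataset[start:stop].T`, and each transposed block would then be handed to the write step. The last chunk may be shorter than the others, which the `min` in `chunk_bounds` accounts for.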

Re: BigQuery streaming insert errors

2018-04-02 Thread Ted Yu
Wouldn't the following code give you information about failed insertions (around line 790 in BigQueryServicesImpl) ? if (!allErrors.isEmpty()) { throw new IOException("Insert failed: " + allErrors); Cheers On Mon, Apr 2, 2018 at 7:16 AM, Carlos Alonso wrote: > Hi everyone!! > > I

BigQuery streaming insert errors

2018-04-02 Thread Carlos Alonso
Hi everyone!! I was wondering if there's any way to find out why a streaming insert failed. Looking at the code I think there's currently no way to do that, as the BigQueryServicesImpl insertAll seems to discard the errors and just add the failed TableRow instances into the failedInserts l
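To illustrate what keeping the errors alongside the failed rows could look like, here is a minimal sketch in Python rather than the Java of BigQueryServicesImpl. It relies on the shape of the BigQuery streaming insert (insertAll) response, where each `insertErrors` entry carries the `index` of the offending row and a list of `errors` with `reason` and `message` fields. The function name `pair_rows_with_errors` and the sample rows and response are made up for this example and are not Beam's API.

```python
from typing import Any, Dict, List, Tuple


def pair_rows_with_errors(
    rows: List[Dict[str, Any]], response: Dict[str, Any]
) -> List[Tuple[Dict[str, Any], List[str]]]:
    """Return (row, reasons) pairs for every row the response reports as failed."""
    failures = []
    for entry in response.get("insertErrors", []):
        row = rows[entry["index"]]  # index refers to the position in the request
        reasons = [
            f'{err.get("reason", "unknown")}: {err.get("message", "")}'
            for err in entry.get("errors", [])
        ]
        failures.append((row, reasons))
    return failures


# Hypothetical request rows and response, for illustration only.
sent_rows = [{"id": 1, "value": "ok"}, {"id": 2, "value": None}]
sample_response = {
    "insertErrors": [
        {"index": 1, "errors": [{"reason": "invalid", "message": "Missing required field"}]}
    ]
}
failed = pair_rows_with_errors(sent_rows, sample_response)
print(failed)
```

A response without an `insertErrors` key means every row was accepted, in which case the function returns an empty list.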