Hi Eila,
I'm not sure if I understand the complexity of your problem.
If you do not have to perform any transformation on the data inside the CSVs
and just need to load them into BigQuery, isn't it enough to use bq load with
schema autodetect?
https://cloud.google.com/bigquery/docs/schema-detect
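Roughly what I have in mind is below. This is only a sketch: the dataset, table and bucket names are placeholders I made up, and the small Python helper is just a convenient way to assemble the command, so adjust it to your setup.

```python
def build_bq_load_command(table, source_uri):
    """Build a `bq load` command line that lets BigQuery infer the schema.

    `table` and `source_uri` are placeholders supplied by the caller,
    e.g. "my_dataset.samples" and "gs://my-bucket/csv/*.csv".
    """
    return [
        "bq", "load",
        "--autodetect",         # let BigQuery detect column names and types
        "--source_format=CSV",
        table,
        source_uri,
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    cmd = build_bq_load_command("my_dataset.samples", "gs://my-bucket/csv/*.csv")
    print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where the bq CLI is installed and authenticated.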
Best
Z
Thank you!
Probably around 50.
Best,
Eila
On Thu, Sep 27, 2018 at 1:23 AM Ankur Goenka wrote:
> Hi Eila,
>
> That seems reasonable to me.
>
> Here is a reference on writing to BQ
> https://github.com/apache/beam/blob/1ffba44f7459307f5a134b8f4ea47ddc5ca8affc/sdks/python/apache_beam/examples/comp
Hi Eila,
That seems reasonable to me.
Here is a reference on writing to BQ
https://github.com/apache/beam/blob/1ffba44f7459307f5a134b8f4ea47ddc5ca8affc/sdks/python/apache_beam/examples/complete/game/leader_board.py#L326
May I know how many distinct columns you are expecting across all files?
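If it helps to check that number on a sample, here is a minimal sketch using only the Python standard library. It assumes each file is small enough to hold as a string and has a header line plus one data row; the function names are my own and not part of Beam or BigQuery.

```python
import csv
import io


def union_of_headers(csv_texts):
    """Return the distinct column names across all files, in first-seen order."""
    seen = {}
    for text in csv_texts:
        # Read only the header line of each CSV.
        header = next(csv.reader(io.StringIO(text)), [])
        for name in header:
            seen.setdefault(name, None)
    return list(seen)


def align_row(csv_text, all_columns):
    """Map a single-row CSV onto the full column list; absent columns become None."""
    row = next(csv.DictReader(io.StringIO(csv_text)), {})
    return {name: row.get(name) for name in all_columns}


if __name__ == "__main__":
    # Two made-up sample files with partly overlapping headers.
    files = ["id,a,b\n1,2,3\n", "id,b,c\n4,5,6\n"]
    columns = union_of_headers(files)
    print(len(columns), columns)
    print(align_row(files[1], columns))
```

The aligned dictionaries all share the same keys, so they can be written as rows of one table with a fixed schema.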
Hi Ankur / users,
I would like to make sure that the suggested pipeline can work for my
needs.
So, additional clarification:
- The CSV files have a few common and a few different columns. Each CSV file
represents a sample measurement record.
- When the CSVs are merged together, I expect to have one tabl
Hi Ankur,
Thank you. Trying this approach now. Will let you know if I have any issues
implementing it.
Best,
Eila
On Tue, Sep 25, 2018 at 7:19 PM Ankur Goenka wrote:
> Hi Eila,
>
> If I understand correctly, the objective is to read a large number of CSV
> files, each of which contains a single
Hi Eila,
If I understand correctly, the objective is to read a large number of CSV
files, each of which contains a single row with multiple columns.
Deduplicate the columns in the file and write them to BQ.
You are using pandas DF to deduplicate the columns for a small set of files
which might not
Hello,
I would like to write a large number of CSV files to BQ, where the headers from
all of them are aggregated into one common header. Any advice is very much
appreciated.
The details are:
1. 2.5M CSV files
2. Each CSV file: header of 50-60 columns
3. Each CSV file: one data row
there are common columns