+ Flink Users
________________________________
From: Yun Tang <myas...@live.com>
Sent: Monday, January 28, 2019 19:46
To: Soheil Pourbafrani
Subject: Re: How to infer table schema from Avro file

Hi Soheil

You should pass your generated Avro record class, not Avro's GenericRecord 
class, as the type of AvroInputFormat. For example, if your generated record 
class is named 'Nation', the correct way to create the input is:


AvroInputFormat<Nation> test = new AvroInputFormat<>(
    new Path("PathToAvroFile"),
    Nation.class);

By doing this, Flink will recognize your input as a 'PojoType' rather than a 
'GenericType', which has only one field, and the columns will be inferred 
automatically from the fields of the record class.
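
To illustrate why the class matters, here is a minimal sketch in plain Java. It 
does not use Flink and is not Flink's actual type extraction (Flink's real POJO 
rules are more involved); it only shows that a concrete class exposes named 
fields a framework can discover, while an opaque record interface exposes none. 
The names SchemaSketch, columnsOf, and OpaqueRecord, and the hand-written Nation 
class with public fields, are hypothetical stand-ins for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SchemaSketch {

    // Hypothetical stand-in for the Avro-generated 'Nation' class (assumption:
    // plain public fields; a real generated class looks different).
    static final class Nation {
        public long N_NATIONKEY;
        public String N_NAME;
        public long N_REGIONKEY;
        public String N_COMMENT;
    }

    // Hypothetical stand-in for a generic record: values are only reachable
    // by key at runtime, so the type itself declares no columns.
    interface OpaqueRecord {
        Object get(String key);
    }

    // Collects the names of the public, non-static fields declared by a type.
    // Reflection does not guarantee field order, so the names are sorted.
    static List<String> columnsOf(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isPublic(modifiers) && !Modifier.isStatic(modifiers)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // The concrete class yields one name per column.
        System.out.println(columnsOf(Nation.class));
        // The opaque interface yields no names at all.
        System.out.println(columnsOf(OpaqueRecord.class));
    }
}
```

With the generic type there is nothing to name the columns after, which is 
consistent with the single f0 field you see in your schema.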

Best
Yun Tang
________________________________
From: Soheil Pourbafrani <soheil.i...@gmail.com>
Sent: Monday, January 28, 2019 5:54
To: user
Subject: How to infer table schema from Avro file

Hi, I load an Avro file into a Flink DataSet:


AvroInputFormat<GenericRecord> test = new AvroInputFormat<GenericRecord>(
        new Path("PathToAvroFile"),
        GenericRecord.class);
DataSet<GenericRecord> DS = env.createInput(test);

DS.print();

and here are the results of printing DS:
{"N_NATIONKEY": 14, "N_NAME": "KENYA", "N_REGIONKEY": 0, "N_COMMENT": " pending 
excuses haggle furiously deposits. pending, express pinto beans wake fluffily 
past t"}
{"N_NATIONKEY": 15, "N_NAME": "MOROCCO", "N_REGIONKEY": 0, "N_COMMENT": "rns. 
blithely bold courts among the closely regular packages use furiously bold 
platelets?"}
{"N_NATIONKEY": 16, "N_NAME": "MOZAMBIQUE", "N_REGIONKEY": 0, "N_COMMENT": "s. 
ironic, unusual asymptotes wake blithely r"}
{"N_NATIONKEY": 17, "N_NAME": "PERU", "N_REGIONKEY": 1, "N_COMMENT": 
"platelets. blithely pending dependencies use fluffily across the even pinto 
beans. carefully silent accoun"}
{"N_NATIONKEY": 18, "N_NAME": "CHINA", "N_REGIONKEY": 2, "N_COMMENT": "c 
dependencies. furiously express notornis sleep slyly regular accounts. ideas 
sleep. depos"}
{"N_NATIONKEY": 19, "N_NAME": "ROMANIA", "N_REGIONKEY": 3, "N_COMMENT": "ular 
asymptotes are about the furious multipliers. express dependencies nag above 
the ironically ironic account"}
{"N_NATIONKEY": 20, "N_NAME": "SAUDI ARABIA", "N_REGIONKEY": 4, "N_COMMENT": 
"ts. silent requests haggle. closely express packages sleep across the 
blithely"}

Now I want to create a table from the DS DataSet with exactly the same schema 
as the Avro file; that is, the columns should be N_NATIONKEY, N_NAME, 
N_REGIONKEY, and N_COMMENT.

I know using the line:


tableEnv.registerDataSet("tbTest", DS, "field1, field2, ...");

I can create a table and set the columns, but I want the columns to be inferred 
automatically from the data. Is that possible?
I tried

tableEnv.registerDataSet("tbTest", DS);

but it creates a table with the schema:
root
 |-- f0: GenericType<org.apache.avro.generic.GenericRecord>
