Hi Timo, Thanks for the clarification. It reassuring to hear that my code does the right thing. I'll just ignore these messages for now.
Niels On Mon, 8 Jul 2019, 15:09 Timo Walther, <twal...@apache.org> wrote: > Hi Niels, > > the type handling evolved during the years and is a bit messed up > through the different layers. You are almost right with your last > assumption "Is the provided serialization via TypeInformation 'skipped' > during startup and only used during runtime?". The type extraction > returns a Kryo type and the Kryo type is using the configured default > serializers during runtime. Therefore, the log entry is just an INFO but > not a WARNING. And you did everything correct. > > Btw there is also the possiblity to insert a custom type into the type > extration by using Type Factories [0]. > > Maybe as a side comment: We are aware of these confusions and the Table > & SQL API will hopefully not use the TypeExtractor anymore in 1.10. This > is what I am working on at the moment. > > Regards, > Timo > > [0] > > https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.html#defining-type-information-using-a-factory > > Am 08.07.19 um 14:17 schrieb Niels Basjes: > > Hi, > > > > Context: > > I'm looking into making the Google (BigQuery compatible) HyperLogLog++ > > implementation available in Flink because it is simply an Apache > > licensed opensource library > > - https://issuetracker.google.com/issues/123269269 > > - https://issues.apache.org/jira/browse/BEAM-7013 > > - https://github.com/google/zetasketch > > > > While doing this I noticed that even though I provided an explicit > > Kryo Serializer for the core class > > > > i.e. I did > senv.getConfig().registerTypeWithKryoSerializer(HyperLogLogPlusPlus.class, > > HLLSerializer.class); > > > > I still see messages like this when registering a new > > UserDefinedFunction (AggregateFunction / ScalarFunction) that has this > > class as either input of output: > > > > 13:59:57,316 [INFO ] TypeExtractor : 1815: > > class com.google.zetasketch.HyperLogLogPlusPlus does not contain a > > getter for field allowedTypes > > 13:59:57,317 [INFO ] TypeExtractor : 1818: > > class com.google.zetasketch.HyperLogLogPlusPlus does not contain a > > setter for field allowedTypes > > 13:59:57,317 [INFO ] TypeExtractor : 1857: > > Class class com.google.zetasketch.HyperLogLogPlusPlus cannot be used > > as a POJO type because not all fields are valid POJO fields, and must > > be processed as GenericType. Please read the Flink documentation on > > "Data Types & Serialization" for details of the effect on performance. > > > > So it is complaining about the serialization performance when done in > > a different way than was configured. > > > > Then I noticed that I see similar messages in other situations too. > > > > In this code > > > https://github.com/nielsbasjes/yauaa/blob/master/udfs/flink-table/src/test/java/nl/basjes/parse/useragent/flink/table/DemonstrationOfTumblingTableSQLFunction.java#L165 > > > > I see > > 13:59:58,478 [INFO ] TypeExtractor : 1815: > > class org.apache.flink.types.Row does not contain a getter for field > > fields > > 13:59:58,478 [INFO ] TypeExtractor : 1818: > > class org.apache.flink.types.Row does not contain a setter for field > > fields > > 13:59:58,479 [INFO ] TypeExtractor : 1857: > > Class class org.apache.flink.types.Row cannot be used as a POJO type > > because not all fields are valid POJO fields, and must be processed as > > GenericType. Please read the Flink documentation on "Data Types & > > Serialization" for details of the effect on performance. > > > > even though a full TypeInformation instance for that type was provided > > > > TypeInformation<Row> tupleType = new RowTypeInfo(SQL_TIMESTAMP, > > STRING, STRING, STRING, STRING, LONG); > > DataStream<Row> resultSet = tableEnv.toAppendStream(resultTable, > tupleType); > > > > I checked with my debugger and the code IS using for both mentioned > > examples the correct serialization classes when running. > > > > So what is happening here? > > Did I forget to do a required call? > > So is this a bug? > > Is the provided serialization via TypeInformation 'skipped' during > > startup and only used during runtime? > > > >