Hi, we have data in an ORC-formatted table. We filter certain records and then create an Avro-format Hive table using the "insert into" clause.
Our use case is to create smaller Avro data files in a Hive table that can be passed on to consumers as Kafka messages. Can we restrict the file size in an Avro-backed Hive table while we execute the "insert into" command? One solution we considered was to use "clustered by", but since the number of records and the data size are not known beforehand, it is difficult to choose the number of buckets. Is there anything else we can try to restrict the file size?
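
To show why the bucket count is hard to fix up front, here is a minimal sketch of the arithmetic we would need. The function name, the average record size, and the target file size are all hypothetical values picked for illustration. It also assumes rows spread evenly across buckets and ignores Avro compression, neither of which is guaranteed in practice.

```python
def estimate_bucket_count(row_count, avg_record_bytes, target_file_bytes):
    """Estimate how many buckets keep each output file near target_file_bytes.

    All three inputs are assumptions we would have to measure or guess:
    row_count comes from a count over the filtered records, and
    avg_record_bytes is only an approximation of the serialized Avro size.
    """
    if row_count < 0 or avg_record_bytes <= 0 or target_file_bytes <= 0:
        raise ValueError("row_count must be >= 0 and sizes must be positive")
    total_bytes = row_count * avg_record_bytes
    # Integer ceiling division; always return at least one bucket.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 500-byte records, 1,000,000-byte target files.
    for rows in (10_000, 1_000_000):
        print(rows, estimate_bucket_count(rows, 500, 1_000_000))
```

The required bucket count changes by two orders of magnitude between the two hypothetical runs, while the bucket count of a "clustered by" table is fixed when the table is defined. That mismatch is the problem we are trying to work around.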
