afilipchik commented on pull request #1514: URL: https://github.com/apache/incubator-hudi/pull/1514#issuecomment-632839321
@umehrot2 Yep, it is an attempt to fix the schema generated by spark-avro. Moving schema generation in-house makes sense, but if I recall correctly the issue does not come from Spark itself but from the underlying library it uses, so rewriting it could be a fair amount of work.

On the test case: the incoming dataset is transformed using Spark SQL, with the schema derived from the query result (NullTargetConverter). We then add a new field to the output, write a batch, and run a compaction. At that point the new schema cannot be used to read the old data, because reading fails on the new fields that have no defaults.
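To illustrate why the old data becomes unreadable: Avro schema resolution requires that any reader field absent from the writer schema carry a default value. A minimal Python sketch of that rule (this is a simplified illustration, not the real Avro library, and the field names are made up):

```python
def check_resolvable(reader_schema, writer_schema):
    """Simplified Avro schema-resolution check: every reader field
    that the writer schema lacks must declare a default, otherwise
    old records cannot be decoded with the new schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    for f in reader_schema["fields"]:
        if f["name"] not in writer_fields and "default" not in f:
            raise ValueError(
                f"reader field '{f['name']}' is missing from writer "
                "schema and has no default"
            )


# Schema the old files were written with (hypothetical example).
old_schema = {"fields": [{"name": "id", "type": "long"}]}

# New schema after the transform added a field, but without a default:
# compaction tries to read old files with this schema and fails.
new_schema_no_default = {
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "new_col", "type": ["null", "string"]},
    ]
}

# Declaring a (null) default makes the old data readable again.
new_schema_with_default = {
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "new_col", "type": ["null", "string"], "default": None},
    ]
}
```

Calling `check_resolvable(new_schema_no_default, old_schema)` raises, while the schema with the default resolves fine, which mirrors the compaction failure described above.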