[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-09-08 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1240290474 @xiarixiaoyao thanks for the input, it works with these options ` .option("hoodie.schema.on.read.enable", "true") .option("hoodie.datasource.write.reconcile.schema",

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-09-07 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1239394929 @codope here is the output without the above-mentioned config; I have also added the code which I am using for testing the fix. --ERROR `22/09/07

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-09-06 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1238356005 @nsivabalan @xiarixiaoyao I tested this fix locally, checked out the latest master branch and built the code using the command `mvn clean package -DskipTests -Dspark3.2 -Dscala-2.12`

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-08-29 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1229823108 Thanks for fixing this, we can close the issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-06-21 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1161712629 @xiarixiaoyao as mentioned earlier we didn't solve the nested column case, we are currently trying to finalize a fixed schema and while reading in the data with spark use this schema

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-06-19 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1159671087 @kazdy thanks for the followup, we had solved this issue at the root level of the schema by the code provided by @xiarixiaoyao. If you check the code (on the top of the post) it
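
For readers skimming the thread, the root-level behaviour discussed here can be sketched in plain Python. This is only an illustration under stated assumptions, not the code posted by @xiarixiaoyao and not Hudi's API: the function name `reconcile_root_columns` and the sample column names are hypothetical, and records are modelled as simple dicts. The idea is that an incoming record that lacks a column already present in the table gets that column filled with a null, so previously written columns are not dropped on upsert.

```python
from typing import Any, Dict, List


def reconcile_root_columns(table_columns: List[str],
                           record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: align a record with the table's root-level columns.

    Columns known to the table but absent from the record are filled with None.
    Columns that only the record has are appended after the table's columns.
    Nested structures are copied as-is and are NOT reconciled.
    """
    # Start from the table's column order, filling gaps with None.
    merged = {column: record.get(column) for column in table_columns}
    # Append any columns the table has not seen yet.
    for column, value in record.items():
        if column not in merged:
            merged[column] = value
    return merged


# Assumed sample data: the table already has a column the new record lacks.
table_columns = ["id", "resourceType", "maritalStatus"]
incoming = {"id": "p1", "resourceType": "Patient"}
print(reconcile_root_columns(table_columns, incoming))
```

The sketch only looks at top-level keys, which mirrors the limitation raised in the thread: a column missing inside a nested struct or array element would not be filled in by this approach.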

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-05-22 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1134144289 thanks @xiarixiaoyao, our schema for storing data as defined by FHIR standards https://www.hl7.org/fhir/patient.schema.json.html seems to be complicated, as most of the fields here

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-05-04 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1117269863 @xiarixiaoyao FYI, the createNewDF code throws the following error `Caused by: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of array

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-05-04 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1117242801 @xiarixiaoyao @yihua we currently don't see this issue when we use the following configuration option --conf 'spark.hadoop.parquet.avro.write-old-list-structure=false'. Should be

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-05-03 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1115996169 @xiarixiaoyao FYI, we just tested this issue by building the release branch 0.11.0, here is the JSON string and the schema. `{ "resourceType": "Patient", "id":

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-29 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1113405600 @xiarixiaoyao thanks for helping out, let me know if you need any more information.

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-29 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1112964120 We further took out the coding type from our JSON string, one after the other; the update worked for 2 elements (identifier and maritalstatus), it is the coding type in the element

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-28 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1112883931 @xiarixiaoyao We did another test, we used this JSON string

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-28 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1112864560 Hi @xiarixiaoyao, thanks for the code. It worked like a charm for the reduced JSON as provided above. After successfully testing it with the reduced schema, we used the complete

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-27 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-715248 @xiarixiaoyao we are not concerned about the position as long as it's there in the schema (either as the last column or somewhere else) along with all the existing columns.

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-27 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-697151 Thanks @yihua, here are the detailed spark shell commands we used `./spark-shell --jars