Hi Team, I am getting the error below when reading a column whose values are JSON strings. Here is the code:
json_schema_ctx_rdd = record_df.rdd.map(lambda row: row.contexts_parsed)

spark.read.option("mode", "PERMISSIVE") \
    .option("inferSchema", "true") \
    .option("inferTimestamp", "false") \
    .json(json_schema_ctx_rdd)

The contexts_parsed JSON string contains dynamic columns, so I am not sure which timestamp column is the bad one. How can I identify the bad record and resolve this issue?

  File "/usr/lib/spark/python/pyspark/worker.py", line 686, in main
    process()
  File "/usr/lib/spark/python/pyspark/worker.py", line 678, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 145, in dump_stream
    for obj in iterator:
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 288, in func
    for x in iterator:
  File "/usr/lib/spark/python/pyspark/serializers.py", line 151, in load_stream
    yield self._read_with_length(stream)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 452, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1729, in <lambda>
    return lambda *a: dataType.fromInternal(a)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 823, in fromInternal
    for f, v, c in zip(self.fields, obj, self._needConversion)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 823, in <listcomp>
    for f, v, c in zip(self.fields, obj, self._needConversion)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 594, in fromInternal
    return self.dataType.fromInternal(obj)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 223, in fromInternal
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
ValueError: year -1976 is out of range

Appreciate any guidance.

Cheers!
Manoj
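P.S. In case it helps show what I am after, below is a rough sketch of how I was thinking of isolating the bad rows. It assumes the inferred timestamp columns sit at the top level of the schema (mine are dynamic, so some may be nested and not covered by this). The idea is that the year() check runs on the JVM side, so it should never hit the fromtimestamp() conversion that fails in the traceback above.

from pyspark.sql import functions as F
from pyspark.sql.types import TimestampType

# Re-read the JSON exactly as in the snippet above.
df = (spark.read
      .option("mode", "PERMISSIVE")
      .option("inferSchema", "true")
      .option("inferTimestamp", "false")
      .json(json_schema_ctx_rdd))

# Collect every top-level column that Spark inferred as a timestamp.
ts_cols = [f.name for f in df.schema.fields
           if isinstance(f.dataType, TimestampType)]

# Build a predicate that is true when any timestamp falls outside the
# range Python's datetime can represent (years 1..9999). year() is
# evaluated on the JVM, so no Python conversion happens at this point.
cond = None
for c in ts_cols:
    bad = (F.year(F.col(c)) < 1) | (F.year(F.col(c)) > 9999)
    cond = bad if cond is None else (cond | bad)

if cond is not None:
    # Cast the offending timestamps to strings before pulling rows back
    # to the driver, so collect()/show() cannot trip fromtimestamp().
    bad_rows = df.filter(cond).select(
        [F.col(c).cast("string").alias(c) for c in ts_cols]
    )
    bad_rows.show(truncate=False)

If the bad values turn out to be nested inside structs, I suppose I would need to walk df.schema recursively rather than just df.schema.fields, but I wanted to check whether the overall approach is sound first.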