Hi Team,

I am getting the below error when reading a column whose values are JSON
strings.

json_schema_ctx_rdd = record_df.rdd.map(lambda row: row.contexts_parsed)

spark.read.option("mode", "PERMISSIVE") \
    .option("inferSchema", "true") \
    .option("inferTimestamp", "false") \
    .json(json_schema_ctx_rdd)

The contexts_parsed JSON strings contain dynamic columns, so I am not sure
which timestamp column is bad. How can I identify the bad record and
resolve this issue?
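
Since the columns are dynamic, the only approach I could think of for
locating the bad value is to scan the raw contexts_parsed strings in plain
Python before Spark touches them, flagging any timestamp-looking string
whose year falls outside Python's supported range of 1 to 9999. This is
just a rough sketch (the regex is not a vetted timestamp parser):

import json
import re

# Rough match for ISO-8601-style timestamps, capturing a possibly
# negative year.
TS_PATTERN = re.compile(r"^(-?\d{1,6})-\d{2}-\d{2}[T ]\d{2}:\d{2}")

def bad_timestamps(obj, path=""):
    # Recursively yield (path, value) for timestamp-like strings whose
    # year Python's datetime cannot represent (valid range is 1..9999).
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from bad_timestamps(v, path + "." + k)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from bad_timestamps(v, "%s[%d]" % (path, i))
    elif isinstance(obj, str):
        m = TS_PATTERN.match(obj)
        if m and not 1 <= int(m.group(1)) <= 9999:
            yield path, obj

# contexts_parsed is a plain string column, so no timestamp conversion
# happens here; toLocalIterator keeps one partition at a time in memory.
for row in record_df.select("contexts_parsed").toLocalIterator():
    for path, value in bad_timestamps(json.loads(row.contexts_parsed)):
        print(path, value)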


File "/usr/lib/spark/python/pyspark/worker.py", line 686, in main

    process()

  File "/usr/lib/spark/python/pyspark/worker.py", line 678, in process

    serializer.dump_stream(out_iter, outfile)

  File "/usr/lib/spark/python/pyspark/serializers.py", line 145, in
dump_stream

    for obj in iterator:

  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 288, in func

    for x in iterator:

  File "/usr/lib/spark/python/pyspark/serializers.py", line 151, in
load_stream

    yield self._read_with_length(stream)

  File "/usr/lib/spark/python/pyspark/serializers.py", line 173, in
_read_with_length

    return self.loads(obj)

  File "/usr/lib/spark/python/pyspark/serializers.py", line 452, in loads

    return pickle.loads(obj, encoding=encoding)

  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1729, in <lambda>

    return lambda *a: dataType.fromInternal(a)

  File "/usr/lib/spark/python/pyspark/sql/types.py", line 823, in
fromInternal

    for f, v, c in zip(self.fields, obj, self._needConversion)

  File "/usr/lib/spark/python/pyspark/sql/types.py", line 823, in <listcomp>

    for f, v, c in zip(self.fields, obj, self._needConversion)

  File "/usr/lib/spark/python/pyspark/sql/types.py", line 594, in
fromInternal

    return self.dataType.fromInternal(obj)

  File "/usr/lib/spark/python/pyspark/sql/types.py", line 223, in
fromInternal

    return datetime.datetime.fromtimestamp(ts //
1000000).replace(microsecond=ts % 1000000)

ValueError: year -1976 is out of range
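
If I am reading the last frame right, Python's datetime only supports
years 1 through 9999, so once Spark infers a field as a timestamp, a value
in year -1976 cannot be converted back to a Python object. I was also
considering isolating the bad rows on the JVM side, where such years are
still representable, roughly like this (untested sketch; it assumes the
read itself succeeds and only the conversion to Python fails, and it only
checks top-level fields):

from pyspark.sql import functions as F
from pyspark.sql.types import TimestampType

df = spark.read.option("mode", "PERMISSIVE") \
    .option("inferTimestamp", "false") \
    .json(json_schema_ctx_rdd)

# Columns Spark inferred as timestamps in the dynamic schema.
ts_cols = [f.name for f in df.schema.fields
           if isinstance(f.dataType, TimestampType)]

for c in ts_cols:
    # year() is evaluated on the JVM, so out-of-range years don't
    # crash the Python worker.
    bad = df.filter((F.year(F.col(c)) < 1) | (F.year(F.col(c)) > 9999))
    if bad.limit(1).count() > 0:
        print("bad timestamps in column:", c)
        # Cast to string before bringing rows to Python so the
        # datetime conversion never runs.
        bad.select(F.col(c).cast("string")).show(5, truncate=False)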



Appreciate any guidance.

Cheers!
Manoj.
