Z is just an example. It could be anything. Basically, anything that's not
in the schema should be filtered out.
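
One general way to do that (a sketch, not from the thread: it assumes only
top-level fields matter, and Spark >= 3.1 for json_object_keys; the path,
the field names x/y, and expectedSchema are placeholders for your own) is to
read each line as raw text, reject lines whose top-level key set isn't a
subset of the schema's fields, and only then parse:

import org.apache.spark.sql.*;
import static org.apache.spark.sql.functions.expr;

Dataset<Row> raw = spark.read().text("s3://bucket/prefix/*.jsonl");
// Keep a line only if no key remains after removing the expected ones.
// json_object_keys returns NULL for invalid JSON, so with default settings
// size(...) is -1 there and malformed lines are dropped as well.
Dataset<Row> known = raw.filter(expr(
    "size(array_except(json_object_keys(value), array('x', 'y'))) = 0"));
// Parse the surviving lines with the strict schema.
Dataset<Row> parsed = spark.read().schema(expectedSchema)
    .json(known.as(Encoders.STRING()));
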
On Tue, 4 Jul 2023, 13:27 Hill Liu wrote:
I think you can define the schema with column z and then keep only the
records where z is null.
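
A minimal sketch of that suggestion, assuming the x/y fields from the
example below (the types are guesses):

import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;
import static org.apache.spark.sql.functions.col;

StructType schema = new StructType()
    .add("x", DataTypes.LongType)
    .add("y", DataTypes.LongType)
    .add("z", DataTypes.LongType);  // the unexpected field to reject

Dataset<Row> ds = spark.read().schema(schema).json("path");
// Records that actually contained z get a non-null value here; drop them.
Dataset<Row> filtered = ds.filter(col("z").isNull()).drop("z");

The catch, as noted above, is that this only rejects fields you can
enumerate in advance.
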
On Tue, Jul 4, 2023 at 3:24 PM Shashank Rao wrote:
Yes, dropMalformed does filter out record 4. However, record 5 is not
filtered out.

On Tue, 4 Jul 2023 at 07:41, Vikas Kumar wrote:
Have you tried the dropMalformed option?
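
For reference, a sketch of how that option is set (schema and path as in
the snippets below):

Dataset<Row> ds = spark.read()
    .schema(schema)
    .option("mode", "DROPMALFORMED")
    .json("path");

Note that DROPMALFORMED only drops records that fail to parse or whose
values don't fit the schema's types; a record that merely carries an extra
field still parses, which is why record 5 survives (see the reply above).
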
On Mon, Jul 3, 2023, 1:34 PM Shashank Rao wrote:
Update: Got it working by using the _corrupt_record field for the first
case (record 4):

schema = schema.add("_corrupt_record", DataTypes.StringType);
Dataset<Row> ds = spark.read().schema(schema)
    .option("mode", "PERMISSIVE").json("path");
ds =
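
The snippet is cut off at "ds ="; presumably the follow-up filters on the
corrupt-record column. A hedged completion of that idea:

import static org.apache.spark.sql.functions.col;

// Spark (2.3+) may refuse queries over a raw JSON read that reference only
// the internal corrupt-record column; caching the parsed result first is
// the documented workaround.
ds = ds.cache();
Dataset<Row> clean = ds.filter(col("_corrupt_record").isNull())
                       .drop("_corrupt_record");
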
Hi all,
I'm trying to read around 1,000,000 JSONL files present in S3 using Spark.
Once read, I need to write them to BigQuery.
I have a schema that may not exactly match all of the records.
How can I filter out the records that don't exactly match the schema?
E.g., if my records were:
{"x": 1,