Hi Team,

I have a dataset like the one below in a .dat file:

13/07/2022abc
PWJ   PWJABC 513213217ABC GM20 05.0000 6/20/39
#01000count

I want to extract the header and tail records, which I have already
managed to do. From the header, I need to extract the date and compare it
with the current system date. For the tail record, I need to compare the
actual number of data rows (1 in my case) with the count given in the last
row. In other words, I need some kind of pattern matching to find the '1'
in the last row and confirm that the actual row count matches the value in
the tail record.
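
The two checks described above might be sketched in plain Python roughly as
follows. This is only an illustration under stated assumptions: the function
names (header_date, tail_count, validate) are hypothetical, the header is
assumed to start with a dd/MM/yyyy date, and the tail record is assumed to
carry the row count in the first two digits after '#'. The real layout of
the tail record may differ, in which case the pattern needs adjusting.

```python
import re
from datetime import date, datetime

# Assumption: the header starts with a dd/MM/yyyy date, e.g. "13/07/2022abc".
HEADER_DATE = re.compile(r"^(\d{2}/\d{2}/\d{4})")
# Assumption: the first two digits after '#' hold the row count,
# e.g. "#01000count" -> 1. Adjust the pattern to the real layout.
TAIL_COUNT = re.compile(r"^#(\d{2})")


def header_date(header: str) -> date:
    """Extract the leading date from the header record."""
    match = HEADER_DATE.match(header)
    if match is None:
        raise ValueError(f"no date found in header: {header!r}")
    return datetime.strptime(match.group(1), "%d/%m/%Y").date()


def tail_count(tail: str) -> int:
    """Extract the declared row count from the tail record."""
    match = TAIL_COUNT.match(tail)
    if match is None:
        raise ValueError(f"no count found in tail record: {tail!r}")
    return int(match.group(1))


def validate(lines, today=None):
    """Compare the header date with today and the tail count with the data rows."""
    today = today or date.today()
    # First line is the header, last line is the tail, the rest are data rows.
    header, *body, tail = lines
    return {
        "date_ok": header_date(header) == today,
        "count_ok": tail_count(tail) == len(body),
    }
```

Calling validate() on the three sample lines, with today passed as the
expected date, would report whether each check passes.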

How can I do this? Any links would be helpful. I think regex pattern
matching should help.

Also, I will be receiving files in 3 formats for now, i.e. CSV, .DAT and .TXT.

As I see it, I could validate all 3 file formats by reading them with
spark.read.text().rdd and performing the intended operations on the RDDs.
This is for the validation part only.
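
To illustrate the idea of treating all three formats as plain lines of text,
here is a small sketch in plain Python rather than Spark. It assumes the
file is small enough to read in one go, that it is UTF-8 encoded, and that
the first and last non-blank lines are the header and tail records. The
function name split_records is hypothetical.

```python
from pathlib import Path


def split_records(path):
    """Return (header, body, tail) from a line-oriented file of any extension."""
    # Assumption: UTF-8 text; the extension (.csv, .dat, .txt) is ignored.
    text = Path(path).read_text(encoding="utf-8")
    # Drop blank lines so a trailing newline does not hide the tail record.
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a header and a tail record")
    return lines[0], lines[1:-1], lines[-1]
```

The same split would apply whichever of the three extensions the file has,
since only the line positions are used.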

So I wanted to understand: is there a better way to achieve this?

Thanks,
Sid
