Hi Team, I have a dataset like the one below in a .dat file:
13/07/2022abc PWJ PWJABC 513213217ABC GM20 05.0000 6/20/39 #01000count

I was able to extract the header and tail records. Now, from the header, I need to extract the date and compare it with the current system date. For the tail record, I need to check that the number of actual data rows (1 in my case) matches the count given in the last row. In other words, I want to find the '1' in the last row and confirm that the actual record count and the value in the tail record agree. How can I do this? Any links would be helpful. I think regex pattern matching should help.

Also, for now I will be getting three formats: CSV, .DAT and .TXT files. As far as I can see, I could do the validation for all three formats by using spark.read.text().rdd and performing the intended operations on the RDDs (just the validation part). Is there a better way to achieve this?

Thanks,
Sid
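For illustration, here is a minimal sketch of the two checks in plain Python, independent of how the lines are read (they could come from spark.read.text() or from an ordinary file read). Several things in it are assumptions rather than facts about the real files: that the sample is three lines (header, one data row, tail), that the header starts with a dd/MM/yyyy date, and that the first two digits after '#' in the tail hold the row count. The function name validate and both patterns are hypothetical and would need adjusting to the actual layout.

```python
import re
from datetime import date, datetime

# Assumed layouts (guesses from the sample, not confirmed):
HEADER_DATE = re.compile(r"^(\d{2}/\d{2}/\d{4})")  # header starts with dd/MM/yyyy
TRAILER_COUNT = re.compile(r"^#(\d{2})")           # two-digit row count after '#'


def validate(lines, today=None, header_re=HEADER_DATE, trailer_re=TRAILER_COUNT):
    """Return a list of problems; an empty list means both checks passed."""
    today = today or date.today()
    lines = [ln.strip() for ln in lines if ln.strip()]  # drop blank lines
    if len(lines) < 2:
        return ["file needs at least a header and a tail line"]

    header, body, trailer = lines[0], lines[1:-1], lines[-1]
    problems = []

    # Check 1: the date in the header must equal the system date.
    m = header_re.search(header)
    if not m:
        problems.append("no date found in header")
    else:
        try:
            header_date = datetime.strptime(m.group(1), "%d/%m/%Y").date()
        except ValueError:
            problems.append(f"header date {m.group(1)!r} is not a valid date")
        else:
            if header_date != today:
                problems.append(f"header date {header_date} != system date {today}")

    # Check 2: the count in the tail must equal the number of data rows.
    m = trailer_re.search(trailer)
    if not m:
        problems.append("no record count found in tail")
    elif int(m.group(1)) != len(body):
        problems.append(f"tail says {int(m.group(1))} rows, file has {len(body)}")

    return problems


# The sample from the question, split into three lines (an assumption).
sample = [
    "13/07/2022abc",
    "PWJ PWJABC 513213217ABC GM20 05.0000 6/20/39",
    "#01000count",
]
print(validate(sample, today=date(2022, 7, 13)))  # prints []
```

Because the function only needs a list of strings, the same check would apply to CSV, .DAT and .TXT inputs as long as they share the header/tail convention; only the two patterns would change per format.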