Hello, I have a large CSV file in which continued records (rows sharing the same RecordID) carry contextual meaning: I should treat these continued records as ONE complete record. Also, the RecordID is reset to 1 whenever the CSV dumper system thinks it necessary.
I'd like some suggestions on how to analyze this kind of file in Spark. For example, I need to count the complete records that consist of >= 2 continued records. Obviously, "2, s2, 9, r1, 7, r2, 8, r3, 3" is one of my targets.

A sample of the CSV:

RecordID,stdID,stdVal,refID,refVal
1,s1,10,r1,7
2,s2,9,r1,7
2,s2,9,r2,8
2,s2,9,r3,3
3,s1,12,r2,10
...
42,s3,8,r7,5
1,s2,11,r3,5

Best regards
JiaQiang
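P.S. To make the requirement concrete, here is the counting logic I have in mind, sketched in plain Python with itertools (the variable and function names are my own; I would still need to port this to Spark, where a run of equal RecordIDs could be split across partition boundaries):

```python
from itertools import groupby

# Sample rows in file order: (RecordID, stdID, stdVal, refID, refVal).
# Because RecordID resets to 1 at dump boundaries, rows must be grouped
# by *consecutive runs* of the same RecordID, not by a global key.
rows = [
    (1, "s1", 10, "r1", 7),
    (2, "s2", 9, "r1", 7),
    (2, "s2", 9, "r2", 8),
    (2, "s2", 9, "r3", 3),
    (3, "s1", 12, "r2", 10),
    (42, "s3", 8, "r7", 5),
    (1, "s2", 11, "r3", 5),  # RecordID reset: starts a new complete record
]

def complete_records(rows):
    """Group consecutive rows sharing a RecordID into one complete record."""
    return [list(run) for _, run in groupby(rows, key=lambda r: r[0])]

# Count the complete records made up of >= 2 continued rows.
count = sum(1 for rec in complete_records(rows) if len(rec) >= 2)
print(count)  # the RecordID=2 run is the only such record -> prints 1
```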