Hello, I have a large CSV file in which continued records (rows sharing the same RecordID) carry contextual meaning: I should treat these continued records as ONE complete record. Also, the RecordID is reset to 1 whenever the CSV dumper system thinks it necessary.
I'd like some suggestions on how to analyze this kind of file in Spark. For example, I need to count the complete records that consist of >= 2 continued records. Obviously, "2, s2, 9, r1, 7, r2, 8, r3, 3" is one of my targets.

A sample of the CSV:

RecordID,stdID,stdVal,refID,refVal
1,s1,10,r1,7
2,s2,9,r1,7
2,s2,9,r2,8
2,s2,9,r3,3
3,s1,12,r2,10
...
42,s3,8,r7,5
1,s2,11,r3,5

Best regards
JiaQiang
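P.S. To make the requirement concrete, here is the counting logic I have in mind, sketched in plain Python with itertools (the variable and function names are my own; I would still need to port this to Spark, where a run of equal RecordIDs could be split across partition boundaries):

```python
from itertools import groupby

# Sample rows in file order: (RecordID, stdID, stdVal, refID, refVal).
# Because RecordID resets to 1 at dump boundaries, rows must be grouped
# by *consecutive runs* of the same RecordID, not by a global key.
rows = [
    (1, "s1", 10, "r1", 7),
    (2, "s2", 9, "r1", 7),
    (2, "s2", 9, "r2", 8),
    (2, "s2", 9, "r3", 3),
    (3, "s1", 12, "r2", 10),
    (42, "s3", 8, "r7", 5),
    (1, "s2", 11, "r3", 5),  # RecordID reset: starts a new complete record
]

def complete_records(rows):
    """Group consecutive rows sharing a RecordID into one complete record."""
    return [list(run) for _, run in groupby(rows, key=lambda r: r[0])]

# Count the complete records made up of >= 2 continued rows.
count = sum(1 for rec in complete_records(rows) if len(rec) >= 2)
print(count)  # the RecordID=2 run is the only such record -> prints 1
```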