Hi -

I have an unusual problem I am trying to solve, and I am not sure
whether Spark would help here.

I have a file at /X/Y/a.txt and, in the same directory structure, /X/Y/Z/b.txt.

a.txt contains a unique serial number, say:
12345

and b.txt contains key-value pairs:
a,1
b,1
c,0
and so on.

Every day we receive data for a system Y, so over time there are
multiple a.txt and b.txt files for each serial number. The serial
number never changes, and that is the key. There are many systems, and
a whole year of data is available, so the volume is huge.

I am trying to generate a report of the unique serial numbers for which
the value of option a has changed to 1 over the last few months (let's
say the default is 0), and also to figure out how many times it was
toggled.
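
The toggle count itself seems simple once I can get one serial's daily
values of option a in date order; this sketch is roughly what I mean
(countToggles is just a name I made up):

// Given one serial's values of option "a" in date order, count how
// many times the value changed from one observation to the next.
def countToggles(values: Seq[Int]): Int =
  values.sliding(2).count {
    case Seq(prev, next) => prev != next
    case _               => false   // fewer than two observations
  }

countToggles(Seq(0, 0, 1, 1, 0, 1))   // 3 toggles, currently at 1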


What I am not sure about is how to read the two text files in Spark at
the same time and associate them with the serial number. Is there a way
of doing this in place, given that we know the directory structure? Or
should we be transforming the data first to solve this?
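
The closest I have come to is reading each file with wholeTextFiles, so
it stays keyed by its full path, deriving a join key from the parent
directory, and joining the two datasets on that key. A rough sketch of
the idea (the globs and path handling are guesses, since I simplified
the real layout above):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("serial-report"))

// wholeTextFiles yields (fullPath, fileContent) pairs, so a file never
// loses track of the directory it came from.
val serials = sc.wholeTextFiles("/X/*/a.txt").map { case (path, content) =>
  val dir = path.stripSuffix("/a.txt")
  (dir, content.trim)                          // (dir, serial number)
}

val options = sc.wholeTextFiles("/X/*/Z/b.txt").map { case (path, content) =>
  val dir = path.stripSuffix("/Z/b.txt")
  val pairs = content.split("\n").map { line =>
    val Array(k, v) = line.trim.split(",")
    (k, v.toInt)
  }.toMap
  (dir, pairs)                                 // (dir, key -> value map)
}

// Join the a.txt and b.txt that live under the same directory, then
// key the value of option "a" by serial number (default 0).
val bySerial = serials.join(options).map {
  case (_, (serial, opts)) => (serial, opts.getOrElse("a", 0))
}

Is this a reasonable direction, or is there a better way?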
