1. The date is dynamic (i.e. if the date is changed, we shouldn't read all the records). It looks like the solution below will read all the records if the date is changed. (Please correct me if I am wrong.)
2. We can assume the file is sorted by date.

Sent from my iPhone

On Sep 16, 2013, at 5:27 PM, Horia <[email protected]> wrote:

Without sorting, you can implement this using the 'filter' transformation.

This will eventually read all the rows once, but subsequently only shuffle and send the transformed data which passed the filter.

Does this help, or did I misunderstand?

On Sep 16, 2013 1:37 PM, "satheessh chinnu" <[email protected]> wrote:

> I have a text file. Each line is a record, and the first ten characters on
> each line are a date in YYYY-MM-DD format.
>
> I would like to run a map function on this RDD for a specific date range (i.e.
> from 2005-01-01 to 2007-12-31). I would like to avoid reading the records
> outside the specified date range (i.e. a kind of primary index sorted by date).
>
> Is there a way to implement this?
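
To make the two options in this thread concrete, here is a minimal sketch in plain Python (not Spark code). The names `in_date_range`, `DateKeys`, `slice_sorted` and the sample records are hypothetical and only for illustration. The predicate is the kind of function one would hand to the 'filter' transformation; it still looks at every line. The `bisect` variant assumes the records are sorted by date and held in an indexable sequence, so it can find the range boundaries without scanning every record. Both rely on YYYY-MM-DD dates comparing correctly as plain strings.

```python
from bisect import bisect_left, bisect_right

DATE_LEN = 10  # length of the leading "YYYY-MM-DD" prefix


def in_date_range(line, start, end):
    """True if the line's leading date lies in [start, end], inclusive.

    ISO dates sort lexicographically, so string comparison is enough.
    """
    return start <= line[:DATE_LEN] <= end


class DateKeys:
    """Read-only view that exposes only the date prefix of each record."""

    def __init__(self, records):
        self._records = records

    def __len__(self):
        return len(self._records)

    def __getitem__(self, i):
        return self._records[i][:DATE_LEN]


def slice_sorted(records, start, end):
    """Return the records in [start, end], assuming records are sorted by date.

    Binary search touches O(log n) records to find the two boundaries.
    """
    keys = DateKeys(records)
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return records[lo:hi]


# Hypothetical sample data, sorted by date.
records = [
    "2004-06-30 a",
    "2005-01-01 b",
    "2006-07-15 c",
    "2007-12-31 d",
    "2008-01-01 e",
]

# Option 1: filter predicate, scans every record.
filtered = [r for r in records if in_date_range(r, "2005-01-01", "2007-12-31")]

# Option 2: binary search, relies on the sort order.
sliced = slice_sorted(records, "2005-01-01", "2007-12-31")
```

With the filter approach, changing the date range means evaluating the predicate over all lines again, which is the concern raised in point 1. The binary-search variant avoids that only because of the sort-order assumption in point 2, and only for data that can be indexed by position.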
