I would agree with Eugene. A simple application that does this is probably what you're looking for.
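Something along the lines of the rough, untested sketch below would likely do everything you need. Here transform() only stands in for whatever your ParDos do today, and the file names are made up:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SequentialProcess {

  // Stand-in for whatever the ParDos do to each record.
  static String transform(String record) {
    return record.trim().toUpperCase();
  }

  public static void main(String[] args) throws IOException {
    Path in = Paths.get("testfile");       // input path (assumed)
    Path out = Paths.get("testfile-out");  // output path (assumed)
    try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
         BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
      String line;
      while ((line = reader.readLine()) != null) {
        writer.write(transform(line));  // records come out in input order
        writer.newLine();
      }
    }
  }
}

Reading and writing line by line keeps memory use flat, so a million records is not a problem, and the output order is the input order by construction.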
There are ways to make this work with parallel processing systems, but it's quite a hassle and only worthwhile if the computation is very expensive and you want the additional computational power of multiple CPU cores. For example, in a parallel processing system you could read the records from the file and remember the file offset / line number of each record. You could then group them all under a single key, use the sorting extension to sort by the file offset / line number, and write the sorted records out to a single file. Note that this will likely be a lot slower than a simple program. A rough sketch of such a pipeline is at the bottom of this mail, below the quoted thread.

On Tue, Aug 21, 2018 at 8:02 AM Eugene Kirpichov <[email protected]> wrote:

> It sounds like you want to sequentially read a file, sequentially process
> the records and sequentially write them. The best way to do this is likely
> without using Beam, just write some Java or Python code using standard file
> APIs (use Beam's FileSystem APIs if you need to access data on a non-local
> filesystem).
>
> On Tue, Aug 21, 2018 at 7:11 AM [email protected] <[email protected]> wrote:
>
>> Hi
>>
>> I have to process a big file and call several ParDos to do some
>> transformations. Records in the file don't have any unique key.
>>
>> Let's say file 'testfile' has 1 million records.
>>
>> After processing, I want to generate only one output file, the same as my
>> input 'testfile', and I also have a requirement to write those 1 million
>> records in the same order (after applying some ParDos).
>>
>> What is the best way to do it?
>>
>> Thanks
>> Aniruddh
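Here is the rough sketch of the group-and-sort pipeline mentioned above. It is untested and makes several assumptions: the sorter extension (beam-sdks-java-extensions-sorter) is on the classpath, 'testfile' / 'testfile-out' and the upper-casing "Transform" step are placeholders for your real paths and ParDos, and the file is read inside a single DoFn purely so each record gets a stable line number (which gives up parallel reading).

import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.channels.Channels;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.BigEndianLongCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.extensions.sorter.BufferedExternalSorter;
import org.apache.beam.sdk.extensions.sorter.SortValues;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class OrderedSingleFilePipeline {

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read the whole file in one DoFn so every record keeps its original
    // line number as a key. This gives up parallel reading, which is part
    // of why this is slower than a plain program.
    PCollection<KV<Long, String>> numbered =
        p.apply("FileName", Create.of("testfile"))
         .apply("NumberLines", ParDo.of(new DoFn<String, KV<Long, String>>() {
           @ProcessElement
           public void process(ProcessContext c) throws IOException {
             long lineNo = 0;
             try (BufferedReader r = Files.newBufferedReader(Paths.get(c.element()))) {
               for (String line = r.readLine(); line != null; line = r.readLine()) {
                 c.output(KV.of(lineNo++, line));
               }
             }
           }
         }));

    numbered
        // Your real ParDos go here; this one only stands in for them and
        // passes the line-number key through unchanged.
        .apply("Transform", ParDo.of(new DoFn<KV<Long, String>, KV<Long, String>>() {
          @ProcessElement
          public void process(ProcessContext c) {
            c.output(KV.of(c.element().getKey(), c.element().getValue().toUpperCase()));
          }
        }))
        // Put everything under a single constant key...
        .apply("SingleKey", WithKeys.<Integer, KV<Long, String>>of(0))
        .setCoder(KvCoder.of(VarIntCoder.of(),
            KvCoder.of(BigEndianLongCoder.of(), StringUtf8Coder.of())))
        .apply(GroupByKey.<Integer, KV<Long, String>>create())
        // ...and sort the grouped values by the encoded line number
        // (big-endian longs sort in numeric order).
        .apply(SortValues.<Integer, Long, String>create(BufferedExternalSorter.options()))
        // Write the single sorted group to one file via the FileSystems API
        // so the sorted order is preserved in the output.
        .apply("WriteInOrder", ParDo.of(
            new DoFn<KV<Integer, Iterable<KV<Long, String>>>, Void>() {
              @ProcessElement
              public void process(ProcessContext c) throws IOException {
                try (PrintWriter out = new PrintWriter(Channels.newOutputStream(
                    FileSystems.create(
                        FileSystems.matchNewResource("testfile-out", false),
                        "text/plain")))) {
                  for (KV<Long, String> rec : c.element().getValue()) {
                    out.println(rec.getValue());
                  }
                }
              }
            }));

    p.run().waitUntilFinish();
  }
}

Writing the single sorted group directly through the FileSystems API (rather than TextIO) is deliberate: iterating the sorted group and writing it yourself keeps the order, whereas TextIO makes no ordering guarantee for the records it writes.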
