Hi all,

I am looking to unzip a large gz file. Can I restrict the job to 1 worker
on the Dataflow runner and count on the order of the lines staying the same
as in the original gz file? If not, what would be the easiest way to unzip
the file?

p = beam.Pipeline(options=options)
(p | 'Step 1.4.1 read gz file' >> beam.io.ReadFromText(zip_file_name)
   | 'Step 1.4.2 write file' >> beam.io.WriteToText(unzip_file_name))
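
For comparison, here is a minimal non-Beam sketch of the result I am after,
assuming the file is reachable as a local path (a gs:// object would first
need to be downloaded or opened through a storage client). The names
gunzip_file, src_path, dst_path and chunk_size are hypothetical and not part
of Beam.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a .gz file to dst_path without loading it into memory.

    Hypothetical helper: the bytes are copied sequentially in one process,
    so the output keeps exactly the order of the original file.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj reads and writes in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
```

Because this runs as a single sequential copy, line order is not a concern
here; the open question is whether the Beam pipeline above can give the same
guarantee.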

Thank you,
--
Eila Arich

Founder, CEO

Oriel Research Therapeutics

https://www.orielresearch.com/blog
e...@orielresearch.com
www.orielresearch.com
Newton, MA
