Hello, I have huge gzipped files, and I need to drop the header row from each before loading it into a Hive table.
Right now, my process is:

1. Gunzip the data (...takes forever)
2. Drop the first row using the Unix sed command
3. Re-zip the data with gzip -1 (...takes forever)
4. Create the Hive table on the compressed file, to store it efficiently

I am trying to find a way to speed up this process. Ideally, it would load the data into Hive as a first step and then delete the first row, avoiding the unzip/rezip steps. Any ideas would be appreciated!

-Dan
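For reference, here is a minimal sketch of steps 1-3 as shell commands. This assumes GNU sed (for in-place `-i` editing) and uses `data.gz` as a placeholder file name; the tiny sample file is created only so the commands are runnable.

```shell
# Placeholder sample input, purely for illustration (real files are huge gzips).
printf 'header\nrow1\nrow2\n' | gzip > data.gz

gunzip data.gz        # step 1: decompress to "data" (slow on huge files)
sed -i '1d' data      # step 2: delete the first (header) row in place
gzip -1 data          # step 3: recompress quickly back to data.gz (still slow)

gunzip -c data.gz     # result now contains only the data rows
```

Note that steps 1 and 3 each rewrite the full file on disk, which is why they dominate the runtime.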