Well said, Mich. I went through the same scenario: we did the ETL outside Hive, and once the transformation was done we loaded all the data into the Hive warehouse. I think that is the best practice and we should follow it.
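A minimal sketch of that kind of pruning step outside Hive, for illustration only. The function name prune_empty_rows, the file names and the table name are hypothetical, and it assumes a row counts as empty when it holds nothing but whitespace and commas (a slightly broader rule than deleting only completely blank lines):

```shell
# Sketch only: prune_empty_rows and all file/table names here are hypothetical.
# Assumption: a row is "empty" if it contains nothing but whitespace and commas.

prune_empty_rows() {
    # $1 = raw CSV file, $2 = cleaned CSV file to write
    # Delete every line made up solely of whitespace and/or commas.
    sed -e '/^[[:space:],]*$/d' "$1" > "$2"
}

# Possible usage (not run here; requires a Hive client and an existing table):
#   prune_empty_rows raw.csv clean.csv
#   hive -e "LOAD DATA LOCAL INPATH 'clean.csv' INTO TABLE my_table"
```

The cleaned file can then be loaded into the destination table as usual, so Hive never sees the empty rows.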
Regards,
Vikas Parashar

On Tue, Jan 5, 2016 at 5:02 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:

> It would be interesting to do ETL outside of Hive by getting data from the
> webpage to an intermediate file, pruning the empty rows and loading the
> final CSV file into the Hive destination table.
>
> I am pretty sure this clean-up outside of Hive would be faster compared to
> doing the same in Hive.
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books *"A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
> http://talebzadehmich.wordpress.com
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
> *From:* Mich Talebzadeh [mailto:m...@peridale.co.uk]
> *Sent:* 05 January 2016 08:55
> *To:* user@hive.apache.org
> *Subject:* RE: Deleting empty rows from hive table through java
>
> Hi Sateesh,
>
> You can do the clean-up in Hive by creating a staging table in Hive,
> feeding your CSV data there and then inserting data into the main table
> where COL1 is NOT NULL.
>
> Alternatively you can create your Hive table as transactional. Although I
> would say the staging table is better as you will have a full record of
> your CSV data at any time.
>
> You can of course do the pruning of data outside of Hive using a simple
> shell script with sed and awk (if you are familiar with those tools).
>
> cat CSV_FILE | sed -e '/^$/d'
>
> HTH
>
> Dr Mich Talebzadeh
>
> *From:* Sateesh Karuturi [mailto:sateesh.karutu...@gmail.com]
> *Sent:* 05 January 2016 06:59
> *To:* user@hive.apache.org
> *Subject:* Deleting empty rows from hive table through java
>
> Hello...
>
> Anyone please help me how to delete empty rows from hive table through
> java?
>
> Thanks in advance