Hi, I wrote a shell script to clean up the CSV data, but when I run it on a 12GB CSV it takes a long time. If I run a Python script instead, will that be faster?
On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <[email protected]> wrote:

> How about a Python script that changes it into plain tab-separated text?
> So it would look like this…
>
> 174969274<tab>14-mar-2006<tab>3522876<tab><tab>14-mar-2006<tab>500000308<tab>65<tab>1<newline>
> etc…
>
> Tab-separated with newlines is easy to read and works perfectly on import.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
> 781-565-4611
>
> *From:* Sandeep Reddy P [mailto:[email protected]]
> *Subject:* How to load csv data into HIVE
>
> Hi,
> Here is the sample data
> "174969274","14-mar-2006","3522876","","14-mar-2006","500000308","65","1"|
> "174969275","19-jul-2006","3523154","","19-jul-2006","500000308","65","1"|
> "174969276","31-dec-2005","3530333","","31-dec-2005","500000308","65","1"|
> "174969277","14-apr-2005","3531470","","14-apr-2005","500000308","65","1"|
>
> How to load this kind of data into HIVE?
> I'm using shell script to get rid of double quotes and '|' but its taking
> very long time to work on each csv which are 12GB each. What is the best
> way to do this?

--
Thanks,
sandeep
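
For reference, here is a minimal sketch of the kind of Python conversion suggested in the thread above. It is only an illustration under several assumptions: every record sits on one line and ends with a `|` marker, no field contains an embedded newline, and the file is UTF-8. The names `strip_record_terminator` and `csv_to_tsv` are made up for this example. It reads the input one line at a time, so memory use should stay flat on a 12GB file; whether it beats a given shell script depends on what that script does.

```python
import csv
import sys


def strip_record_terminator(lines):
    """Yield each line without its newline and trailing '|' record marker."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("|"):
            line = line[:-1]
        yield line


def csv_to_tsv(src, dst):
    """Stream quoted CSV rows from src and write tab-separated rows to dst."""
    # csv.reader removes the surrounding double quotes from each field.
    reader = csv.reader(strip_record_terminator(src))
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Assumption: a tab inside a field is replaced by a space so it
        # cannot be mistaken for a column separator.
        dst.write("\t".join(field.replace("\t", " ") for field in row))
        dst.write("\n")


# Hypothetical usage: python csv_to_tsv.py input.csv output.tsv
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], newline="", encoding="utf-8") as src, \
         open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
        csv_to_tsv(src, dst)
```

The resulting file could then be loaded into a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`.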
