RE: Hive 0.11.0 | Issue with ORC Tables

2013-09-20 Thread Savant, Keshav
Hi Nitin, Thanks for your reply. We were under the impression that the codec would also be responsible for the ORC format conversion. However, per your reply it seems that a conversion from plain CSV to ORC is required before loading into Hive. We got some leads from the following URLs https://cwiki.apache.or

Re: Hive 0.11.0 | Issue with ORC Tables

2013-09-20 Thread Nitin Pawar
Keshav, Owen has provided the solution already. That's the easiest of the lot, and it comes from the master who wrote ORC himself :) To put it in simple words, what he has suggested is: create a staging table based on the default text data format. From the staging data, load the data into an ORC fi
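Spelled out, the staging-table approach looks roughly like the following (a sketch only; the table names, columns, and input path are hypothetical, not from the thread):

```sql
-- 1. Staging table over the raw delimited text files (hypothetical schema)
CREATE TABLE staging_events (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/hive/incoming/events.csv' INTO TABLE staging_events;

-- 2. Final table stored as ORC
CREATE TABLE events_orc (id INT, name STRING) STORED AS ORC;

-- 3. The INSERT...SELECT performs the text-to-ORC conversion as a distributed job
INSERT OVERWRITE TABLE events_orc SELECT * FROM staging_events;
```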

Re: Hive 0.11.0 | Issue with ORC Tables

2013-09-20 Thread John Omernik
Another advantage of the method described by Owen is that the process of creating the ORC file is distributed (rather than pre-creating the ORC file off-cluster and then moving it into the cluster). This way, you just push your text files into the cluster, run the select statement, and push into the ORC t

load data stored as sequencefiles

2013-09-20 Thread Artem Ervits
Hello all, I'm a bit lost with using Hive and SequenceFiles. I loaded data using Sqoop from an RDBMS and stored it as a sequencefile. I jarred the class generated by Sqoop and added it to my create table script. Now I create a table in Hive and specify "STORED AS SEQUENCEFILE", and I also "ADD JAR SQOOP_
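For reference, the setup Artem describes usually takes roughly this shape (a sketch only; the jar path, table name, and columns here are invented for illustration, not Artem's actual ones):

```sql
-- Make the Sqoop-generated record class visible to Hive (hypothetical jar path)
ADD JAR /user/etl/lib/sqoop-records.jar;

-- Hypothetical table matching the columns of the Sqoop import
CREATE TABLE imported_orders (order_id INT, amount DOUBLE)
STORED AS SEQUENCEFILE;
```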

How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi,   I have a file which is delimited by a tab. Also, some fields in the file contain a tab (\t) character and a newline (\n) character.   Is there any way to load this file using the Hive load command? Or do I have to use a custom MapReduce InputFormat in Java?

Re: How to load /t /n file to Hive

2013-09-20 Thread Nitin Pawar
If your data contains newline characters, it's better to write a custom MapReduce job that converts the data into single lines, removing unwanted characters from the column separators and keeping a single newline character per record. On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop wrote: > Please note that the
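Short of a full MapReduce job, the repair logic Nitin describes can be sketched in a few lines (an illustration only; it assumes the expected number of tab-separated fields per record is known, here 3, and the same logic would sit inside a mapper). Note the heuristic only works when an embedded newline leaves the broken line with fewer tabs than a complete record:

```python
def repair_records(lines, n_fields=3):
    """Join physical lines until a logical record has n_fields
    tab-separated fields, replacing embedded newlines with spaces."""
    buf = ""
    for line in lines:
        buf = buf + " " + line if buf else line
        # A complete record contains exactly n_fields - 1 tab separators
        if buf.count("\t") >= n_fields - 1:
            yield buf
            buf = ""
    if buf:  # emit any trailing partial record
        yield buf

# The second logical record was split by a newline inside its second field
raw = ["a\tb\tc", "d\te part1", "e part2\tf"]
print(list(repair_records(raw)))
```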

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Please note that there is an escape character in the fields where the \t and \n are present. From: Raj Hadoop To: Hive Sent: Friday, September 20, 2013 3:04 PM Subject: How to load /t /n file to Hive Hi, I have a file which is delimited by a tab. Also, there

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi Nitin,   Thanks for the reply. I have a huge file on Unix.   Per the file definition, the file is a tab-separated file of fields. But I am sure that within some fields there are newline characters.   How should I find such a record? It is a huge file. Is there some command?   Thanks,
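To Raj's question about finding such records with a command: counting the tabs on each line is one quick check (a sketch; it assumes a well-formed record has exactly 3 fields, so adjust the field count to the actual schema):

```shell
# Build a small sample file: line 2 was broken by an embedded newline
printf 'a\tb\tc\nd\te\nf part2\tg\th\n' > sample.tsv

# Print line number and content of every line whose field count is not 3
awk -F'\t' 'NF != 3 {print NR": "$0}' sample.tsv
```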

Re: How to load /t /n file to Hive

2013-09-20 Thread Gabriel Eisbruch
Hi, one way we used to solve that problem is to transform the data when you are creating/loading it; for example, we applied UrlEncode to each field at create time. Thanks, Gabo. 2013/9/20 Raj Hadoop > Hi Nitin, > > Thanks for the reply. I have a huge file in unix. > > As per the file

Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
We have a small (3 GB / 280M rows) table with 435 partitions that is highly skewed: one partition has nearly 200M rows, two others have nearly 40M apiece, and the remaining 432 together hold less than 1% of the total table size. So the skew is something to be addressed. However, even given that, w

Re: Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
Another detail: ~400 mappers 64 reducers 2013/9/20 Stephen Boesch > > We have a small (3GB /280M rows) table with 435 partitions that is highly > skewed: one partition has nearly 200M, two others have nearly 40M apiece, > then the remaining 432 have all together less than 1% of total table
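For context, a dynamic-partition load of this shape is commonly written as below (a hedged sketch; the table and column names are hypothetical, not Stephen's). With skew like this, DISTRIBUTE BY routes each partition's rows to a single reducer, so the 200M-row partition serializes on one of the 64 reducers, which is one plausible source of the slowdown:

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Each distinct dt value lands on one reducer; a heavily skewed dt
-- therefore leaves one reducer doing most of the write work
INSERT OVERWRITE TABLE target PARTITION (dt)
SELECT col1, col2, dt
FROM source
DISTRIBUTE BY dt;
```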

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi Gabo, Are you suggesting to use java.net.URLEncoder? Can you be more specific? I have a lot of fields in the file which are not only URL-related; some text fields have newline characters. Thanks, Raj From: Gabriel Eisbruch To: "user@hive.apache.o

Re: How to load /t /n file to Hive

2013-09-20 Thread Gabriel Eisbruch
Hi Raj, UrlEncode is a good way to encode data and be sure that you encode all special chars (for example, \n will be encoded to %0A). The field does not need to be a URL for you to encode it (you could use another encoder, but we had very good results with UrlEncoder, ever the best way reco
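Gabo's claim is easy to check with Python's standard library (a sketch for illustration; the list discussion concerns Java's java.net.URLEncoder, which plays the same role but encodes spaces as '+' rather than %20):

```python
from urllib.parse import quote, unquote

field = "line one\nline two\twith tab"
encoded = quote(field)  # percent-encodes \n as %0A and \t as %09
print(encoded)
# The encoding round-trips losslessly
assert unquote(encoded) == field
```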