After closing the JDBC connection, Hive does not delete the hive exec scratch dir

2011-01-24 Thread lei liu
I use Hive version 0.6 and use JDBC to execute a single search SQL statement, for example the code below: public static void query() { Connection hiveConnection = null; Statement statement = null; try { Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver"); hiveConne
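For context on the directory in question: the per-query scratch space is controlled by the `hive.exec.scratchdir` property. A minimal sketch, assuming the Hive 0.6-era default location on HDFS:

```sql
-- Per-query temporary/scratch space that the report says is left behind
-- after the JDBC connection closes; this is the 0.6-era default value.
SET hive.exec.scratchdir=/tmp/hive-${user.name};
```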

Re: Can compression be used with ColumnarSerDe ?

2011-01-24 Thread yongqiang he
4M is a good number based on a lot of experiments. Increasing the number will reduce the file size, but the savings grow very slowly after the block size goes beyond 4M, while the performance/memory cost increases a lot. FYI, we are trying a new columnar file format. And it seems 4M is no longer the mag

Re: Can compression be used with ColumnarSerDe ?

2011-01-24 Thread Edward Capriolo
On Mon, Jan 24, 2011 at 7:51 PM, yongqiang he wrote: > Yes. It only supports block compression. (No record-level compression support.) > You can use the config 'hive.io.rcfile.record.buffer.size' to specify > the block size (before compression). The default is 4MB. > > Thanks > Yongqiang > On Mon,

Re: Can compression be used with ColumnarSerDe ?

2011-01-24 Thread yongqiang he
Yes. It only supports block compression. (No record-level compression support.) You can use the config 'hive.io.rcfile.record.buffer.size' to specify the block size (before compression). The default is 4MB. Thanks Yongqiang On Mon, Jan 24, 2011 at 4:44 PM, Edward Capriolo wrote: > On Mon, Jan 24,
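Putting the settings from this thread together, a minimal sketch of writing block-compressed RCFile data (the session settings are the ones named in the thread; 4194304 bytes = 4MB):

```sql
-- Enable compressed output for writes. RCFile only supports block-level
-- compression, so this is the switch that matters.
SET hive.exec.compress.output=true;

-- Uncompressed buffer size per block before compression is applied.
-- 4MB is the default and, per this thread, a good trade-off: larger
-- values shrink files only slightly but cost more memory.
SET hive.io.rcfile.record.buffer.size=4194304;
```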

Re: Can compression be used with ColumnarSerDe ?

2011-01-24 Thread Edward Capriolo
On Mon, Jan 24, 2011 at 4:42 PM, Edward Capriolo wrote: > On Mon, Jan 24, 2011 at 4:14 PM, yongqiang he > wrote: >> How did you upload the data to the new table? >> You can get the data compressed by doing an insert overwrite to the >> destination table with setting "hive.exec.compress.output" to

Re: Filtering out files in a bucket (update on HIVE-951)

2011-01-24 Thread Avram Aelony
hmmm, I've seen mention of SymLink but I don't yet grasp how it works/applies to selecting files to process. Also, I don't have much control over how the data gets to the bucket I end up reading from, hence the need to powerfully select. Could you point me to some SymLink documentation or an

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Namit Jain
Although there is only 1 reducer, the amount of data going to that reducer should be really small: it will have the same number of rows as the number of mappers. Can you check how much data your reducer is getting? Is it taking a long time to read the small data from each mapper? Thanks, -namit On 1

Re: Filtering out files in a bucket (update on HIVE-951)

2011-01-24 Thread Edward Capriolo
On Mon, Jan 24, 2011 at 5:58 PM, Avram Aelony wrote: > Hi, > > I really like the virtual column feature in 0.7 that allows me to request > INPUT__FILE__NAME and see the names of files that are being acted on. > > Because I can see the files that are being read, I see that I am spending > time qu

Filtering out files in a bucket (update on HIVE-951)

2011-01-24 Thread Avram Aelony
Hi, I really like the virtual column feature in 0.7 that allows me to request INPUT__FILE__NAME and see the names of files that are being acted on. Because I can see the files that are being read, I see that I am spending time querying many, many very large files, most of which I do not need
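The virtual column mentioned above can be queried directly; a hypothetical sketch (table and column names are invented for illustration) of using it to see which files a predicate actually touches:

```sql
-- Hive 0.7+: INPUT__FILE__NAME is a virtual column naming the file
-- each row was read from. Grouping on it shows which files in the
-- bucket contribute matching rows.
SELECT INPUT__FILE__NAME, COUNT(*) AS row_cnt
FROM my_bucket_table
WHERE event_type = 'click'
GROUP BY INPUT__FILE__NAME;
```

Note this does not skip reading the unwanted files; it only reports where rows came from, which is the gap the thread is about.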

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Ajo Fod
So, in pig this is reduced by 4 threads automatically? That would be interesting. I've usually used groups, but when there is only one group, you still have this problem where the CPU resources are underutilized. -Ajo. On Mon, Jan 24, 2011 at 2:01 PM, Jonathan Coveney wrote: > Yes, I tried that,

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Jonathan Coveney
Yes, I tried that, it looks like it forces it to 1 if there are no groups. 2011/1/24 Ajo Fod > oh ... sorry you say you already tried that. > > > > On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote: > > you could try to set the number of reducers e.g: > > set mapred.reduce.tasks=4; > > > > set th

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Ajo Fod
oh ... sorry you say you already tried that. On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote: > you could try to set the number of reducers e.g: > set mapred.reduce.tasks=4; > > set this before doing the select. > > -Ajo > > On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Coveney wrote: >> I have a

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Ajo Fod
you could try to set the number of reducers e.g: set mapred.reduce.tasks=4; set this before doing the select. -Ajo On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Coveney wrote: > I have a 10 node server or so, and have been mainly using pig on it, but > would like to try out Hive. > I am running thi
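The suggestion above, as a session-level sketch (table name taken from the original question; the predicate is abbreviated since part of it was elided in the thread):

```sql
-- Request 4 reduce tasks for subsequent queries. Note: as observed
-- later in this thread, Hive forces this back to 1 when the plan needs
-- a single reducer, e.g. a global COUNT with no GROUP BY.
SET mapred.reduce.tasks=4;

SELECT COUNT(1) AS ct FROM my_table WHERE v1 = '02';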

Re: Can compression be used with ColumnarSerDe ?

2011-01-24 Thread Edward Capriolo
On Mon, Jan 24, 2011 at 4:14 PM, yongqiang he wrote: > How did you upload the data to the new table? > You can get the data compressed by doing an insert overwrite to the > destination table with setting "hive.exec.compress.output" to true. > > Thanks > Yongqiang > On Mon, Jan 24, 2011 at 12:30 PM,

Re: Can compression be used with ColumnarSerDe ?

2011-01-24 Thread yongqiang he
How did you upload the data to the new table? You can get the data compressed by doing an insert overwrite to the destination table with "hive.exec.compress.output" set to true. Thanks Yongqiang On Mon, Jan 24, 2011 at 12:30 PM, Edward Capriolo wrote: > I am trying to explore some use case tha
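The advice above as a concrete sketch (table names are hypothetical): data loaded with LOAD DATA or a plain file copy stays uncompressed, so a compressed RCFile table is populated via INSERT OVERWRITE with compressed output enabled:

```sql
-- Compress blocks as they are written by the insert below.
SET hive.exec.compress.output=true;

-- Rewrite the data into the RCFile destination table; compression is
-- applied during this write, not retroactively.
INSERT OVERWRITE TABLE dest_rcfile_table
SELECT * FROM source_table;
```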

Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Jonathan Coveney
I have a 10 node server or so, and have been mainly using pig on it, but would like to try out Hive. I am running this query, which doesn't take too long in Pig, but is taking quite a long time in Hive. hive -e "select count(1) as ct from my_table where v1='02' and v2 = ;" > thecount One

Can compression be used with ColumnarSerDe ?

2011-01-24 Thread Edward Capriolo
I am trying to explore some use cases that I believe are perfect for the ColumnarSerDe: tables with 100+ columns where only one or two are selected in a particular query. CREATE TABLE () ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" STORED AS RCFile ; My issue is m
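The DDL fragment above, filled out into a minimal sketch (the table name and column list are invented; the SerDe class and storage clause are the ones quoted in the message):

```sql
-- A wide table stored columnar, so queries touching one or two columns
-- only read those column groups.
CREATE TABLE wide_events (
  event_id BIGINT,
  user_id  STRING,
  payload  STRING
  -- ...in the real use case, 100+ further columns...
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE;
```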

Re: Lots of Failed/Killed Task Attempts while running successful hive queries

2011-01-24 Thread Shrijeet Paliwal
Speculative execution must be on for hadoop in your setup. With speculative execution 'ON' framework starts running tasks on different nodes in parallel. The one that finishes first contributes, the other one is killed. -Shrijeet On Mon, Jan 24, 2011 at 11:56 AM, Vijay wrote: > Hi, > > When I r
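If the killed duplicate attempts are unwanted, speculative execution can be switched off from the Hive session. These are the standard Hadoop 0.20-era property names; newer Hadoop versions renamed them:

```sql
-- Stop the framework from launching backup copies of slow tasks;
-- without backups, no duplicate attempts get killed.
SET mapred.map.tasks.speculative.execution=false;
SET mapred.reduce.tasks.speculative.execution=false;
```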

Lots of Failed/Killed Task Attempts while running successful hive queries

2011-01-24 Thread Vijay
Hi, When I run hive queries, everything finishes successfully but in the jobtracker UI, I see a lot of failed/killed task attempts. For example I have a simple query that finishes in 4 minutes. On the jobtracker UI here are the map and reduce stats: MAP - Num Tasks: 138, Complete: 138, Failed/Kil

Custom Serde

2011-01-24 Thread ankit bhatnagar
Hi All, What's the best way to add a custom SerDe jar? It looks like ADD JAR says added successfully, and I am able to list that jar in the Hive interface. However, CREATE TABLE fails to find the class. Thanks Ankit
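For reference, the usual sequence looks like the sketch below (the jar path and class name are hypothetical). One common cause of the failure described: ADD JAR only affects the current CLI session, so the class must also be visible wherever the metastore/DDL resolution happens, e.g. via HIVE_AUX_JARS_PATH or `hive --auxpath`.

```sql
-- Make the SerDe jar available to this session (path is made up).
ADD JAR /path/to/my-custom-serde.jar;

-- The SerDe class name must be fully qualified.
CREATE TABLE custom_fmt (line STRING)
ROW FORMAT SERDE 'com.example.serde.MyCustomSerDe';
```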