Re:Re: Re: RE: Why a sql only use one map task?

2011-08-25 Thread Daniel,Wu
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
At 2011-08-24 18:19:38, wd w...@wdicc.com wrote: What about your total Map Task Capacity? You may check it from http://your_jobtracker:50030/jobtracker.jsp
2011/8/24 Daniel,Wu hadoop...@163.com: I checked my

Re:Re:Re: Re: RE: Why a sql only use one map task?

2011-08-25 Thread Daniel,Wu
After I ran "set mapred.min.split.size=2;", the job kicked off 3 map tasks (the file I have is 500M). So it looks like we need to set mapred.min.split.size rather than mapred.map.tasks to control how many map tasks are launched.
At 2011-08-25 19:38:30,Daniel,Wu hadoop...@163.com wrote

Re:Re: RE: Why a sql only use one map task?

2011-08-24 Thread Daniel,Wu
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
At 2011-08-24 18:19:38, wd w...@wdicc.com wrote: What about your total Map Task Capacity? You may check it from http://your_jobtracker:50030/jobtracker.jsp
2011/8/24 Daniel,Wu hadoop...@163

Why a sql only use one map task?

2011-08-23 Thread Daniel,Wu
I ran the following simple SQL: select count(*) from sales; The job information shows it uses only one map task. The underlying Hadoop cluster has 3 data nodes, so I expected Hive to kick off 3 map tasks, one on each node. What can make Hive run only one map task? Do I need to set

Re:Re: Re: multiple tables join with only one hug table.

2011-08-14 Thread Daniel,Wu
From: Daniel,Wu hadoop...@163.com
To: hive user@hive.apache.org
Sent: Thursday, August 11, 2011 7:01 PM
Subject: multiple tables join with only one hug table.
if the retailer fact table is sale_fact with 10B rows, and join with 3 small tables

failed when create an index with partitioned by clause

2011-08-14 Thread Daniel,Wu
create table part (a int, b int) PARTITIONED by (c int);
create index part_idx on table part(b,c) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD partitioned by (a);
hive> create index part_idx on table part(b,c) AS

wants to create a JIRA (request): multiple tables join with only one hug table.

2011-08-14 Thread Daniel,Wu
Hi everyone, I'd like to create a change request (or a JIRA, I'm not sure which). Do you think it's feasible? I searched the documentation on how to contribute but couldn't find out how to create a request. Could anyone point me to the right document?
At 2011-08-14 17:08:26,Daniel,Wu hadoop...@163.com

Re:Re: multiple tables join with only one hug table.

2011-08-13 Thread Daniel,Wu
optimize by loading the smaller tables specified in the Mapjoin hint into memory. Then every small table is in the memory of each mapper. -Ayon
From: Daniel,Wu hadoop...@163.com To: hive user@hive.apache.org Sent

multiple tables join with only one hug table.

2011-08-11 Thread Daniel,Wu
If the retailer fact table sale_fact has 10B rows and is joined with 3 small tables, stores (10K), products (10K), and period (1K), what's the best join solution? Oracle can first build a hash table for stores, one for products, and one for period. Then it probes them using the fact table; if the row
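The build-then-probe idea in this message can be sketched roughly as follows. This is only an illustrative in-memory version in Python, not how Oracle or Hive implements it; the column names (store_key, product_key, period_key, amount) and the sample rows are hypothetical, chosen to match the table names in the question.

```python
# Sketch of a build-then-probe hash join over one large fact table
# and several small dimension tables.
# All column names and sample rows are hypothetical.

def build_hash(rows, key):
    """Build an in-memory hash table mapping the join key to its row."""
    return {row[key]: row for row in rows}

def probe(fact_rows, dims):
    """Stream the fact table once, probing every dimension hash per row.

    dims is a list of (fact_key_column, hash_table) pairs. A fact row is
    emitted only if it matches in every dimension (inner join).
    """
    for fact in fact_rows:
        joined = dict(fact)
        for fact_key, table in dims:
            match = table.get(fact[fact_key])
            if match is None:
                break  # no match in this dimension: drop the fact row
            joined.update(match)
        else:
            yield joined

# Small dimension tables: cheap to hold entirely in memory.
stores = [{"store_key": 1, "store_name": "north"},
          {"store_key": 2, "store_name": "south"}]
products = [{"product_key": 10, "product_name": "widget"}]
period = [{"period_key": 100, "period_name": "2011-08"}]

# Stand-in for the large fact table; in practice it would be streamed.
sale_fact = [
    {"store_key": 1, "product_key": 10, "period_key": 100, "amount": 5},
    {"store_key": 2, "product_key": 10, "period_key": 100, "amount": 7},
    {"store_key": 3, "product_key": 10, "period_key": 100, "amount": 9},  # unknown store
]

# Build phase: one hash table per small table.
dims = [
    ("store_key", build_hash(stores, "store_key")),
    ("product_key", build_hash(products, "product_key")),
    ("period_key", build_hash(period, "period_key")),
]

# Probe phase: a single pass over the fact table.
result = list(probe(sale_fact, dims))
```

The point of the sketch is that the fact table is read only once, while each small table is hashed once up front; the third fact row is dropped because its store_key has no match.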

Fw:why hive has such a high latency?

2011-08-10 Thread Daniel,Wu
Does anyone know why Hive has such high latency? Scanning a table with 16,522,439 rows takes more than 85 seconds. Reading this data off disk should need only about 10 seconds (even without considering caching, which serves data from memory). So where do the other 75 seconds go? will Deserialize Serialize
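The arithmetic behind this question can be checked with a few lines of Python. The inputs are the figures quoted in the message; the 10-second disk read is the poster's own estimate, and 85 seconds is a lower bound, so the results are rough.

```python
# Back-of-the-envelope check using the figures quoted in the message.
rows = 16_522_439
total_seconds = 85   # "more than 85 seconds", so a lower bound
disk_seconds = 10    # the poster's estimate for a raw disk read

# Time not explained by reading the data off disk.
overhead_seconds = total_seconds - disk_seconds

# Rows processed per second: as observed, and if disk were the only cost.
observed_rate = rows / total_seconds
disk_only_rate = rows / disk_seconds

print(f"unexplained time: at least {overhead_seconds} s")
print(f"observed throughput: about {observed_rate:,.0f} rows/s")
print(f"disk-only throughput: about {disk_only_rate:,.0f} rows/s")
```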

what's the benifit of integrate hbase with hive? For low latency?

2011-08-08 Thread Daniel,Wu
The Hive documentation says Hive is high latency: querying a table of about 100M might take 1 minute. HBase is a high-performance database, so does that mean that after integrating Hive with HBase, Hive will get better performance with lower latency?