job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
At 2011-08-24 18:19:38,wd w...@wdicc.com wrote:
What is your total Map Task Capacity?
You can check it at http://your_jobtracker:50030/jobtracker.jsp
2011/8/24 Daniel,Wu hadoop...@163.com:
I checked my
after I set
set mapred.min.split.size=2;
Then Hive kicks off 3 map tasks (the file I have is 500 MB). So it looks like we
need to set mapred.min.split.size rather than mapred.map.tasks to control how
many map tasks are launched.
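A minimal sketch of that session in the Hive CLI, assuming the same `sales` table queried elsewhere in this thread. The value 2 is the one reported above; how many map tasks actually result is an assumption that depends on the file size, the HDFS block size, and the input format in effect, so the first two statements only print the current settings for comparison.

```sql
-- Print the current settings before changing anything.
set hive.input.format;
set mapred.min.split.size;

-- Value reported in the thread; the resulting number of map tasks
-- is not guaranteed and depends on the cluster and input format.
set mapred.min.split.size=2;

-- Re-run the query and compare the map task count on the JobTracker page.
select count(*) from sales;
```

Comparing the map task count before and after the change shows whether the split-size setting, rather than mapred.map.tasks, is what drives the number of maps on a given cluster.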
At 2011-08-25 19:38:30,Daniel,Wu hadoop...@163.com wrote:
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
At 2011-08-24 18:19:38,wd w...@wdicc.com wrote:
What is your total Map Task Capacity?
You can check it at http://your_jobtracker:50030/jobtracker.jsp
2011/8/24 Daniel,Wu hadoop...@163
I ran the following simple SQL:
select count(*) from sales;
The job information shows that it uses only one map task.
The underlying Hadoop cluster has 3 data nodes, so I expected Hive to kick off 3
map tasks, one on each node. What can make Hive run only one map task? Do
I need to set
From: Daniel,Wu hadoop...@163.com
To: hive user@hive.apache.org
Sent: Thursday, August 11, 2011 7:01 PM
Subject: multiple tables join with only one huge table.
If the retailer fact table is sale_fact with 10B rows, and it is joined with 3 small
tables
create table part (a int,b int) PARTITIONED by (c int);
create index part_idx on table part(b,c)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
partitioned by (a);
hive> create index part_idx on table part(b,c) AS
Hi everyone,
I'd like to create a change request (or a JIRA, I'm not sure which). Do you think
it's feasible? I searched the documentation on how to contribute, but couldn't
find out how to create a request. Could anyone point me to the right document?
At 2011-08-14 17:08:26,Daniel,Wu hadoop...@163.com
optimize by loading the smaller tables specified
in the MAPJOIN hint into memory. Every small table is then held in the memory of
each mapper.
-Ayon
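A minimal sketch of such a query, using the table names from this thread (sale_fact, stores, products, period). The join columns store_key and product_key are hypothetical, and period_key is assumed to be the key for period; the thread does not give the schema. The MAPJOIN hint names the aliases of the small tables that should be loaded into memory.

```sql
-- Hypothetical join columns; only the table names come from the thread.
-- The hint lists the aliases of the three small tables to hold in memory.
SELECT /*+ MAPJOIN(s, p, d) */
       d.period_key,
       count(*)
FROM sale_fact f
JOIN stores   s ON (f.store_key   = s.store_key)
JOIN products p ON (f.product_key = p.product_key)
JOIN period   d ON (f.period_key  = d.period_key)
GROUP BY d.period_key;
```

With the hint, the large fact table is streamed through the mappers while the small tables are probed in memory, which is roughly the Hive counterpart of a hash join that builds on the small side.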
From: Daniel,Wu hadoop...@163.com
To: hive user@hive.apache.org
Sent
If the retailer fact table is sale_fact with 10B rows, and it is joined with 3 small
tables: stores (10K), products (10K), period (1K), what is the best join solution?
In Oracle, it can first build a hash table for stores, one for products, and one
for period. Then it probes them using the fact table, if the row
Does anyone know why Hive has such high latency? Scanning a table with 16,522,439
rows takes more than 85 seconds. Reading this data off disk should take only about
10 seconds (not even counting caching, which serves data from memory). So
where do the other 75 seconds go? will Deserialize Serialize
The Hive documentation says Hive is high latency: querying a table with about 100M
might take 1 minute. HBase is a high-performance database, so does that
mean that after integrating Hive with HBase, Hive will get better performance and
lower latency?