Another critical variable to check is dfs.datanode.max.xcievers. The
default value is 256. You should bump that up to 4096 or higher.
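For reference, a minimal sketch of the hdfs-site.xml override (value per Vijay's suggestion above; the historical misspelling "xcievers" really is the property name, and newer Hadoop releases deprecate it in favor of dfs.datanode.max.transfer.threads -- restart the datanodes after changing it):

```xml
<!-- hdfs-site.xml: raise the per-datanode transfer thread limit -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```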
-Vijay
On Thu, Mar 1, 2012 at 7:07 PM, Abhishek Parolkar wrote:
> Hi There!
> I have been doing an interesting experiment of building mac mini cluster
> (http:/
Sorry about the delayed response, RK.
Here is what I think:
1) First of all, why is Hive not able to even submit the job? Is it taking
forever to query the list of partitions from the metastore? Getting 43K recs
should not be a big deal at all?
--> Hive is possibly taking a long time to figure out the list of partitions
from the metastore.
Hi Michael,
To add to Bejoy's answer:
As far as I understand, the hashing function used in bucketing is not
persistently stored, so in your example (SELECT * from T where user_id = X),
given X, Hive couldn't figure out which bucket of table T it should be
looking in.
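To illustrate the point, here is a rough sketch of the bucket-assignment rule: hash the column value and take it modulo the bucket count. For Hive's primitive int types the hash is the value itself; treat this as an approximation of the idea, not Hive's exact internal function:

```python
def bucket_for(user_id: int, num_buckets: int) -> int:
    """Approximate Hive-style bucket assignment for an int key:
    bucket index = hash(key) mod num_buckets (for ints, hash(x) == x)."""
    return user_id % num_buckets

# If T had 4 buckets, user_id = 7 would land in bucket 3 -- but the planner
# does not apply this pruning for a plain WHERE clause, so all 4 bucket
# files are still scanned unless you direct it with TABLESAMPLE.
print(bucket_for(7, 4))
```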
As Bejoy pointed out,
Michael
Unlike partitions, Hive doesn't choose buckets by itself based on a
WHERE clause in HQL. You need to specify which bucket(s) the query
should read using the TABLESAMPLE clause:
SELECT * FROM T TABLESAMPLE(BUCKET m OUT OF n ON user_id) WHERE user_id = X;
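For completeness, an end-to-end sketch in HQL (the table definition, bucket count, and source table name are illustrative, not from this thread):

```sql
-- Illustrative DDL: a table clustered into 4 buckets on user_id.
CREATE TABLE T (user_id INT, name STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS;

-- Populate with bucketing enforced so rows land in the right bucket files.
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE T SELECT user_id, name FROM source_table;

-- Read only the first of the 4 buckets, chosen by hashing user_id.
SELECT * FROM T TABLESAMPLE(BUCKET 1 OUT OF 4 ON user_id) WHERE user_id = X;
```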
https:/
Hi folks,
I have a table T bucketed on user_id and I am surprised to see that all the
buckets are read during the execution of the following query:
SELECT * from T where user_id = X
What should I do to make sure Hive will account for the bucket structure when
running this query?
Cheers,
Michael