Re: Always mysterious "Could not obtain block" error for large jobs [Hive 0.8.1/ Hadoop 1.0.0 on Mac Mini Cluster]

2012-03-02 Thread Vijay
Another critical variable to check is dfs.datanode.max.xcievers. The default value is 256. You should bump that up to 4096 or higher. -Vijay On Thu, Mar 1, 2012 at 7:07 PM, Abhishek Parolkar wrote: > Hi There! >   I have been doing an interesting experiment of building a Mac Mini cluster > (http:/
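For reference, this is a datanode-side setting in hdfs-site.xml, not something Hive can set per query. A minimal sketch (the property name keeps Hadoop's historical misspelling; the right value depends on how many concurrent HDFS readers and writers the job generates, and each datanode must be restarted for the change to take effect):

    <!-- hdfs-site.xml on every datanode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>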

Re: better partitioning strategy in hive

2012-03-02 Thread Mark Grover
Sorry about the delayed response, RK. Here is what I think: 1) first of all, why is Hive not able to even submit the job? Is it taking forever to query the list of partitions from the metastore? Getting 43K recs should not be a big deal at all?? --> Hive is possibly taking a long time to figure
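For context, a rough HiveQL sketch of the difference being discussed (table and column names here are hypothetical, not from RK's schema): a filter on the partition column lets Hive ask the metastore only for the matching partitions, whereas a query with no partition filter forces it to enumerate all ~43K partitions before it can even plan and submit the job.

    -- Hypothetical table partitioned by date
    CREATE TABLE events (user_id BIGINT, action STRING)
    PARTITIONED BY (dt STRING);

    -- Partition filter: the metastore lookup is limited to one partition
    SELECT COUNT(*) FROM events WHERE dt = '2012-03-01';

    -- No partition filter: every partition must be listed before the job is submitted
    SELECT COUNT(*) FROM events WHERE action = 'click';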

Re: Where clause and bucketized table

2012-03-02 Thread Mark Grover
Hi Michael, To add to Bejoy's answer: As far as I understand, the hashing function used in bucketing is not persistently stored, so as per your example (SELECT * from T where user_id = X), given X, Hive couldn't figure out which bucket of table T it should be looking in. As Bejoy pointed out,
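To illustrate that point, a small sketch under stated assumptions (T and user_id come from Michael's query; the 32-bucket count is made up, and pmod(hash(...)) is only an approximation of Hive's internal bucketing function): the bucket a row lands in is derived from the hash of user_id modulo the declared bucket count, and that mapping is not something the planner consults for a plain WHERE clause.

    -- Assuming T was created roughly like:
    --   CREATE TABLE T (user_id BIGINT, ...) CLUSTERED BY (user_id) INTO 32 BUCKETS;
    -- a row's bucket can be recomputed by hand, but a "WHERE user_id = X" scan ignores it:
    SELECT user_id, pmod(hash(user_id), 32) AS bucket_no FROM T LIMIT 10;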

Re: Where clause and bucketized table

2012-03-02 Thread Bejoy Ks
Michael, Unlike partitions, Hive doesn't choose buckets by itself based on a WHERE clause in HQL. You need to specify the bucket(s) the query should execute on using the TABLESAMPLE clause:  SELECT * FROM T TABLESAMPLE(BUCKET m OUT OF n ON user_id) WHERE user_id = X;  https:/
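Putting Bejoy's suggestion together with the DDL sketched above, a minimal end-to-end example (the 32-bucket count and the staging table name are assumptions; on Hive 0.8 the data must have been written with hive.enforce.bucketing=true for the bucket files to line up with the hash):

    -- Load the bucketed table so rows actually land in their hash buckets
    SET hive.enforce.bucketing = true;
    INSERT OVERWRITE TABLE T SELECT * FROM T_staging;

    -- Scan a single bucket file instead of all 32
    SELECT * FROM T TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id)
    WHERE user_id = X;
    -- Note: picking the right bucket number (pmod(hash(X), 32) + 1) is up to the caller,
    -- which is exactly the part Hive will not work out from the WHERE clause on its own.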

Where clause and bucketized table

2012-03-02 Thread mdefoinplatel.ext
Hi folks, I have a table T bucketized on user_id and I am surprised to see that all the buckets are read during the execution of the following query: SELECT * from T where user_id = X. What should I do to make sure Hive will account for the bucket structure when running this query? Cheers, Michael