Thank you so much Talebzadeh.
I have filed an issue . https://issues.apache.org/jira/browse/HIVE-13766
And I changed clustering column to int, based on string column hash.
something like
String actionClassifier = ""my_action,cl_900";
MessageDigest md = MessageDigest.getInstance(MD_5);
byte
Hi,
OK I tried your table.
0: jdbc:hive2://rhes564:10010/default> describe formatted bucket_x;
OK
+---+---+-+--+
| col_name|
data_type |
OK Talebzadeh thanks.
Have you ever tried tablesample with string value hash?
something like 'tablesample( bucket 1 out of 256 on
some_field='somedata')'
I wrote a full scenario.
# table creating
Time taken: 0.155 seconds, Fetched: 36 row(s)
hive> CREATE TABLE `bucket_x` (
> `classifi
This is your code
SELECT COUNT(*) FROM X
TABLESAMPLE(BUCKET 1 OUT OF 256 ON action_id, classifier)
WHERE action_id='aaa' AND classifier='bbb'
Well I have a table dummy with 1 billion rows imported from Oracle as ORC
format
hive> show create table dummy;
OK
CREATE TABLE `dummy`(
`id` int,
`c
ah, as i mentioned
both field type of action_id and classifier is STRING. and I can not change
the type.
CREATE TABLE `X`(`action_id` string,`classifier` string)
CLUSTERED BY (action_id,classifier) INTO 256 BUCKETS
STORED AS ORC
I use two fields for hash then bucketing because each one field is n
Is action_id can be created as a numeric column:
CREATE TABLE X ( action_id bigint, ..)
Bucketing or hash partitioning best works on numeric columns with high
cardinality (say a primary key).
>From my old notes:
Bucketing in Hive refers to hash partitioning where a hashing function is
applied.
Hello.
I want to ask the correct bucketing and tablesample way.
There is a table X which I created by
CREATE TABLE `X`(`action_id` string,`classifier` string)
CLUSTERED BY (action_id,classifier) INTO 256 BUCKETS
STORED AS ORC
Then I inserted 500M of rows into X by
set hive.enforce.bucketing=tr