Yeah, I think that's his point :)
For a fine-grained facet, the hotspotting is actually desirable, because it
co-locates the data for a query. To make an example that drives this point home:
Consider a primary key constraint (col1, col2, col3, col4).
If you defined the SALT_HASH based on col1 alone, you'd get terrible
hotspotting. At the other extreme, with the SALT_HASH over all of col1,
col2, col3, and col4, we have no row-oriented data locality (we have to
check *all* salt buckets for every query).
If you define the SALT_HASH on col1, col2, and col3, then all values of
col4 where col1-col3 are fixed are co-located, which would make faceted
search queries much faster (from SALT_BUCKETS RPCs down to 1 RPC).
Concretely: if I'm on Amazon searching for "water bottle" "1L size"
"plastic composition" (col1, col2, and col3), it's really fast to give
me "manufacturer" (col4) given my other three constraints.
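The locality argument above can be sketched in a few lines of Python. This is a toy model, not Phoenix's implementation: the bucket count and hash are made up for illustration (Phoenix actually prepends a salt byte computed from a hash of the rowkey modulo SALT_BUCKETS), but it shows why salting on the col1-col3 prefix turns a fan-out query into a single-bucket read.

```python
import hashlib

SALT_BUCKETS = 8  # hypothetical bucket count for illustration

def salt_bucket(*cols):
    """Map the given column values to one of SALT_BUCKETS buckets.
    Illustrative hash only; not Phoenix's actual salt-byte function."""
    key = "|".join(cols).encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % SALT_BUCKETS

# Four products sharing col1-col3 and differing only in col4 (manufacturer).
rows = [("water bottle", "1L", "plastic", mfr) for mfr in ("A", "B", "C", "D")]

# Salting on the full rowkey: the rows may scatter across buckets, so a
# query that fixes col1-col3 must fan out to every salt bucket.
full_buckets = {salt_bucket(*r) for r in rows}

# Salting on col1-col3 only: all four rows share one salt value, so the
# same faceted query touches exactly one bucket (one RPC).
prefix_buckets = {salt_bucket(*r[:3]) for r in rows}

print(len(prefix_buckets))  # 1
```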
Hopefully I'm getting this right too. Tell me to shut up, Gerald, if I'm
not :)
On 9/14/18 1:01 AM, Thomas D'Silva wrote:
For the usage example that you provided, how do the values of id_1, id_2
and other_key vary when you write data?
I assume id_1 and id_2 remain the same while other_key is monotonically
increasing, and that's why the table is salted.
If you create the salt bucket only on id_2, then wouldn't you run into
region server hotspotting during writes?
On Thu, Sep 13, 2018 at 8:02 PM, Jaanai Zhang <cloud.pos...@gmail.com> wrote:
Sorry, I don't understand your purpose. It seems your proposal can't be
achieved: you need a hash partition. However, some things need to be
clarified: HBase is a range-partition engine, and salt buckets are used
to avoid hotspots. In other words, HBase as a storage engine can't
support hash partitioning.
----------------------------------------
Jaanai Zhang
Best regards!
Gerald Sangudi <gsang...@23andme.com> wrote on Thu, Sep 13, 2018 at 11:32 PM:
Hi folks,
Any thoughts or feedback on this?
Thanks,
Gerald
On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi <gsang...@23andme.com> wrote:
Hello folks,
We have a requirement for salting based on partial, rather
than full, rowkeys. My colleague Mike Polcari has identified
the requirement and proposed an approach.
I found an already-open JIRA ticket for the same issue:
https://issues.apache.org/jira/browse/PHOENIX-4757. I can
provide more details from the proposal.
The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N,
whereas Mike proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
The benefit at issue is that users gain more control over
partitioning, and this can be used to push some additional
aggregations and hash joins down to region servers.
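To make the two proposed spellings concrete, here is roughly what each might look like on a table like the four-column example earlier in the thread. Both forms are hypothetical; neither syntax exists in Phoenix today, and the column names and bucket count are illustrative only.

```sql
-- Spelling proposed in PHOENIX-4757 (hypothetical):
CREATE TABLE products (
    col1 VARCHAR NOT NULL,
    col2 VARCHAR NOT NULL,
    col3 VARCHAR NOT NULL,
    col4 VARCHAR NOT NULL
    CONSTRAINT pk PRIMARY KEY (col1, col2, col3, col4)
) SALT_BUCKETS(col1, col2, col3) = 8;

-- Spelling proposed by Mike (hypothetical):
CREATE TABLE products (
    col1 VARCHAR NOT NULL,
    col2 VARCHAR NOT NULL,
    col3 VARCHAR NOT NULL,
    col4 VARCHAR NOT NULL
    CONSTRAINT pk PRIMARY KEY (col1, col2, col3, col4)
) SALT_BUCKETS = 8, SALT_COLUMNS = col1, col2, col3;
```

In both cases the intent is the same: the salt byte is computed from the named prefix columns only, so rows that agree on col1-col3 land in the same bucket.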
I would appreciate any go-ahead / thoughts / guidance /
objections / feedback. I'd like to be sure that the concept
at least is not objectionable. We would like to work on this
and submit a patch down the road. I'll also add a note to
the JIRA ticket.
Thanks,
Gerald