Yeah, I think that's his point :)

For a fine-grained facet, this kind of hotspotting is desirable because it co-locates the data for the query. An example to drive the point home:

Consider a primary key constraint on (col1, col2, col3, col4).

If you defined the SALT_HASH based on col1 alone, you'd get terrible hotspotting. At the other extreme, with SALT_HASH on all of col1, col2, col3, and col4, we have no row-oriented data locality: we have to check *all* salt buckets for every query.

If you define the SALT_HASH on col1, col2, and col3, then all values of col4 for a fixed (col1, col2, col3) are co-located, which makes faceted search queries much faster (from SALT_BUCKETS RPCs down to 1 RPC).
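To make the RPC fan-out concrete, here is a small Python sketch. The hash-mod scheme and the SALT_BUCKETS value below are illustrative assumptions modeled loosely on Phoenix's salting, not Phoenix's actual salt-byte computation:

```python
# Illustrative sketch of partial-rowkey salting; the hash-mod scheme
# below is an assumption, not Phoenix's actual implementation.
import hashlib

SALT_BUCKETS = 8  # illustrative bucket count

def salt_bucket(*salt_cols):
    """Bucket = hash of the concatenated salt columns, mod SALT_BUCKETS."""
    key = "\x00".join(str(c) for c in salt_cols).encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % SALT_BUCKETS

manufacturers = ["Acme", "Globex", "Initech", "Umbrella"]

# Salt on (col1, col2, col3) only: the bucket ignores col4, so every
# manufacturer row for this facet lands in one bucket -> 1 RPC.
partial = {salt_bucket("water bottle", "1L", "plastic")
           for m in manufacturers}

# Salt on all four columns: the bucket varies with col4, so the same
# faceted query must fan out across the salt buckets.
full = {salt_bucket("water bottle", "1L", "plastic", m)
        for m in manufacturers}

print(len(partial))  # 1
print(len(full))     # up to min(len(manufacturers), SALT_BUCKETS) buckets
```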

Concretely: if I'm on Amazon searching for "water bottle" "1L size" "plastic composition" (col1, col2, and col3), it's really fast to give me "manufacturer" (col4) given my other three constraints.

Hopefully I'm getting this right too. Tell me to shut up, Gerald, if I'm not :)

On 9/14/18 1:01 AM, Thomas D'Silva wrote:
For the usage example that you provided, how do the values of id_1, id_2, and other_key vary when you write data? I assume id_1 and id_2 remain the same while other_key is monotonically increasing, and that's why the table is salted. If you create the salt bucket on id_2 alone, then wouldn't you run into region server hotspotting during writes?

On Thu, Sep 13, 2018 at 8:02 PM, Jaanai Zhang <cloud.pos...@gmail.com> wrote:

    Sorry, I don't understand your purpose. It seems your proposal
    can't be achieved as described: you need a hash partition. To
    clarify, HBase is a range-partition engine, and salt buckets are
    used to avoid hotspots; in other words, HBase as a storage engine
    can't support hash partitioning.

    ----------------------------------------
        Jaanai Zhang
        Best regards!



    Gerald Sangudi <gsang...@23andme.com> wrote on Thu, Sep 13, 2018
    at 11:32 PM:

        Hi folks,

        Any thoughts or feedback on this?

        Thanks,
        Gerald

        On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi
        <gsang...@23andme.com> wrote:

            Hello folks,

            We have a requirement for salting based on partial, rather
            than full, rowkeys. My colleague Mike Polcari has identified
            the requirement and proposed an approach.

            I found an already-open JIRA ticket for the same issue:
            https://issues.apache.org/jira/browse/PHOENIX-4757. I can
            provide more details from the proposal.

            The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N,
            whereas Mike proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
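            As a sketch only (neither form is implemented, and the
            table and column names are made up), the two candidate
            syntaxes might look like:

            ```sql
            -- Hypothetical DDL illustrating the two proposed syntaxes;
            -- neither exists in Phoenix today.

            -- PHOENIX-4757 proposal:
            CREATE TABLE facets (
                col1 VARCHAR NOT NULL,
                col2 VARCHAR NOT NULL,
                col3 VARCHAR NOT NULL,
                col4 VARCHAR NOT NULL,
                CONSTRAINT pk PRIMARY KEY (col1, col2, col3, col4)
            ) SALT_BUCKETS(col1, col2, col3) = 16;

            -- Mike's alternative, as table options:
            -- ) SALT_BUCKETS = 16, SALT_COLUMNS = col1, col2, col3;
            ```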

            The benefit at issue is that users gain more control over
            partitioning, and this can be used to push some additional
            aggregations and hash joins down to region servers.

            I would appreciate any go-ahead / thoughts / guidance /
            objections / feedback. I'd like to be sure that the concept
            at least is not objectionable. We would like to work on this
            and submit a patch down the road. I'll also add a note to
            the JIRA ticket.

            Thanks,
            Gerald


