Please keep communication on the mailing list.
Remember that you can execute partial-row upserts with Phoenix. As long
as you can generate the primary key from each stream, you don't need to
do anything special in Kafka Streams. You can just submit 5 UPSERTs (one
for each stream), and the Phoenix table will eventually contain the
aggregated row when you are finished.
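The partial-row upsert idea can be sketched in plain Python. This is only a model of the semantics, not actual Phoenix code; the table and column names are made up for illustration. Each stream upserts only its own columns for the shared primary key, and the row converges to the fully aggregated record:

```python
# Model of Phoenix partial-row UPSERT semantics: each UPSERT writes only
# the columns it names; untouched columns keep their previous values.
# The "combined" table and column names here are hypothetical.

table = {}  # primary key -> row dict

def upsert(pk, **cols):
    """Roughly: UPSERT INTO combined (id, <cols>) VALUES (pk, ...)."""
    table.setdefault(pk, {"id": pk}).update(cols)

# Five independent "streams" each upsert their own column for key 1,
# in any order, without coordinating with each other.
upsert(1, s1_val="a")
upsert(1, s2_val="b")
upsert(1, s3_val="c")
upsert(1, s4_val="d")
upsert(1, s5_val="e")

# After all five partial upserts, the row holds the aggregated record.
print(table[1])
```

Because each upsert touches disjoint columns of the same row, no stream needs to see the others' output; Phoenix assembles the denormalized row for you.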
On 4/16/18 1:30 PM, Rabin Banerjee wrote:
Actually I haven't finalised anything; I am just looking at different
options. Basically, I want to join 5 streams and create a denormalized
stream. The problem is that if Stream 1's output for the current window
is keys 1,2,3,4,5, it may happen that the other streams have already
emitted those keys in an earlier window, so I cannot join them with
Kafka Streams without maintaining the whole state for all the streams.
So I need to collect keys 1,2,3,4,5 from all the streams and generate a
combined record in as close to real time as possible.
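The state Rabin describes having to maintain could be sketched like this. It is a hypothetical in-memory buffer, not Kafka Streams API code: each stream's output is held per key until all five streams have contributed, at which point the combined record is emitted.

```python
# Hypothetical per-key state store joining five streams: buffer each
# stream's value under its key and emit once all five have arrived.
# This is the bookkeeping Kafka Streams would force you to hold for
# every key, for as long as any stream lags behind the others.

NUM_STREAMS = 5
state = {}  # key -> {stream_index: value}

def on_record(stream_idx, key, value):
    """Return the combined record once all streams have emitted the key,
    otherwise None (still waiting on the remaining streams)."""
    pending = state.setdefault(key, {})
    pending[stream_idx] = value
    if len(pending) == NUM_STREAMS:
        return {"key": key,
                **{f"s{i}": pending[i] for i in range(NUM_STREAMS)}}
    return None

# Streams 0..3 arrive first; nothing can be emitted yet.
for i in range(4):
    print(on_record(i, 1, f"val{i}"))  # None each time
# Stream 4 completes key 1, so the combined record is emitted.
print(on_record(4, 1, "val4"))
```

The partial-row upsert approach avoids exactly this buffer: the Phoenix row itself is the shared state, so no stream has to wait for the others.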
On Mon, Apr 16, 2018 at 9:04 PM, Josh Elser <els...@apache.org> wrote:
Short-answer: no.
You're going to be much better off denormalizing your five tables
into one table and eliminating the need for this JOIN.
What made you decide to want to use Phoenix in the first place?
On 4/16/18 6:04 AM, Rabin Banerjee wrote:
Hi all,
I am new to Phoenix. If I have to join 5 huge tables that are all
keyed on the same id (i.e., one id column is common to all of them),
is there any optimization that would make this join faster, given
that all the data for a particular key across all 5 tables will
reside on the same region server?
To explain it a bit more: suppose we have 5 streams, all sharing a
common id that we can join on, being stored in 5 different HBase
tables. We want to join them with Phoenix, but we don't want a
cross-region shuffle, as we already know that the key is common to
all 5 tables.
Thanks //