On Tue, Oct 15, 2013 at 11:12 AM, Michael Segel <[email protected]> wrote:
> Anil,
>
> > Agree with you. But, as per my knowledge and experience with coprocessors,
> > they are meant to be used for operations that are local to RS. Otherwise,
> > you are in danger of running into deadlocks, scalability issues.
>
> I also did a quick look at… HBASE-7474…
>
> You start with the assumption that all of your data is within a single
> region.
>
No, I don't. That sorting CP works even if your scan spans multiple RS's. I do
a merge sort at client side in that case. Please look at the code closely. :)

> IMHO, this is a very narrow window for use cases.
> Most use cases have data that crosses region boundaries.
>
> From a design perspective… limiting the use case to only within region…
> kinda kills the reason for coprocessors to exist. Even looking back at the
> implementation by Google, they don't appear to have this problem… errr
> limitation.
>
> Sorry… IMHO and YMMV.
>
>
> On Oct 14, 2013, at 3:25 PM, anil gupta <[email protected]> wrote:
>
> > Inline.
> >
> > On Mon, Oct 14, 2013 at 7:50 AM, Michael Segel <[email protected]> wrote:
> >
> >> Anil,
> >>
> >> I wasn't suggesting that you can't do what you're doing, but you end up
> >> running into the risks which coprocessors are supposed to remove. The
> >> standard YMMV always applies.
> >>
> > Agree with you. But, as per my knowledge and experience with coprocessors,
> > they are meant to be used for operations that are local to RS. Otherwise,
> > you are in danger of running into deadlocks, scalability issues.
> >
> >>
> >> You have a cluster… another team in your company wants to use the
> >> cluster. So instead of the cluster being a single resource for your
> >> app/team, it now becomes a shared resource. So now you have people
> >> accessing HBase for multiple apps.
> >>
> > Well, it's a separation of responsibility in this case. We don't want
> > teams to step on each other's toes and at the same time work well as an
> > ecosystem.
> > Rule: Other teams can use same cluster.
> > But they cannot write directly into the tables that we own/control. If
> > they want to write into our tables then they have to use our HBase Client.
> >
> >>
> >> You could then run multiple HBase HMasters with different locations for
> >> files, however… this can get messy.
> >> HOYA seems to suggest this as the future. If so, then you have to wonder
> >> about data locality.
> >>
> > HOYA is not even in beta at present. So, right now we are not thinking
> > about it.
> >
> >>
> >> Having your app update the primary table and then the secondary index is
> >> always a good fallback, however you need to ensure that you understand
> >> the risks.
> >>
> > Agree, I understand that there is risk. But, you have to bite the bullet
> > when you are doing something that is not supported out of the box. We also
> > use CP's wherever they are appropriate (like HBASE-7474).
> >
> >>
> >> With respect to secondary indexes… if you decouple the writes… you can
> >> get better throughput. Note that the code becomes a bit more complex
> >> because you're going to have to introduce a couple of different things.
> >> But that's something for a different discussion…
> >>
> > Whether to use CP or not, depends on the use case. In my opinion, CP's are
> > really powerful and an awesome feature in HBase. But, sometimes if not
> > used properly (like creating a Cyclic Graph as per Tom's example), they
> > might be problematic.
> >
> >
> >>
> >> On Oct 13, 2013, at 10:15 AM, anil gupta <[email protected]> wrote:
> >>
> >>> Inline.
> >>>
> >>> On Sun, Oct 13, 2013 at 6:02 AM, Michael Segel <[email protected]> wrote:
> >>>
> >>>> Ok…
> >>>>
> >>>> Sure you can have your app update the secondary index table.
> >>>> The only issue with that is if someone updates the base table outside
> >>>> of your app, they may or may not increment the secondary index.
> >>>>
> >>> Anil: We don't allow people to write data into HBase from their own
> >>> HBase client.
> >>> We control the writes into HBase. So, we don't have the problem of
> >>> secondary index not getting written.
> >>> For example, if you expose a RESTful web service you can easily control
> >>> the writes to HBase. Even if a user requests to write one row in the
> >>> "main table", your application can have the logic to write into the
> >>> "Secondary index" tables. In this way, it is transparent to users also.
> >>> You can add/remove secondary indexes as you want.
> >>>
> >>>> Note that your secondary index doesn't have to be an inverted table,
> >>>> but could be SOLR, LUCENE or something else.
> >>>>
> >>> Anil: As of now, we are happy with Inverted tables as they fit our use
> >>> case.
> >>>
> >>>>
> >>>> So you really want to secondary indexes on the server.
> >>>>
> >>>> There are a couple of things that could improve the performance,
> >>>> although the write to the secondary index would most likely lag under
> >>>> heavy load.
> >>>>
> >>>>
> >>>> On Oct 12, 2013, at 11:27 PM, anil gupta <[email protected]> wrote:
> >>>>
> >>>>> John,
> >>>>>
> >>>>> My 2 cents:
> >>>>> I tried implementing Secondary Index by using Region Observers on Put.
> >>>>> It works well under low load. But, under heavy load the RO could not
> >>>>> keep up with load cross region server writes.
> >>>>> Then, I decided not to use RO as per Andrew's explanation and I moved
> >>>>> all the logic of building secondary index tables on my HBase Client.
> >>>>> Since then, the system has been running fine under heavy load.
> >>>>> IMO, if you will use RO and do cross RS read/write then perhaps this
> >>>>> will become your bottleneck in HBase.
> >>>>> Is it possible for you to avoid RO and control the writes/updates from
> >>>>> client side?
> >>>>>
> >>>>> Thanks,
> >>>>> Anil Gupta
> >>>>>
> >>>>>
> >>>>> On Fri, Oct 11, 2013 at 6:06 PM, John Weatherford <[email protected]> wrote:
> >>>>>
> >>>>>> OP Here :)
> >>>>>>
> >>>>>> Our current design involves a Region Observer on a table that does
> >>>>>> increments on a second table. We took the approach that Michael said
> >>>>>> and inside the RO, we got a new connection and everything. We believe
> >>>>>> this is causing deadlocks for us. Our next attempt is going to be
> >>>>>> writing to another row in the same table where we will store the
> >>>>>> increments. If this doesn't work, we are going to simply pull the
> >>>>>> increments out of the RO and do them in the application or in Flume.
> >>>>>>
> >>>>>> @Tom Brown
> >>>>>> I would be very interested to hear more about your solution of
> >>>>>> aggregating the increments in another system that is then responsible
> >>>>>> for updating in HBase.
> >>>>>>
> >>>>>> -jW
> >>>>>>
> >>>>>>
> >>>>>> On Fri 11 Oct 2013 10:26:58 AM PDT, Vladimir Rodionov wrote:
> >>>>>>
> >>>>>>>>> With respect to the OP's design… does the deadlock occur because
> >>>>>>>>> he's trying to update a column in a different row within the same
> >>>>>>>>> table?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>> Because he is trying to update *row* in a different Region (and
> >>>>>>> potentially in different RS).
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Vladimir Rodionov
> >>>>>>> Principal Platform Engineer
> >>>>>>> Carrier IQ, www.carrieriq.com
> >>>>>>> e-mail: [email protected]
> >>>>>>>
> >>>>>>> ________________________________________
> >>>>>>> From: Michael Segel [[email protected]]
> >>>>>>> Sent: Friday, October 11, 2013 9:10 AM
> >>>>>>> To: [email protected]
> >>>>>>> Cc: Vladimir Rodionov
> >>>>>>> Subject: Re: Coprocessor Increments
> >>>>>>>

--
Thanks & Regards,
Anil Gupta
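
P.S. A minimal sketch of the client-side merge mentioned at the top of this
message, for the case where a scan spans multiple regions or RS's. It is not
the HBASE-7474 code. It assumes each region hands back its rows already sorted
on the sort column, and the row shape, the sample data and the name
merge_sorted_regions are hypothetical, chosen only for illustration. Plain
Python lists stand in for the per-region results so the example runs without
an HBase cluster.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape: (row key, value of the column being sorted on).
Row = Tuple[str, int]


def merge_sorted_regions(region_results: Iterable[List[Row]]) -> Iterator[Row]:
    """Lazily k-way merge per-region results, each already sorted by value."""
    return heapq.merge(*region_results, key=lambda row: row[1])


# Stand-ins for what three regions (possibly on different RS's) would return.
# Each list is sorted by value, but row keys are in no particular order.
region_a = [("r07", 3), ("r02", 9), ("r05", 14)]
region_b = [("r11", 1), ("r19", 8)]
region_c = [("r23", 5), ("r21", 6), ("r28", 20)]

merged = list(merge_sorted_regions([region_a, region_b, region_c]))
print(merged)
```

The server side only ever sorts data local to its own region, so nothing in
the coprocessor talks to another RS. The client does the cross-region step,
and because the merge is lazy it never has to re-sort the full result set.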
