bq. calculating other new attributes of a trade

Can you put the new attributes in separate columns?

Cheers

On Fri, Mar 25, 2016 at 12:38 PM, Daniel Połaczański <dpolaczan...@gmail.com> wrote:

> The data is a set of trades and the processing is some kind of enrichment
> (calculating other new attributes of a trade). All attributes are needed
> (the original and the new).
>
> 2016-03-25 18:41 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:
>
> > bq. During the processing the size of the data is doubled.
> >
> > This explains the frequent split :-)
> >
> > Is the original data needed after post-processing (maybe for auditing)?
> >
> > Cheers
> >
> > On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański <dpolaczan...@gmail.com> wrote:
> >
> > > I am testing different solutions (POC).
> > > The region size is currently 32MB (I know it should be >= 1GB, but we
> > > are testing different solutions with a smaller amount of data), so
> > > increasing the region size is not a solution. Our problems can happen
> > > even when a region is 1 GB. We want to process the data with a
> > > coprocessor and Hadoop MapReduce. I cannot have one big region because
> > > I want a sensible degree of parallelism (with MapReduce and
> > > coprocessors).
> > >
> > > Increasing the region size + pre-splitting is not an option either,
> > > because I know nothing about the keys (random long).
> > >
> > > During the processing the size of the data is doubled.
> > >
> > > And yes, the coprocessor rewrites a lot of the data written into the
> > > table. The whole record is serialized to Avro and stored in one column
> > > (we will try storing a single attribute per column in the next POC).
> > >
> > > It is not a typical big data project where we can allow prior analysis
> > > of the data :)
> > >
> > > 2016-03-25 17:38 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:
> > >
> > > > What's the current region size you use?
> > > >
> > > > bq. During the processing the size of the data increases
> > > >
> > > > Can you give us some quantitative measure as to how much increase you
> > > > observed (w.r.t. region size)?
> > > >
> > > > bq. I was looking for some "global lock" in the source code
> > > >
> > > > Using a global lock is probably not a good idea.
> > > >
> > > > I am curious: it looks like your coprocessor may rewrite a lot of the
> > > > data written into the table.
> > > > Can the client side accommodate such logic so that the rewrite is
> > > > reduced?
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański <dpolaczan...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > I have some processing in my CoprocessorService which modifies the
> > > > > existing data in place. It iterates over every row, modifies it and
> > > > > puts it back into the region. The table can be modified by only one
> > > > > client.
> > > > >
> > > > > During the processing the size of the data increases -> the region's
> > > > > size increases -> a region split happens. This causes the processing
> > > > > to be stopped by a NotServingRegionException (because the region is
> > > > > closed and split into two new regions, so it doesn't exist anymore).
> > > > >
> > > > > Is there any clean way to block a region's splitting?
> > > > >
> > > > > I was looking for some "global lock" in the source code but I haven't
> > > > > found anything helpful.
> > > > > Another idea is to create a custom RegionSplitPolicy and explicitly
> > > > > set some flag which will make shouldSplit() return false, but I'm
> > > > > not sure yet whether it is safe.
> > > > > Could you advise?
> > > > > Regards
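
The flag idea from the original question (a split policy whose shouldSplit() returns false while the coprocessor is rewriting rows) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: SplitPolicySketch is a hypothetical stand-in, not HBase's RegionSplitPolicy, and its method signature is simplified to take the region size directly. All class and method names below are invented for illustration. It does not answer whether doing this inside a real region server is safe.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a split policy; NOT the real HBase class. */
abstract class SplitPolicySketch {
    /** Decides whether a region of the given size should be split. */
    abstract boolean shouldSplit(long regionSizeBytes);
}

/** Size-based policy that can be vetoed temporarily through a flag. */
final class FlagGuardedSplitPolicy extends SplitPolicySketch {
    private final long maxRegionSizeBytes;
    // Set while in-place processing runs; read by the split decision.
    private final AtomicBoolean splitsBlocked = new AtomicBoolean(false);

    FlagGuardedSplitPolicy(long maxRegionSizeBytes) {
        this.maxRegionSizeBytes = maxRegionSizeBytes;
    }

    void blockSplits() { splitsBlocked.set(true); }

    void allowSplits() { splitsBlocked.set(false); }

    @Override
    boolean shouldSplit(long regionSizeBytes) {
        if (splitsBlocked.get()) {
            return false; // processing in progress: never split
        }
        return regionSizeBytes > maxRegionSizeBytes;
    }
}

final class SplitGuardDemo {
    /** Runs the work with splits blocked and always re-enables them. */
    static void runWithSplitsBlocked(FlagGuardedSplitPolicy policy, Runnable work) {
        policy.blockSplits();
        try {
            work.run();
        } finally {
            policy.allowSplits(); // re-enable even if the work throws
        }
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        FlagGuardedSplitPolicy policy = new FlagGuardedSplitPolicy(32 * mb);
        long[] regionSize = {20 * mb};

        runWithSplitsBlocked(policy, () -> {
            regionSize[0] *= 2; // the enrichment doubles the data
            System.out.println("during: " + policy.shouldSplit(regionSize[0]));
        });
        System.out.println("after: " + policy.shouldSplit(regionSize[0]));
    }
}
```

With a 32MB limit and 20MB of data that doubles to 40MB, the split is refused while the flag is set and becomes due as soon as the flag is cleared. The try/finally is the important design choice: a flag that stays set after a failed run would disable splitting for good.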