bq. During the processing the size of the data is doubled.

This explains the frequent split :-)
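
A back-of-the-envelope check of why doubling triggers splits with 32 MB regions. This is a simplified sketch under a stated assumption: a region splits once its size exceeds the configured maximum region size. That ignores how the real split policy measures store sizes (after flushes and compactions). The class and method names are hypothetical.

```java
// Simplified model (assumption): a region is split once its size is strictly
// greater than the maximum region size (32 MB in this POC). Real split
// policies check store sizes after flushes/compactions, so this only
// approximates when a split is triggered.
class SplitEstimate {
    static final long MB = 1024L * 1024L;

    // True if growing a region of sizeBefore bytes by growthFactor pushes it
    // past maxRegionSize.
    static boolean exceedsAfterGrowth(long sizeBefore, double growthFactor, long maxRegionSize) {
        return (long) (sizeBefore * growthFactor) > maxRegionSize;
    }

    // Largest starting size that survives the growth without crossing the threshold.
    static long maxSafeStartingSize(double growthFactor, long maxRegionSize) {
        return (long) (maxRegionSize / growthFactor);
    }

    public static void main(String[] args) {
        long maxRegionSize = 32 * MB;
        double growth = 2.0; // "the size of the data is doubled"
        System.out.println("Safe starting size: "
                + maxSafeStartingSize(growth, maxRegionSize) / MB + " MB");
        for (long mb : new long[] {10, 16, 17, 30}) {
            boolean split = exceedsAfterGrowth(mb * MB, growth, maxRegionSize);
            System.out.println(mb + " MB -> " + (split ? "split" : "no split"));
        }
    }
}
```

Under this model any region holding more than half the threshold before processing crosses it after doubling; the ratio is the same with 1 GB regions (anything above 512 MB), which is consistent with the point that the problem does not go away with larger regions.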

Is the original data still needed after post-processing (for auditing, maybe)?

Cheers

On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański <dpolaczan...@gmail.com> wrote:

> I am testing different solutions (POC).
> The region size is currently 32 MB (I know it should be >= 1 GB, but we are
> testing different solutions with a smaller amount of data). Increasing the
> region size is not a solution either: our problem can happen even when a
> region is 1 GB. We want to process the data with a coprocessor and Hadoop
> MapReduce. I cannot have one big region because I want a sensible degree of
> parallelism (with MapReduce and coprocessors).
>
> Increasing the region size plus pre-splitting is not an option either,
> because I know nothing about the keys (random longs).
>
> During the processing the size of the data is doubled.
>
> And yes, the coprocessor rewrites a lot of the data written into the table.
> The whole record is serialized to Avro and stored in one column (we will try
> storing each attribute in its own column in the next POC).
>
> It is not a typical big data project where we can afford an up-front
> analysis of the data :)
>
> 2016-03-25 17:38 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:
>
> > What's the current region size you use ?
> >
> > bq. During the processing size of the data gets increased
> >
> > Can you give us some quantitative measure as to how much increase you
> > observed (w.r.t. region size) ?
> >
> > bq. I was looking for some "global lock" in source code
> >
> > Using a global lock is probably not a good idea.
> >
> > I am curious: it looks like your coprocessor may rewrite a lot of the data
> > written into the table.
> > Can the client side accommodate such logic so that the rewriting is reduced?
> >
> > Thanks
> >
> > On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański <dpolaczan...@gmail.com> wrote:
> >
> > > Hi,
> > > I have some processing in my coprocessor service which modifies the
> > > existing data in place. It iterates over every row, modifies it and puts
> > > it back into the region. The table can be modified by only one client.
> > >
> > > During the processing the size of the data increases -> the region's
> > > size increases -> a region split happens. As a result the processing is
> > > stopped by a NotServingRegionException (because the region is closed
> > > and split into two new regions, so it no longer exists).
> > >
> > > Is there any clean way to block region splitting?
> > >
> > > I was looking for some "global lock" in the source code but I haven't
> > > found anything helpful.
> > > Another idea is to create a custom RegionSplitPolicy and explicitly set
> > > some flag which makes shouldSplit() return false, but I'm not sure yet if
> > > it is safe.
> > > Could you advise?
> > > Regards
> > >
> >
>
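
The custom RegionSplitPolicy idea (a flag that makes shouldSplit() return false while the coprocessor rewrites data) can be modeled roughly as follows. This is a hypothetical, HBase-free sketch: SplitPauseSketch, PausableSplitPolicy, pauseSplits() and resumeSplits() are made-up names, and the size comparison stands in for the real policy's logic. In HBase itself one would instead subclass the table's split policy and override its protected shouldSplit(); HBase also ships DisabledRegionSplitPolicy, which never splits, if splitting can simply be turned off for the POC table.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: models the "flag checked in shouldSplit()" idea without
// depending on HBase. In a real deployment this logic would live in a subclass
// of an HBase RegionSplitPolicy, overriding its protected shouldSplit().
class SplitPauseSketch {

    static class PausableSplitPolicy {
        // Count of in-flight rewrite jobs that asked to hold off splits.
        // A counter (not a boolean) so overlapping jobs cannot clear each
        // other's pause. Static means shared within one JVM/classloader only.
        private static final AtomicInteger PAUSES = new AtomicInteger();

        private final long maxRegionSize;

        PausableSplitPolicy(long maxRegionSize) {
            this.maxRegionSize = maxRegionSize;
        }

        static void pauseSplits() {
            PAUSES.incrementAndGet();
        }

        static void resumeSplits() {
            // Never drop below zero, even if resume is called too often.
            PAUSES.updateAndGet(n -> Math.max(0, n - 1));
        }

        // Stand-in for the size check a real policy performs.
        boolean shouldSplit(long regionSize) {
            if (PAUSES.get() > 0) {
                return false; // a rewrite is running: defer the split
            }
            return regionSize > maxRegionSize;
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        PausableSplitPolicy policy = new PausableSplitPolicy(32 * mb);

        System.out.println(policy.shouldSplit(40 * mb)); // true: over 32 MB
        PausableSplitPolicy.pauseSplits();
        try {
            // ... coprocessor iterates over rows, modifies and puts them back ...
            System.out.println(policy.shouldSplit(40 * mb)); // false: paused
        } finally {
            PausableSplitPolicy.resumeSplits(); // always re-enable splits
        }
        System.out.println(policy.shouldSplit(40 * mb)); // true again
    }
}
```

The try/finally ensures splits are re-enabled even if the rewrite fails. Caveats, stated as assumptions rather than verified HBase behavior: a static flag is only shared if the coprocessor and the split policy are loaded by the same classloader in the same region server (a coprocessor loaded from a jar on HDFS may get its own classloader); the flag cannot cancel a split that was requested before pauseSplits() ran; and while splits are paused the region can grow well past the configured size.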
