Shawn,
For the upsert without limit, Phoenix will read the source table and write the target table entirely on the server side. I think the higher memory usage is caused by the scan cache and the memstore under the higher write throughput.

----------------------------------------
   Jaanai Zhang
   Best regards!

On Thu, Dec 13, 2018 at 10:13 AM, Shawn Li <shawnli...@gmail.com> wrote:

> Hi Vincent,
>
> So you are describing that with LIMIT the results are sent to the client
> side and then written back to the server; that might explain why upsert
> with LIMIT is slower than upsert without it. But it doesn't seem to
> explain the memory usage: memory usage on the client machine is 8 GB
> (without LIMIT) vs 2 GB (with LIMIT), and upsert without LIMIT can
> sometimes even reach 20 GB for a big table.
>
> Thanks,
> Shawn
>
> On Wed, Dec 12, 2018 at 6:34 PM Vincent Poon <vincentp...@apache.org>
> wrote:
>
>> I think it's done client-side if you have a LIMIT. If you have e.g.
>> LIMIT 1000, it would be incorrect for each regionserver to upsert 1000
>> rows if you have more than one regionserver. So instead the results are
>> sent back to the client, where the LIMIT is applied, and then written
>> back to the server in the UPSERT.
>>
>> On Wed, Dec 12, 2018 at 1:18 PM Shawn Li <shawnli...@gmail.com> wrote:
>>
>>> Hi Vincent,
>>>
>>> The table creation statement is similar to the one below. We have
>>> about 200 fields. The table is mutable and doesn't have any indexes.
>>>
>>> CREATE TABLE IF NOT EXISTS us_population (
>>>     state CHAR(2) NOT NULL,
>>>     city VARCHAR,
>>>     population BIGINT,
>>>     …
>>>     CONSTRAINT my_pk PRIMARY KEY (state));
>>>
>>> Thanks,
>>> Shawn
>>>
>>> On Wed, Dec 12, 2018, 13:42 Vincent Poon <vincentp...@apache.org>
>>> wrote:
>>>
>>>> For #2, can you provide the table definition and the statement used?
>>>> E.g., is the table immutable, or does it have indexes?
>>>>
>>>> On Tue, Dec 11, 2018 at 6:08 PM Shawn/Xiang Li <shawnli...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> 1. I want to check what runs under the hood for the LIMIT clause
>>>>> used in the following upsert statement (does it involve any
>>>>> coprocessor working behind the scenes?):
>>>>>
>>>>>     upsert into table2 select * from table1 limit 3000000;
>>>>>     (table1 and table2 have the same schema)
>>>>>
>>>>> The above statement runs a lot slower than the version without the
>>>>> LIMIT clause shown below, even though it upserts less data:
>>>>>
>>>>>     upsert into table2 select * from table1;
>>>>>
>>>>> 2. We also observe that memory usage is pretty high without the
>>>>> LIMIT clause (8 GB vs 2 GB); for a big table it can sometimes reach
>>>>> 20 GB without the LIMIT clause. According to the Phoenix website's
>>>>> description of UPSERT SELECT: "If auto commit is on, and both a) the
>>>>> target table matches the source table, and b) the select performs no
>>>>> aggregation, then the population of the target table will be done
>>>>> completely on the server-side (with constraint violations logged,
>>>>> but otherwise ignored)."
>>>>>
>>>>> My question is: if everything is done on the server side, how come
>>>>> we have such high memory usage on the client machine?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Shawn
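The client-side LIMIT behavior Vincent describes can be sketched in a few lines. This is a minimal plain-Python illustration, not Phoenix internals; the two region lists and the row values are made up for the example:

```python
# Rows held by two hypothetical regionservers (made-up data).
region1 = [("a", i) for i in range(5)]
region2 = [("b", i) for i in range(5)]
LIMIT = 4

# Incorrect: each regionserver applies the LIMIT independently and
# upserts its own slice -- with 2 regionservers you can get up to
# 2 * LIMIT rows instead of LIMIT rows.
per_region_rows = region1[:LIMIT] + region2[:LIMIT]
print(len(per_region_rows))   # 8, not 4

# What Phoenix does instead (per the thread): stream results back to
# the client, apply the LIMIT once there, then write those rows back
# to the server in the UPSERT.
client_side_rows = (region1 + region2)[:LIMIT]
print(len(client_side_rows))  # 4
```

This also suggests why the LIMIT variant is slower: every selected row makes a round trip through the client instead of being copied server-side.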