I'm using the same solution Samarth suggested (commit batching). It brings the latency of a single-row upsert down from 50 ms to 5 ms (averaged over each batch).
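
For reference, a minimal sketch of that commit batching as a reusable helper. The class name BatchedUpsert, the method upsertInBatches, and the choice to pass rows as Object[] are illustrative assumptions, not part of the Phoenix API; the code uses only standard JDBC calls, and the right commitSize depends on your row size and client memory.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: names and signature are illustrative, not Phoenix API.
final class BatchedUpsert {

    /**
     * Executes one upsert per row and commits once every commitSize rows,
     * plus a final commit for any remainder. Returns the number of commits.
     */
    static int upsertInBatches(Connection conn, String upsertSql,
                               List<Object[]> rows, int commitSize) throws SQLException {
        if (commitSize <= 0) {
            throw new IllegalArgumentException("commitSize must be positive");
        }
        // With auto-commit off, the client holds uncommitted rows in memory
        // until commit() is called.
        conn.setAutoCommit(false);
        int commits = 0;
        int pending = 0; // rows executed since the last commit
        try (PreparedStatement stmt = conn.prepareStatement(upsertSql)) {
            for (Object[] row : rows) {
                // JDBC parameters are 1-indexed.
                for (int i = 0; i < row.length; i++) {
                    stmt.setObject(i + 1, row[i]);
                }
                stmt.executeUpdate();
                pending++;
                if (pending == commitSize) {
                    conn.commit(); // flush a full batch
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                conn.commit(); // flush the last, partial batch
                commits++;
            }
        }
        return commits;
    }
}
```

With commitSize = 1000, 2,500 rows go out in three commits (1000, 1000, 500). The final commit is skipped when the row count is an exact multiple of commitSize, so no empty commit is issued.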

On Wed, Aug 19, 2015 at 7:11 PM, Samarth Jain <[email protected]> wrote:

> You can do this via phoenix by doing something like this:
>
> try (Connection conn = DriverManager.getConnection(url)) {
>     conn.setAutoCommit(false);
>     int batchSize = 0;
>     // number of rows you want to commit per batch.
>     // Change this value according to your needs.
>     int commitSize = 1000;
>     while (there are records to upsert) {
>         stmt.executeUpdate();
>         batchSize++;
>         if (batchSize % commitSize == 0) {
>             conn.commit();
>         }
>     }
>     conn.commit(); // commit the last batch of records
>
> You don't want commitSize to be too large since Phoenix client keeps the
> uncommitted rows in memory till they are sent over to HBase.
>
> On Wed, Aug 19, 2015 at 3:05 PM, Serega Sheypak <[email protected]>
> wrote:
>
>> I would suggest you to use
>>
>> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html
>> instead of list of puts and share mutableBuffer across threads (it's
>> thread-safe). I reduced my response time from 30-40 ms to 4ms while using
>> buffferedmutator. It also sends mutations in async mode. :)
>>
>> I meet the same problem. Can't force Phoenix to buffer upserts on
>> client-side and then send them to HBase in small batches.
>>
>> 2015-08-19 19:40 GMT+02:00 jeremy p <[email protected]>:
>>
>>> Hello all,
>>>
>>> I need to do true batch updates to a Phoenix table. By this, I mean
>>> sending a bunch of updates to HBase as part of a single request. The HBase
>>> API offers this behavior with the Table.put(List<Put> puts) method. I
>>> noticed PhoenixStatement exposes an executeBatch() method, however, this
>>> method just executes the batched statements one-by-one. This will not
>>> deliver the performance that the HBase API exposes through their batch put
>>> method.
>>>
>>> What is the best way for me to do true batch updates to a Phoenix
>>> table? I need to do this programmatically, so I cannot use the command
>>> line bulk insert utility.
>>>
>>> --Jeremy
