Serega, I haven't seen any issues so far with this approach (keeping connections open in thread-locals, one connection per thread).
We send a stream of Kafka messages to HBase in the following way (rough sketch below):

- each "saver" thread initializes two connections - one for upserts and one for alter statements (for multitenancy, each thread keeps a local Map<String, Connection> with one connection per tenant)
- get a message from the inbound Kafka queue, deserialize it (Avro) and save it to HBase with a java.sql.PreparedStatement and a Phoenix UPSERT or CREATE/ALTER*
- connection.commit() every 1000 or so upserts, or per "transaction", with connection autocommit set to false (this brings latency down to single-digit milliseconds per saved row)
- in case of exceptions, retry the save; if it still fails, check/refresh the connections

* We pause upserts in all threads (Thread.yield on detecting the lock) when there is an incoming message with a metadata change (CREATE/ALTER table); otherwise Phoenix would throw an exception.
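Roughly, the per-thread bookkeeping looks like this (a minimal sketch of the idea, not our production code - class, table, and column names are made up, and the CREATE/ALTER pause logic is left out):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    // One instance per saver thread, so the plain HashMaps need no locking.
    class TenantSaver {
        private static final int COMMIT_SIZE = 1000;
        private static final String UPSERT =
                "UPSERT INTO events (id, payload) VALUES (?, ?)";

        private final Map<String, Connection> conns = new HashMap<>();  // tenant -> connection
        private final Map<String, Integer> pending = new HashMap<>();   // uncommitted rows per tenant

        void save(String tenant, String id, String payload) throws SQLException {
            Connection conn = connFor(tenant);
            try (PreparedStatement stmt = conn.prepareStatement(UPSERT)) {
                stmt.setString(1, id);
                stmt.setString(2, payload);
                stmt.executeUpdate();
            } catch (SQLException e) {
                refresh(tenant);            // drop the broken connection,
                conn = connFor(tenant);     // reopen, and retry once
                try (PreparedStatement stmt = conn.prepareStatement(UPSERT)) {
                    stmt.setString(1, id);
                    stmt.setString(2, payload);
                    stmt.executeUpdate();
                }
            }
            int n = pending.merge(tenant, 1, Integer::sum);
            if (n >= COMMIT_SIZE) {
                conn.commit();              // flush the batch to HBase
                pending.put(tenant, 0);
            }
        }

        private Connection connFor(String tenant) throws SQLException {
            Connection conn = conns.get(tenant);
            if (conn == null || conn.isClosed()) {
                // in reality the url is tenant-specific
                conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                conn.setAutoCommit(false);  // we commit manually, per batch
                conns.put(tenant, conn);
            }
            return conn;
        }

        private void refresh(String tenant) {
            Connection conn = conns.remove(tenant);
            pending.put(tenant, 0);  // rows pending on the dropped connection must be retried upstream
            if (conn != null) {
                try { conn.close(); } catch (SQLException ignore) { }
            }
        }
    }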
Alex

On Sun, Sep 6, 2015 at 12:36 PM, Serega Sheypak <[email protected]> wrote:

> Hi, the approach above doesn't fit a web app: multiple simultaneous
> upserts come in from different threads.
> So the only option is to put the connection into a thread-local and use
> one connection per thread.
>
> try (Connection conn = DriverManager.getConnection(url)) {
>     conn.setAutoCommit(false);
>     int batchSize = 0;
>     int commitSize = 1000; // rows to commit per batch; tune to your needs
>     try (PreparedStatement stmt = conn.prepareStatement(upsert)) {
>         stmt.set ...
>         while (there are records to upsert) {
>             stmt.executeUpdate();
>             batchSize++;
>             if (batchSize % commitSize == 0) {
>                 conn.commit();
>             }
>         }
>         conn.commit(); // commit the last batch of records
>     }
> }
>
> 2015-09-03 22:28 GMT+02:00 Jaime Solano <[email protected]>:
>
>> Thanks for the explanation, Samarth!
>>
>> On Thu, Sep 3, 2015 at 4:11 PM, Samarth Jain <[email protected]> wrote:
>>
>>> Hi Jaime,
>>>
>>> Phoenix is an embedded JDBC driver, i.e. connections are not meant to
>>> be pooled, and creating Phoenix connections is a lightweight operation.
>>> This is because all Phoenix connections to a cluster from the Phoenix
>>> JDBC driver use the same underlying HConnection.
>>>
>>> If you end up pooling Phoenix connections you will likely run into
>>> unexpected issues: a request may leave a Phoenix connection in a state
>>> that is not kosher for the next request to use.
>>>
>>> On Thu, Sep 3, 2015 at 12:58 PM, Jaime Solano <[email protected]> wrote:
>>>
>>>> Hi Samarth,
>>>> In our project we were thinking about using the Tomcat JDBC Connection
>>>> Pool to handle Phoenix connections. You're saying this type of approach
>>>> should be avoided? What other approach should be followed if, in the
>>>> scenario Serega described, you experience a heavy load of users all
>>>> trying to upsert at the same time (and therefore a lot of connections)?
>>>> Also, can you expand a little more on the implications of having a
>>>> pooling mechanism for Phoenix connections?
>>>> Thanks in advance!
>>>> -Jaime
>>>>
>>>> On Thu, Sep 3, 2015 at 3:35 PM, Samarth Jain <[email protected]> wrote:
>>>>
>>>>> Yes. PhoenixConnection implements java.sql.Connection.
>>>>>
>>>>> On Thu, Sep 3, 2015 at 12:34 PM, Serega Sheypak <[email protected]> wrote:
>>>>>
>>>>>> > Phoenix doesn't cache connections. You shouldn't pool them and you
>>>>>> > shouldn't share them with multiple threads.
>>>>>> We are talking about java.sql.Connection, right?
>>>>>>
>>>>>> 2015-09-03 21:26 GMT+02:00 Samarth Jain <[email protected]>:
>>>>>>
>>>>>>> Your pattern is correct.
>>>>>>>
>>>>>>> Phoenix doesn't cache connections. You shouldn't pool them and you
>>>>>>> shouldn't share them with multiple threads.
>>>>>>>
>>>>>>> For batching upserts, you could do something like this:
>>>>>>>
>>>>>>> try (Connection conn = DriverManager.getConnection(url)) {
>>>>>>>     conn.setAutoCommit(false);
>>>>>>>     int batchSize = 0;
>>>>>>>     int commitSize = 1000; // rows to commit per batch; tune to your needs
>>>>>>>     try (PreparedStatement stmt = conn.prepareStatement(upsert)) {
>>>>>>>         stmt.set ...
>>>>>>>         while (there are records to upsert) {
>>>>>>>             stmt.executeUpdate();
>>>>>>>             batchSize++;
>>>>>>>             if (batchSize % commitSize == 0) {
>>>>>>>                 conn.commit();
>>>>>>>             }
>>>>>>>         }
>>>>>>>         conn.commit(); // commit the last batch of records
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> You don't want commitSize to be too large, since the Phoenix client
>>>>>>> keeps the uncommitted rows in memory until they are sent over to
>>>>>>> HBase.
>>>>>>>
>>>>>>> On Thu, Sep 3, 2015 at 12:19 PM, Serega Sheypak <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi, I'm using Phoenix in a Java web application. The app does an
>>>>>>>> upsert or a select by primary key.
>>>>>>>> What is the right pattern to do it?
>>>>>>>> - I create a new connection for each request
>>>>>>>> - prepare and execute the statement
>>>>>>>> - close the statement
>>>>>>>> - close the connection
>>>>>>>>
>>>>>>>> Does Phoenix cache connections internally? What is the right way
>>>>>>>> to batch upserts in this case?
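For the record, the per-request pattern Serega asked about then looks as simple as this (a sketch; url and the UPSERT statement are placeholders - opening a Phoenix connection per request is cheap, so there is no pool):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class PerRequestUpsert {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:phoenix:localhost";                // placeholder ZK quorum
            String upsert = "UPSERT INTO t (k, v) VALUES (?, ?)"; // placeholder table
            // Open, use, commit, close - no pooling, no sharing across threads.
            try (Connection conn = DriverManager.getConnection(url);
                 PreparedStatement stmt = conn.prepareStatement(upsert)) {
                stmt.setString(1, "key");
                stmt.setString(2, "value");
                stmt.executeUpdate();
                conn.commit(); // Phoenix connections default to autoCommit=false
            }
        }
    }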
