yes

On Mon, Sep 7, 2015 at 4:40 AM, Serega Sheypak <[email protected]> wrote:

> Hi, so you hold a Phoenix java.sql.Connection for each thread as a
> thread-local variable and don't get any problems, correct?
>
> 2015-09-07 6:43 GMT+02:00 ALEX K <[email protected]>:
>
>> Serega,
>>
>> I haven't seen any issues so far with this approach (keeping connections
>> open in thread-locals, one connection per thread).
>>
>> We send a stream of Kafka messages to HBase in the following way:
>> - Each "saver" thread initializes two connections: one for upserts and
>> one for ALTER statements (for multitenancy, each thread keeps a local
>> Map<String, Connection> with one connection per tenant).
>> - Get a message from the inbound Kafka queue, deserialize it (Avro), and
>> save it to HBase with a java.sql.PreparedStatement and a Phoenix UPSERT
>> or CREATE/ALTER.*
>> - connection.commit() every 1000 or so upserts, or per "transaction",
>> with connection autocommit set to false (this brings latency per saved
>> row down to single-digit milliseconds).
>> - In case of exceptions, retry the save; if it still fails, check/refresh
>> the connections.
>>
>> * We pause upserts in all threads (Thread.yield on detecting a lock) if
>> there is an incoming message with a metadata change (CREATE/ALTER table);
>> otherwise Phoenix would throw an exception.
>>
>> Alex
>>
>> On Sun, Sep 6, 2015 at 12:36 PM, Serega Sheypak <[email protected]> wrote:
>>
>>> Hi, the approach above doesn't fit a web app. Multiple simultaneous
>>> upserts come from different threads, so the only option is to put the
>>> connection into a thread-local and use one connection per thread.
>>>
>>> try (Connection conn = DriverManager.getConnection(url)) {
>>>     conn.setAutoCommit(false);
>>>     int batchSize = 0;
>>>     int commitSize = 1000; // rows to commit per batch; tune to your needs
>>>     try (PreparedStatement stmt = conn.prepareStatement(upsert)) {
>>>         stmt.set ...
>>>         while (there are records to upsert) {
>>>             stmt.executeUpdate();
>>>             batchSize++;
>>>             if (batchSize % commitSize == 0) {
>>>                 conn.commit();
>>>             }
>>>         }
>>>     }
>>>     conn.commit(); // commit the last batch of records
>>> }
>>>
>>> 2015-09-03 22:28 GMT+02:00 Jaime Solano <[email protected]>:
>>>
>>>> Thanks for the explanation, Samarth!
>>>>
>>>> On Thu, Sep 3, 2015 at 4:11 PM, Samarth Jain <[email protected]> wrote:
>>>>
>>>>> Hi Jaime,
>>>>>
>>>>> Phoenix is an embedded JDBC driver, i.e. the connections are not meant
>>>>> to be pooled. Creating Phoenix connections is a lightweight operation,
>>>>> because all Phoenix connections to a cluster from the Phoenix JDBC
>>>>> driver use the same underlying HConnection.
>>>>>
>>>>> If you end up pooling Phoenix connections, you will likely run into
>>>>> unexpected issues: a request may leave a Phoenix connection in a state
>>>>> that is not kosher for the next request to use.
>>>>>
>>>>> On Thu, Sep 3, 2015 at 12:58 PM, Jaime Solano <[email protected]> wrote:
>>>>>
>>>>>> Hi Samarth,
>>>>>> In our project we were thinking about using the Tomcat JDBC
>>>>>> Connection Pool to handle Phoenix connections. You're saying this
>>>>>> type of approach should be avoided? What other approach should be
>>>>>> followed if, in the scenario Serega described, you experience a heavy
>>>>>> load of users all trying to upsert at the same time (and therefore a
>>>>>> lot of connections)?
>>>>>> Also, can you expand a little more on the implications of having a
>>>>>> pooling mechanism for Phoenix connections?
>>>>>> Thanks in advance!
>>>>>> -Jaime
>>>>>>
>>>>>> On Thu, Sep 3, 2015 at 3:35 PM, Samarth Jain <[email protected]> wrote:
>>>>>>
>>>>>>> Yes. PhoenixConnection implements java.sql.Connection.
>>>>>>>
>>>>>>> On Thu, Sep 3, 2015 at 12:34 PM, Serega Sheypak <[email protected]> wrote:
>>>>>>>
>>>>>>>> > Phoenix doesn't cache connections. You shouldn't pool them and
>>>>>>>> > you shouldn't share them with multiple threads.
>>>>>>>> We are talking about java.sql.Connection, right?
>>>>>>>>
>>>>>>>> 2015-09-03 21:26 GMT+02:00 Samarth Jain <[email protected]>:
>>>>>>>>
>>>>>>>>> Your pattern is correct.
>>>>>>>>>
>>>>>>>>> Phoenix doesn't cache connections. You shouldn't pool them and
>>>>>>>>> you shouldn't share them with multiple threads.
>>>>>>>>>
>>>>>>>>> For batching upserts, you can do something like this via Phoenix:
>>>>>>>>>
>>>>>>>>> try (Connection conn = DriverManager.getConnection(url)) {
>>>>>>>>>     conn.setAutoCommit(false);
>>>>>>>>>     int batchSize = 0;
>>>>>>>>>     int commitSize = 1000; // rows to commit per batch; tune to your needs
>>>>>>>>>     try (PreparedStatement stmt = conn.prepareStatement(upsert)) {
>>>>>>>>>         stmt.set ...
>>>>>>>>>         while (there are records to upsert) {
>>>>>>>>>             stmt.executeUpdate();
>>>>>>>>>             batchSize++;
>>>>>>>>>             if (batchSize % commitSize == 0) {
>>>>>>>>>                 conn.commit();
>>>>>>>>>             }
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>>     conn.commit(); // commit the last batch of records
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> You don't want commitSize to be too large, since the Phoenix
>>>>>>>>> client keeps the uncommitted rows in memory until they are sent
>>>>>>>>> over to HBase.
>>>>>>>>>
>>>>>>>>> On Thu, Sep 3, 2015 at 12:19 PM, Serega Sheypak <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, I'm using Phoenix in a Java web application. The app does
>>>>>>>>>> upserts or selects by primary key.
>>>>>>>>>> What is the right pattern to do it?
>>>>>>>>>> - Create a new connection for each request
>>>>>>>>>> - Prepare and execute the statement
>>>>>>>>>> - Close the statement
>>>>>>>>>> - Close the connection
>>>>>>>>>>
>>>>>>>>>> Does Phoenix cache connections internally? What is the right way
>>>>>>>>>> to batch upserts in this case?
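The batching sketch in the thread can be filled out as compilable Java. Everything below that is not in the original messages is an assumption: the table and columns are hypothetical, and the Connection is passed in by the caller (in a real web app you would obtain it per request or per thread from DriverManager.getConnection("jdbc:phoenix:<zk-quorum>")) so the commit cadence can be exercised without a live Phoenix cluster.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchUpsert {
    // Hypothetical schema, for illustration only.
    static final String UPSERT = "UPSERT INTO metrics (id, val) VALUES (?, ?)";

    // Upsert all records, committing every commitSize rows as suggested in
    // the thread. The caller supplies the Connection.
    static void save(Connection conn, List<long[]> records, int commitSize)
            throws SQLException {
        conn.setAutoCommit(false);             // commit manually, in batches
        int batchSize = 0;
        try (PreparedStatement stmt = conn.prepareStatement(UPSERT)) {
            for (long[] rec : records) {
                stmt.setLong(1, rec[0]);
                stmt.setLong(2, rec[1]);
                stmt.executeUpdate();          // buffered client-side until commit
                if (++batchSize % commitSize == 0) {
                    conn.commit();             // flush a full batch to HBase
                }
            }
        }
        conn.commit();                         // commit the trailing partial batch
    }
}
```

Per Samarth's note, keep commitSize moderate: the Phoenix client holds uncommitted rows in memory until they are sent to HBase.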
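The per-thread connection pattern the thread converges on (one connection per thread via a ThreadLocal, no pooling, no sharing) can be sketched as below. The class name is made up, and the connection factory is injected as an assumption so the reuse behavior can be checked without a Phoenix cluster; in a web app you would pass () -> DriverManager.getConnection(url).

```java
import java.sql.Connection;
import java.util.concurrent.Callable;

// Hypothetical helper: keeps one open Connection per thread, as described in
// the thread, instead of pooling or sharing connections across threads.
class PerThreadConnections {
    private final Callable<Connection> factory; // e.g. () -> DriverManager.getConnection(url)
    private final ThreadLocal<Connection> local = new ThreadLocal<>();

    PerThreadConnections(Callable<Connection> factory) {
        this.factory = factory;
    }

    // Returns this thread's connection, (re)opening it if missing or closed.
    Connection get() throws Exception {
        Connection c = local.get();
        if (c == null || c.isClosed()) {
            c = factory.call();         // cheap in Phoenix: shares one HConnection
            c.setAutoCommit(false);     // commit in batches, per the thread
            local.set(c);
        }
        return c;
    }
}
```

Each thread that calls get() receives its own connection; on exceptions a caller can close the broken connection and call get() again to refresh it, matching Alex's check/refresh step.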
