Hi prasad, Could you please review my PR in your case?
- https://github.com/apache/incubator-zeppelin/pull/517 Regards, Jongyoul On Sun, Dec 6, 2015 at 7:44 PM, Jongyoul Lee <[email protected]> wrote: > Hi Prasad, > > Thanks for the detail review of it. I also figure out multiple connection > would make sense. I create > https://issues.apache.org/jira/browse/ZEPPELIN-487 and I will fix it. > > Regards, > Jongyoul > > On Sun, Dec 6, 2015 at 6:05 AM, Prasad Wagle <[email protected]> > wrote: > >> Hi Jongyoul, >> >> I ran experiments below and found that we need multiple connections for >> parallel execution in Vertica. >> >> *Same Connection* >> scala> val connection = DriverManager.getConnection(url, username, >> password) >> connection: java.sql.Connection = >> com.vertica.jdbc.VerticaConnectionImpl@5d56ea0f >> >> scala> val thread1 = new Thread { >> | override def run { >> | val statement = connection.createStatement() >> | println("Thread1: before execute at " + >> Calendar.getInstance().getTime()) >> | val resultSet = statement.executeQuery("SELECT sleep(10) as >> col1, 1 as col2") >> | resultSet.next() >> | val col1 = resultSet.getString("col1") >> | val col2 = resultSet.getString("col2") >> | resultSet.next() >> | println("Thread1: col " + col1 + " " + col2 + " at " + >> Calendar.getInstance().getTime()) >> | } >> | } >> thread1: Thread = Thread[Thread-3,5,main] >> >> scala> val thread2 = new Thread { >> | override def run { >> | val statement = connection.createStatement() >> | println("Thread2: before execute at " + >> Calendar.getInstance().getTime()) >> | val resultSet = statement.executeQuery("SELECT sleep(10) as >> col1, 2 as col2") >> | resultSet.next() >> | val col1 = resultSet.getString("col1") >> | val col2 = resultSet.getString("col2") >> | resultSet.next() >> | println("Thread2: col " + col1 + " " + col2 + " at " + >> Calendar.getInstance().getTime()) >> | } >> | } >> thread2: Thread = Thread[Thread-4,5,main] >> >> scala> thread1.start >> >> scala> thread2.start >> Thread1: before execute at Sat Dec 05 12:38:59 PST 2015 >> Thread2: before execute at Sat Dec 05 12:38:59 PST 2015 >> Thread1: col 0 1 at Sat Dec 05 12:39:19 PST 2015 >> Thread2: col 0 2 at Sat Dec 05 12:39:19 PST 2015 >> >> *Two Connections* >> >> scala> val thread1 = new Thread { >> | override def run { >> | val connection = DriverManager.getConnection(url, username, >> password) >> | val statement = connection.createStatement() >> | println("Thread1: before execute at " + >> Calendar.getInstance().getTime()) >> | val resultSet = statement.executeQuery("SELECT sleep(10) as >> col1, 1 as col2") >> | resultSet.next() >> | val col1 = resultSet.getString("col1") >> | val col2 = resultSet.getString("col2") >> | println("Thread1: col " + col1 + " " + col2 + " at " + >> Calendar.getInstance().getTime()) >> | } >> | } >> thread1: Thread = Thread[Thread-3,5,main] >> >> scala> val thread2 = new Thread { >> | override def run { >> | val connection = DriverManager.getConnection(url, username, >> password) >> | val statement = connection.createStatement() >> | println("Thread2: before execute at " + >> Calendar.getInstance().getTime()) >> | val resultSet = statement.executeQuery("SELECT sleep(10) as >> col1, 2 as col2") >> | resultSet.next() >> | val col1 = resultSet.getString("col1") >> | val col2 = resultSet.getString("col2") >> | println("Thread2: col " + col1 + " " + col2 + " at " + >> Calendar.getInstance().getTime()) >> | } >> | } >> thread2: Thread = Thread[Thread-4,5,main] >> >> scala> thread1.start >> >> scala> thread2.start >> >> Thread2: before execute at Sat Dec 05 12:34:16 PST 2015 >> Thread1: before execute at Sat Dec 05 12:34:16 PST 2015 >> Thread1: col 0 1 at Sat Dec 05 12:34:26 PST 2015 >> Thread2: col 0 2 at Sat Dec 05 12:34:26 PST 2015 >> >> >> Does that make sense? >> >> Thanks for the great work! >> >> Prasad >> >> On Fri, Dec 4, 2015 at 6:18 PM, Jongyoul Lee <[email protected]> wrote: >> >>> Hi Prasad, >>> >>> First of all, Thanks for testing multiple executions and documenting the >>> bug. I'll fix the bug. Concerning the concurrent executions, that >>> interpreter make a two statements from one connection. I'm not good at >>> vertica, could you please confirm supporting multi statements from one >>> connection and verify running multiple statements from one connection? >>> Moreover, I appreciate the advice which is better strategy between multiple >>> statements and multiple connections. I expect that second query is started >>> before finishing the first one as you can see in your log, In your case, >>> >>> 1. the first query started >>> 2. the second query started >>> 3. the second query ended >>> 4. the first query ended >>> >>> I, however why those queries finished after 2 minutes. I guess it's >>> about vertica's behavior of jdbc, and it doesn't seems that vertica runs >>> multiple statements in one connection. with this situation, even though two >>> statements try to execute with parallel, sequentially executes internally. >>> If it's true, I'll change to use multiple connections. >>> >>> Regards, >>> Jongyoul >>> >>> On Sat, Dec 5, 2015 at 3:25 AM, Prasad Wagle <[email protected]> >>> wrote: >>> >>>> Hi Jongyoul, >>>> >>>> Thanks for merging ZEPPELIN-440. I downloaded changes from >>>> https://github.com/apache/incubator-zeppelin/pull/455 and here's what >>>> I found. >>>> >>>> When I ran two Vertica queries, I got the error below: >>>> ERROR [2015-12-04 09:18:47,153] ({pool-2-thread-3} Job.java[run]:183) - >>>> Job failed >>>> java.lang.AbstractMethodError: >>>> com.vertica.jdbc.VerticaConnectionImpl.isValid(I)Z >>>> at >>>> org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:163) >>>> at >>>> org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:196) >>>> at >>>> org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:208) >>>> at >>>> org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:292) >>>> at >>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>> at >>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>> at >>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300) >>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>> at >>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>> at >>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>> at >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>> at >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:744) >>>> >>>> When I removed the connection.isValid check, the queries ran without >>>> any errors. >>>> if (connection.isClosed()) { // || !connection.isValid(10)) { >>>> >>>> However, I noticed that even though the jobs for the two queries >>>> (select sleep(60)) were started around the same time, they both finished >>>> 120 seconds after they were started. I also looked at the sessions in >>>> Vertica and found that 'select sleep(60), 4' was started after 'select >>>> sleep(60), 3' completed. Is this behavior expected? Our goal is to prevent >>>> one long Vertica query from delaying other Vertica queries. >>>> >>>> MacBook-Pro logs $ cat >>>> zeppelin-interpreter-hive-pwagle-MacBook-Pro.local.log >>>> >>>> INFO [2015-12-04 09:28:44,545] ({Thread-0} >>>> RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server >>>> on port 54257 >>>> >>>> INFO [2015-12-04 09:28:44,818] ({pool-1-thread-3} >>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate >>>> interpreter org.apache.zeppelin.hive.HiveInterpreter >>>> >>>> INFO [2015-12-04 09:28:44,862] ({pool-2-thread-2} >>>> SchedulerFactory.java[jobStarted]:132) - Job >>>> remoteInterpretJob_1449250124861 started by scheduler >>>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>>> >>>> ERROR [2015-12-04 09:28:44,865] ({pool-2-thread-2} >>>> HiveInterpreter.java[open]:128) - hive will be ignored. driver.hive and >>>> hive.url is mandatory. >>>> >>>> INFO [2015-12-04 09:28:44,865] ({pool-2-thread-2} >>>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command: >>>> 'select sleep(60), 3' >>>> >>>> INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3} >>>> SchedulerFactory.java[jobStarted]:132) - Job >>>> remoteInterpretJob_1449250133312 started by scheduler >>>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>>> >>>> INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3} >>>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command: >>>> 'select sleep(60), 4' >>>> >>>> INFO [2015-12-04 09:30:48,970] ({pool-2-thread-3} >>>> SchedulerFactory.java[jobFinished]:138) - Job >>>> remoteInterpretJob_1449250133312 finished by scheduler >>>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>>> >>>> INFO [2015-12-04 09:30:48,970] ({pool-2-thread-2} >>>> SchedulerFactory.java[jobFinished]:138) - Job >>>> remoteInterpretJob_1449250124861 finished by scheduler >>>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>>> >>>> >>>> Thanks for your help! >>>> >>>> Prasad >>>> >>>> On Thu, Dec 3, 2015 at 9:12 PM, Jongyoul Lee <[email protected]> >>>> wrote: >>>> >>>>> Hi Prasad, >>>>> >>>>> PostgresSqlInterpreter doesn't seems to support parallel execution. I >>>>> merged ZEPPELIN-440 for supporting it today. Could you please use this >>>>> interpreter? You can find docs and changes in >>>>> https://github.com/apache/incubator-zeppelin/pull/455 >>>>> >>>>> Regards, >>>>> Jongyoul >>>>> >>>>> On Fri, Dec 4, 2015 at 10:32 AM, Prasad Wagle <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> We are evaluating Zeppelin and are excited about its potential. Right >>>>>> now, we are using it to access Presto and Vertica. We downloaded >>>>>> https://github.com/apache/incubator-zeppelin/pull/68 for the Presto >>>>>> interpreter and are using PostgreSqlInterpreter to access Vertica. >>>>>> >>>>>> The main issue we are facing is related to serial execution of >>>>>> queries. Since the default scheduler is FIFOScheduler, when a Vertica >>>>>> query >>>>>> is running in one notebook, all other Vertica queries issued from the >>>>>> same >>>>>> or different notes are pending. >>>>>> >>>>>> We changed the getScheduler method in PostgreSqlInterpreter to return >>>>>> ParallelScheduler. >>>>>> return SchedulerFactory.singleton().createOrGetParallelScheduler( >>>>>> PostgreSqlInterpreter.class.getName() + this.hashCode(), >>>>>> 2); >>>>>> >>>>>> But this resulted in the exception below. Is this expected behavior? >>>>>> Does the PostgreSqlInterpreter have to be redesigned to support parallel >>>>>> execution of queries? >>>>>> >>>>>> In zeppelin-interpreter-psql-pwagle-MacBook-Pro.local.log: >>>>>> >>>>>> INFO [2015-12-03 10:33:35,891] ({pool-1-thread-2} >>>>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate >>>>>> interpreter org.apache.zeppelin.postgresql.PostgreSqlInterpreter >>>>>> INFO [2015-12-03 10:33:35,940] ({pool-2-thread-2} >>>>>> SchedulerFactory.java[jobStarted]:132) - Job >>>>>> remoteInterpretJob_1449167615939 started by scheduler >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>>> INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2} >>>>>> PostgreSqlInterpreter.java[open]:132) - Open psql connection! >>>>>> INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2} >>>>>> PostgreSqlInterpreter.java[close]:182) - Close psql connection! >>>>>> INFO [2015-12-03 10:33:36,511] ({pool-2-thread-2} >>>>>> SqlCompleter.java[getSqlKeywordsCompletions]:144) - JDBC >>>>>> DriverName:/vertica.jar-sql.keywords >>>>>> INFO [2015-12-03 10:33:37,779] ({pool-2-thread-3} >>>>>> SchedulerFactory.java[jobStarted]:132) - Job >>>>>> remoteInterpretJob_1449167617779 started by scheduler >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2} >>>>>> PostgreSqlInterpreter.java[open]:153) - Successfully created psql >>>>>> connection >>>>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2} >>>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command ' >>>>>> select sleep(30), 1;' >>>>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-3} >>>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command ' >>>>>> select sleep(30), 2;' >>>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-3} >>>>>> PostgreSqlInterpreter.java[executeSql]:271) - Cannot run >>>>>> select sleep(30), 2; >>>>>> java.sql.SQLException: [Vertica][JDBC](12060) ResultSet object has >>>>>> been closed. >>>>>> at com.vertica.jdbc.SForwardResultSet.checkIfOpen(Unknown Source) >>>>>> at com.vertica.jdbc.SForwardResultSet.next(Unknown Source) >>>>>> at >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:235) >>>>>> at >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289) >>>>>> at >>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>>>> at >>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>>>> at >>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299) >>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>>> at >>>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>>>> at >>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>>> at >>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>>> at >>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>>> at >>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>>> at >>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>>> at java.lang.Thread.run(Thread.java:744) >>>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-2} Job.java[run]:183) >>>>>> - Job failed >>>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>>> java.lang.NullPointerException >>>>>> at >>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:61) >>>>>> at >>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>>>> at >>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299) >>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>>> at >>>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>>>> at >>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>>> at >>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>>> at >>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>>> at >>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>>> at >>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>>> at java.lang.Thread.run(Thread.java:744) >>>>>> Caused by: java.lang.NullPointerException >>>>>> at >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:262) >>>>>> at >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289) >>>>>> at >>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>>>> ... 11 more >>>>>> INFO [2015-12-03 10:34:49,489] ({pool-2-thread-3} >>>>>> SchedulerFactory.java[jobFinished]:138) - Job >>>>>> remoteInterpretJob_1449167617779 finished by scheduler >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>>> INFO [2015-12-03 10:34:49,490] ({pool-2-thread-2} >>>>>> SchedulerFactory.java[jobFinished]:138) - Job >>>>>> remoteInterpretJob_1449167615939 finished by scheduler >>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Prasad >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> 이종열, Jongyoul Lee, 李宗烈 >>>>> http://madeng.net >>>>> >>>> >>>> >>> >>> >>> -- >>> 이종열, Jongyoul Lee, 李宗烈 >>> http://madeng.net >>> >> >> >> >> -- >> Prasad Wagle >> 408.476.6261 | twitter.com/prasadwagle | [email protected] | >> www.linkedin.com/in/prasadwagle >> > > > > -- > 이종열, Jongyoul Lee, 李宗烈 > http://madeng.net > -- 이종열, Jongyoul Lee, 李宗烈 http://madeng.net
