Hi prasad,

Could you please review my PR in your case?

   - https://github.com/apache/incubator-zeppelin/pull/517

Regards,
Jongyoul

On Sun, Dec 6, 2015 at 7:44 PM, Jongyoul Lee <[email protected]> wrote:

> Hi Prasad,
>
> Thanks for the detail review of it. I also figure out multiple connection
> would make sense. I create
> https://issues.apache.org/jira/browse/ZEPPELIN-487 and I will fix it.
>
> Regards,
> Jongyoul
>
> On Sun, Dec 6, 2015 at 6:05 AM, Prasad Wagle <[email protected]>
> wrote:
>
>> Hi Jongyoul,
>>
>> I ran experiments below and found that we need multiple connections for
>> parallel execution in Vertica.
>>
>> *Same Connection*
>> scala> val connection = DriverManager.getConnection(url, username,
>> password)
>> connection: java.sql.Connection =
>> com.vertica.jdbc.VerticaConnectionImpl@5d56ea0f
>>
>> scala> val thread1 = new Thread {
>>      |   override def run {
>>      |     val statement = connection.createStatement()
>>      |     println("Thread1: before execute at " +
>> Calendar.getInstance().getTime())
>>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
>> col1, 1 as col2")
>>      |     resultSet.next()
>>      |     val col1 = resultSet.getString("col1")
>>      |     val col2 = resultSet.getString("col2")
>>      |     resultSet.next()
>>      |     println("Thread1: col " + col1 + " " + col2 + " at " +
>> Calendar.getInstance().getTime())
>>      |   }
>>      | }
>> thread1: Thread = Thread[Thread-3,5,main]
>>
>> scala> val thread2 = new Thread {
>>      |   override def run {
>>      |     val statement = connection.createStatement()
>>      |     println("Thread2: before execute at " +
>> Calendar.getInstance().getTime())
>>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
>> col1, 2 as col2")
>>      |     resultSet.next()
>>      |     val col1 = resultSet.getString("col1")
>>      |     val col2 = resultSet.getString("col2")
>>      |     resultSet.next()
>>      |     println("Thread2: col " + col1 + " " + col2 + " at " +
>> Calendar.getInstance().getTime())
>>      |   }
>>      | }
>> thread2: Thread = Thread[Thread-4,5,main]
>>
>> scala> thread1.start
>>
>> scala> thread2.start
>> Thread1: before execute at Sat Dec 05 12:38:59 PST 2015
>> Thread2: before execute at Sat Dec 05 12:38:59 PST 2015
>> Thread1: col 0 1 at Sat Dec 05 12:39:19 PST 2015
>> Thread2: col 0 2 at Sat Dec 05 12:39:19 PST 2015
>>
>> *Two Connections*
>>
>> scala> val thread1 = new Thread {
>>      |   override def run {
>>      |     val connection = DriverManager.getConnection(url, username,
>> password)
>>      |     val statement = connection.createStatement()
>>      |     println("Thread1: before execute at " +
>> Calendar.getInstance().getTime())
>>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
>> col1, 1 as col2")
>>      |     resultSet.next()
>>      |     val col1 = resultSet.getString("col1")
>>      |     val col2 = resultSet.getString("col2")
>>      |     println("Thread1: col " + col1 + " " + col2 + " at " +
>> Calendar.getInstance().getTime())
>>      |   }
>>      | }
>> thread1: Thread = Thread[Thread-3,5,main]
>>
>> scala> val thread2 = new Thread {
>>      |   override def run {
>>      |     val connection = DriverManager.getConnection(url, username,
>> password)
>>      |     val statement = connection.createStatement()
>>      |     println("Thread2: before execute at " +
>> Calendar.getInstance().getTime())
>>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
>> col1, 2 as col2")
>>      |     resultSet.next()
>>      |     val col1 = resultSet.getString("col1")
>>      |     val col2 = resultSet.getString("col2")
>>      |     println("Thread2: col " + col1 + " " + col2 + " at " +
>> Calendar.getInstance().getTime())
>>      |   }
>>      | }
>> thread2: Thread = Thread[Thread-4,5,main]
>>
>> scala> thread1.start
>>
>> scala> thread2.start
>>
>> Thread2: before execute at Sat Dec 05 12:34:16 PST 2015
>> Thread1: before execute at Sat Dec 05 12:34:16 PST 2015
>> Thread1: col 0 1 at Sat Dec 05 12:34:26 PST 2015
>> Thread2: col 0 2 at Sat Dec 05 12:34:26 PST 2015
>>
>>
>> Does that make sense?
>>
>> Thanks for the great work!
>>
>> Prasad
>>
>> On Fri, Dec 4, 2015 at 6:18 PM, Jongyoul Lee <[email protected]> wrote:
>>
>>> Hi Prasad,
>>>
>>> First of all, Thanks for testing multiple executions and documenting the
>>> bug. I'll fix the bug. Concerning the concurrent executions, that
>>> interpreter make a two statements from one connection. I'm not good at
>>> vertica, could you please confirm supporting multi statements from one
>>> connection and verify running multiple statements from one connection?
>>> Moreover, I appreciate the advice which is better strategy between multiple
>>> statements and multiple connections. I expect that second query is started
>>> before finishing the first one as you can see in your log, In your case,
>>>
>>> 1. the first query started
>>> 2. the second query started
>>> 3. the second query ended
>>> 4. the first query ended
>>>
>>> I, however why those queries finished after 2 minutes. I guess it's
>>> about vertica's behavior of jdbc, and it doesn't seems that vertica runs
>>> multiple statements in one connection. with this situation, even though two
>>> statements try to execute with parallel, sequentially executes internally.
>>> If it's true, I'll change to use multiple connections.
>>>
>>> Regards,
>>> Jongyoul
>>>
>>> On Sat, Dec 5, 2015 at 3:25 AM, Prasad Wagle <[email protected]>
>>> wrote:
>>>
>>>> Hi Jongyoul,
>>>>
>>>> Thanks for merging ZEPPELIN-440. I downloaded changes from
>>>> https://github.com/apache/incubator-zeppelin/pull/455 and here's what
>>>> I found.
>>>>
>>>> When I ran two Vertica queries, I got the error below:
>>>> ERROR [2015-12-04 09:18:47,153] ({pool-2-thread-3} Job.java[run]:183) -
>>>> Job failed
>>>> java.lang.AbstractMethodError:
>>>> com.vertica.jdbc.VerticaConnectionImpl.isValid(I)Z
>>>>         at
>>>> org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:163)
>>>>         at
>>>> org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:196)
>>>>         at
>>>> org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:208)
>>>>         at
>>>> org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:292)
>>>>         at
>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>>         at
>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>         at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
>>>>         at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>         at
>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>>         at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>         at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>         at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>
>>>> When I removed the connection.isValid check, the queries ran without
>>>> any errors.
>>>>       if (connection.isClosed()) { // || !connection.isValid(10)) {
>>>>
>>>> However, I noticed that even though the jobs for the two queries
>>>> (select sleep(60)) were started around the same time, they both finished
>>>> 120 seconds after they were started. I also looked at the sessions in
>>>> Vertica and found that 'select sleep(60), 4' was started after 'select
>>>> sleep(60), 3' completed. Is this behavior expected? Our goal is to prevent
>>>> one long Vertica query from delaying other Vertica queries.
>>>>
>>>> MacBook-Pro logs $ cat
>>>> zeppelin-interpreter-hive-pwagle-MacBook-Pro.local.log
>>>>
>>>>  INFO [2015-12-04 09:28:44,545] ({Thread-0}
>>>> RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server
>>>> on port 54257
>>>>
>>>>  INFO [2015-12-04 09:28:44,818] ({pool-1-thread-3}
>>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate
>>>> interpreter org.apache.zeppelin.hive.HiveInterpreter
>>>>
>>>>  INFO [2015-12-04 09:28:44,862] ({pool-2-thread-2}
>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>> remoteInterpretJob_1449250124861 started by scheduler
>>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>>
>>>> ERROR [2015-12-04 09:28:44,865] ({pool-2-thread-2}
>>>> HiveInterpreter.java[open]:128) - hive will be ignored. driver.hive and
>>>> hive.url is mandatory.
>>>>
>>>>  INFO [2015-12-04 09:28:44,865] ({pool-2-thread-2}
>>>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command:
>>>> 'select sleep(60), 3'
>>>>
>>>>  INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3}
>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>> remoteInterpretJob_1449250133312 started by scheduler
>>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>>
>>>>  INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3}
>>>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command:
>>>> 'select sleep(60), 4'
>>>>
>>>>  INFO [2015-12-04 09:30:48,970] ({pool-2-thread-3}
>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>> remoteInterpretJob_1449250133312 finished by scheduler
>>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>>
>>>>  INFO [2015-12-04 09:30:48,970] ({pool-2-thread-2}
>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>> remoteInterpretJob_1449250124861 finished by scheduler
>>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>>
>>>>
>>>> Thanks for your help!
>>>>
>>>> Prasad
>>>>
>>>> On Thu, Dec 3, 2015 at 9:12 PM, Jongyoul Lee <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Prasad,
>>>>>
>>>>> PostgresSqlInterpreter doesn't seems to support parallel execution. I
>>>>> merged ZEPPELIN-440 for supporting it today. Could you please use  this
>>>>> interpreter? You can find docs and changes in
>>>>> https://github.com/apache/incubator-zeppelin/pull/455
>>>>>
>>>>> Regards,
>>>>> Jongyoul
>>>>>
>>>>> On Fri, Dec 4, 2015 at 10:32 AM, Prasad Wagle <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are evaluating Zeppelin and are excited about its potential. Right
>>>>>> now, we are using it to access Presto and Vertica. We downloaded
>>>>>> https://github.com/apache/incubator-zeppelin/pull/68 for the Presto
>>>>>> interpreter and are using PostgreSqlInterpreter to access Vertica.
>>>>>>
>>>>>> The main issue we are facing is related to serial execution of
>>>>>> queries. Since the default scheduler is FIFOScheduler, when a Vertica 
>>>>>> query
>>>>>> is running in one notebook, all other Vertica queries issued from the 
>>>>>> same
>>>>>> or different notes are pending.
>>>>>>
>>>>>> We changed the getScheduler method in PostgreSqlInterpreter to return
>>>>>> ParallelScheduler.
>>>>>>     return SchedulerFactory.singleton().createOrGetParallelScheduler(
>>>>>>             PostgreSqlInterpreter.class.getName() + this.hashCode(),
>>>>>> 2);
>>>>>>
>>>>>> But this resulted in the exception below. Is this expected behavior?
>>>>>> Does the PostgreSqlInterpreter have to be redesigned to support parallel
>>>>>> execution of queries?
>>>>>>
>>>>>> In zeppelin-interpreter-psql-pwagle-MacBook-Pro.local.log:
>>>>>>
>>>>>>  INFO [2015-12-03 10:33:35,891] ({pool-1-thread-2}
>>>>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate
>>>>>> interpreter org.apache.zeppelin.postgresql.PostgreSqlInterpreter
>>>>>>  INFO [2015-12-03 10:33:35,940] ({pool-2-thread-2}
>>>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>>>> remoteInterpretJob_1449167615939 started by scheduler
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>>  INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2}
>>>>>> PostgreSqlInterpreter.java[open]:132) - Open psql connection!
>>>>>>  INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2}
>>>>>> PostgreSqlInterpreter.java[close]:182) - Close psql connection!
>>>>>>  INFO [2015-12-03 10:33:36,511] ({pool-2-thread-2}
>>>>>> SqlCompleter.java[getSqlKeywordsCompletions]:144) - JDBC
>>>>>> DriverName:/vertica.jar-sql.keywords
>>>>>>  INFO [2015-12-03 10:33:37,779] ({pool-2-thread-3}
>>>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>>>> remoteInterpretJob_1449167617779 started by scheduler
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2}
>>>>>> PostgreSqlInterpreter.java[open]:153) - Successfully created psql 
>>>>>> connection
>>>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2}
>>>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command '
>>>>>> select sleep(30), 1;'
>>>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-3}
>>>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command '
>>>>>> select sleep(30), 2;'
>>>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-3}
>>>>>> PostgreSqlInterpreter.java[executeSql]:271) - Cannot run
>>>>>> select sleep(30), 2;
>>>>>> java.sql.SQLException: [Vertica][JDBC](12060) ResultSet object has
>>>>>> been closed.
>>>>>> at com.vertica.jdbc.SForwardResultSet.checkIfOpen(Unknown Source)
>>>>>> at com.vertica.jdbc.SForwardResultSet.next(Unknown Source)
>>>>>> at
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:235)
>>>>>> at
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299)
>>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>> at
>>>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>>>> at
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>>> at
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-2} Job.java[run]:183)
>>>>>> - Job failed
>>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>>> java.lang.NullPointerException
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:61)
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299)
>>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>> at
>>>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>>>> at
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>>> at
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>> Caused by: java.lang.NullPointerException
>>>>>> at
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:262)
>>>>>> at
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
>>>>>> at
>>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>>>> ... 11 more
>>>>>>  INFO [2015-12-03 10:34:49,489] ({pool-2-thread-3}
>>>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>>>> remoteInterpretJob_1449167617779 finished by scheduler
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>>  INFO [2015-12-03 10:34:49,490] ({pool-2-thread-2}
>>>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>>>> remoteInterpretJob_1449167615939 finished by scheduler
>>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Prasad
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>> http://madeng.net
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>>
>>
>> --
>> Prasad Wagle
>> 408.476.6261  |  twitter.com/prasadwagle | [email protected] |
>> www.linkedin.com/in/prasadwagle
>>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Reply via email to