Thanks Nitin

There aren't many connections in CLOSE_WAIT state, only one or two when we
run into this. Most likely it's because of a dropped connection.

I could not find any read or write timeouts we can set for the Thrift
server that would tell Thrift to hold on to the client connection. See
https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem to have
been implemented yet. We have set a client connection timeout but cannot find
an equivalent setting for the server.
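
For reference, Thrift's Java TServerSocket can be constructed with a client timeout, which as far as I know is applied with Socket.setSoTimeout to every accepted connection; whether the Hive 0.9 metastore lets you configure it is what HIVE-2006 is about. The sketch below (hypothetical class name, plain JDK only, no Thrift) shows what such a server-side read timeout does: a read on an accepted socket whose client sends nothing fails with SocketTimeoutException after the timeout instead of blocking the worker thread indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: a server-side read timeout using only the JDK.
final class ServerReadTimeoutDemo {

    /**
     * Accepts one loopback connection whose client never sends anything, applies
     * the given read timeout to the accepted socket, and reports whether the
     * server-side read gave up with SocketTimeoutException.
     */
    static boolean serverReadTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free ephemeral port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket idleClient = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // Per-connection read timeout on the server side of the socket.
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                in.read(); // the client is idle, so this blocks until the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the server thread is released instead of hanging
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that a timeout like this makes the server give up on idle clients sooner; it does not keep connections alive.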

We suspect this happens when we run two client processes that modify two
distinct partitions of the same Hive table. We put in a workaround so that
the two Hive client processes never run concurrently, and so far things look
OK, but we will keep monitoring.
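
If the two client processes run on the same host, one way to enforce the "never run together" workaround is an exclusive OS file lock. A minimal sketch using java.nio.channels.FileLock; the class name and lock-file path are hypothetical, and it only serializes the clients. It says nothing about whether the metastore handles concurrent partition changes correctly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Supplier;

// Hypothetical helper: runs a task only while holding an exclusive lock on a file,
// so cooperating processes on the same host never run the task concurrently.
final class ExclusiveRunner {

    static <T> T runExclusively(Path lockFile, Supplier<T> task) {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // lock() blocks until no other process holds a lock on this file;
            // the lock is released when this inner try block exits.
            try (FileLock lock = channel.lock()) {
                return task.get();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each job would wrap its Hive invocation, e.g. ExclusiveRunner.runExclusively(Paths.get("/tmp/partition-jobs.lock"), () -> runHiveJob()), where runHiveJob is a placeholder for whatever launches the client. Processes on different hosts would need a shared lock service instead.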

Could this be because the Hive metastore server is not thread-safe? Would
running two ALTER TABLE statements on two distinct partitions of the same
table, over two client connections, cause problems like these, where the
metastore server closes or drops the wrong client connection and leaves the
other one hanging?

Agateaaa




On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> The code path in your stack trace is used when the Thrift metastore
> client-server connection runs in unsecure mode, so one way to avoid it is to
> use a secure connection.
>
> <code>
> public boolean process(final TProtocol in, final TProtocol out)
>     throws TException {
>   setIpAddress(in);
>   ...
>
> @Override
> protected void setIpAddress(final TProtocol in) {
>   TUGIContainingTransport ugiTrans =
>       (TUGIContainingTransport) in.getTransport();
>   Socket socket = ugiTrans.getSocket();
>   if (socket != null) {
>     setIpAddress(socket);
>   }
> }
> </code>
>
>
> From the snippet above, it looks like a NullPointerException is not handled
> if getSocket() returns null.
>
> Can you check what the ulimit setting is on the server? If it's set to the
> default, can you set it to unlimited and restart the HCat server? (This is
> just a wild guess.)
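
Before changing the limit, it may help to see how close the metastore JVM actually gets to it. ulimit -n, run as the user that starts the server, shows the limit from the shell; from inside a JVM, HotSpot-based JDKs on Unix expose the same numbers through com.sun.management.UnixOperatingSystemMXBean. A sketch with a hypothetical class name; the instanceof check is there because other JVMs or platforms may not provide these counters.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports this JVM's open vs. maximum file descriptors.
// Only HotSpot-style JDKs on Unix-like systems expose these counters.
final class FdUsage {

    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            return "open fds: " + open + " / max: " + max;
        }
        return "file descriptor counts not available on this JVM/OS";
    }
}
```

Since it reports on the JVM it runs in, it would have to be logged from within the metastore process; the same attributes are also visible over JMX (for example in jconsole under java.lang:type=OperatingSystem).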
>
> Also, the comment on the getSocket method says: "If the underlying
> TTransport is an instance of TSocket, it returns the Socket object which it
> contains. Otherwise it returns null."
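
Given that getSocket() can legitimately return null, a defensive version of the "record the client IP" step would treat a missing socket, and a socket that is not connected (for which Socket.getInetAddress() returns null), as "unknown" instead of dereferencing them. This is a hypothetical standalone helper on plain java.net.Socket, not the Hive code; what exactly is dereferenced at TUGIBasedProcessor.java:183 is not visible in the excerpt above.

```java
import java.net.InetAddress;
import java.net.Socket;

// Hypothetical helper: derives a printable client address without risking an NPE.
final class PeerAddress {

    static String describe(Socket socket) {
        if (socket == null) {
            // e.g. the transport was not a plain TSocket, so no Socket was available
            return "unknown";
        }
        InetAddress addr = socket.getInetAddress(); // null if the socket is not connected
        return addr == null ? "unknown" : addr.getHostAddress();
    }
}
```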
>
> So someone among the Thrift gurus needs to tell us what's happening; I don't
> have knowledge at this depth.
>
> Maybe Ashutosh or Thejas will be able to help with this.
>
>
>
>
> From the netstat CLOSE_WAIT output, it looks like the Hive metastore server
> has not closed the connection (I don't know why yet); maybe the Hive devs
> can help. Are there too many connections in CLOSE_WAIT state?
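
A socket in CLOSE_WAIT means the peer has closed its side and the local application has not yet called close(). To track this over time rather than from one netstat snapshot, one option on Linux is to parse /proc/net/tcp, where the local address is ip:port in hex and the fourth column is the TCP state in hex (08 is CLOSE_WAIT). A hypothetical sketch that counts CLOSE_WAIT sockets on a given local port; it covers IPv4 only (/proc/net/tcp6 has the same layout for IPv6).

```java
import java.util.List;

// Hypothetical helper: counts CLOSE_WAIT sockets on a local port from the text of
// Linux /proc/net/tcp (IPv4). Columns: sl, local_address, rem_address, st, ...
final class CloseWaitCounter {

    private static final String CLOSE_WAIT = "08"; // Linux TCP state code

    static long countCloseWait(List<String> procNetTcpLines, int localPort) {
        String portHex = String.format("%04X", localPort); // e.g. 10000 -> "2710"
        return procNetTcpLines.stream()
                .skip(1) // header line
                .map(String::trim)
                .map(line -> line.split("\\s+"))
                .filter(cols -> cols.length > 3)
                .filter(cols -> cols[1].endsWith(":" + portHex))
                .filter(cols -> CLOSE_WAIT.equals(cols[3]))
                .count();
    }

    public static void main(String[] args) throws java.io.IOException {
        List<String> lines = java.nio.file.Files.readAllLines(
                java.nio.file.Paths.get("/proc/net/tcp"));
        System.out.println("CLOSE_WAIT on :10000 -> " + countCloseWait(lines, 10000));
    }
}
```

Running it periodically on the metastore host (port 10000 here) would show whether CLOSE_WAIT sockets accumulate or stay at one or two.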
>
>
>
> On Tue, Jul 30, 2013 at 5:52 AM, agateaaa <agate...@gmail.com> wrote:
>
> > Looking at the Hive metastore server logs, we see errors like these:
> >
> > 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
> > java.lang.NullPointerException
> >         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
> >         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > We see these at approximately the same time as the timeout or connection reset errors.
> >
> > We don't know whether this is the cause or a side effect of the connection
> > timeout/connection reset errors. Does anybody have any pointers or
> > suggestions?
> >
> > Thanks
> >
> >
> > On Mon, Jul 29, 2013 at 11:29 AM, agateaaa <agate...@gmail.com> wrote:
> >
> > > Thanks Nitin!
> > >
> > > We have a similar setup (identical HCatalog and Hive server versions) in
> > > another production environment and don't see any errors there (it's been
> > > running OK for a few months).
> > >
> > > Unfortunately, we won't be able to move to HCat 0.5 and Hive 0.11 (or Hive
> > > 0.10) soon.
> > >
> > > The last time we ran into this problem, running netstat -ntp | grep ":10000"
> > > showed that the server was holding on to one socket connection in
> > > CLOSE_WAIT state for a long time (the Hive metastore server is running on
> > > port 10000). I don't know if that's relevant here or not.
> > >
> > > Can you suggest any Hive configuration settings we can tweak, or any
> > > networking tools/tips we can use, to narrow this down?
> > >
> > > Thanks
> > > Agateaaa
> > >
> > >
> > >
> > >
> > > On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> > >
> > >> Is there any chance you can upgrade a test environment to hcat-0.5 and
> > >> hive-0.11 (or 0.10) and see if you can reproduce the issue?
> > >>
> > >> We used to see this error when there was load on the HCat server or some
> > >> network issue connecting to the server (the second was a rare occurrence).
> > >>
> > >>
> > >> On Mon, Jul 29, 2013 at 11:13 PM, agateaaa <agate...@gmail.com> wrote:
> > >>
> > >>> Hi All:
> > >>>
> > >>> We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore
> > >>> Server 0.9) where we get connection reset or connection timeout errors.
> > >>>
> > >>> The Hive metastore server has been allocated enough memory (12G).
> > >>>
> > >>> This is a critical problem for us, and we would appreciate any pointers.
> > >>>
> > >>> We did add retry logic in our client, which seems to help, but I am
> > >>> wondering how we can narrow down the root cause of this problem. Could a
> > >>> networking hiccup cause the Hive server to get into an unresponsive state?
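
For reference, client-side retry logic of the kind described above can be as simple as retrying the metastore call a bounded number of times with exponential backoff on I/O failures. A generic sketch with hypothetical names; it assumes the wrapped call is safe to repeat (re-running a non-idempotent DDL such as ADD PARTITION may fail with "already exists" on a later attempt). Thrift's TTransportException extends TException rather than IOException, so a real metastore client would catch that type instead.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry helper: retries a call on IOException with exponential backoff.
final class Retry {

    static <T> T withRetries(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                Thread.sleep(backoff);
                backoff *= 2; // 1x, 2x, 4x, ... the initial delay
            }
        }
    }
}
```

Retries hide transient resets but not their cause, so it is worth logging each failed attempt with a timestamp to line them up with the server-side NullPointerException entries.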
> > >>>
> > >>> Thanks
> > >>>
> > >>> Agateaaa
> > >>>
> > >>>
> > >>> Example Connection reset error:
> > >>> =======================
> > >>>
> > >>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
> > >>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> > >>>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > >>>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> > >>>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> > >>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> > >>>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> > >>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
> > >>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
> > >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
> > >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
> > >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:157)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
> > >>>         at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:1817)
> > >>>         at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:297)
> > >>>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> > >>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> > >>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> > >>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>>         at java.lang.reflect.Method.invoke(Method.java:597)
> > >>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >>> Caused by: java.net.SocketException: Connection reset
> > >>>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
> > >>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> > >>>         ... 30 more
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Example Connection timeout error:
> > >>> ==========================
> > >>>
> > >>> org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> > >>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> > >>>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > >>>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> > >>>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> > >>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> > >>>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> > >>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
> > >>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
> > >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
> > >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
> > >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:157)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
> > >>>         at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
> > >>>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
> > >>>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
> > >>>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> > >>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> > >>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> > >>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
> > >>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>>         at java.lang.reflect.Method.invoke(Method.java:597)
> > >>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >>> Caused by: java.net.SocketTimeoutException: Read timed out
> > >>>         at java.net.SocketInputStream.socketRead0(Native Method)
> > >>>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
> > >>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> > >>>         ... 31 more
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Nitin Pawar
> > >>
> > >
> > >
> >
>
>
>
> --
> Nitin Pawar
>
