astralidea created ZEPPELIN-652:
-----------------------------------

             Summary: Zeppelin using spark-sql to read Hive leaves too many sockets in CLOSE_WAIT
                 Key: ZEPPELIN-652
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-652
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.6.0
         Environment: spark-1.6.0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.accept_source_route = 0
kernel.core_uses_pid = 1
kernel.sysrq = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmall = 30455753932
kernel.shmmax = 124746768108748
kernel.core_uses_pid = 0
net.core.somaxconn = 10240
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_syn_retries = 3
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_no_metrics_save = 0
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_max_syn_backlog = 4196
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
vm.swappiness = 0
ulimit -n 65536
            Reporter: astralidea

When Zeppelin runs a spark-sql job that reads about 300 GB of data from Hive (the cluster has roughly 300 machines), the number of connections stuck in CLOSE_WAIT grows rapidly during the first run. The next run then fails with an error like:

java.net.SocketException: Too many open files
	at sun.nio.ch.Net.socket0(Native Method)
	at sun.nio.ch.Net.socket(Net.java:411)
	at sun.nio.ch.Net.socket(Net.java:404)
	at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
	at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
	at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
	at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
	at org.apache.hadoop.ipc.Client.call(Client.java:1318)
	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:188)
	at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1064)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1054)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1044)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:235)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:202)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:195)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1212)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:290)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:286)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:286)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:355)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1204)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

So how can I find out where these sockets should be closed?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
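One way to narrow down whether the leak is in the interpreter JVM is to watch its file-descriptor usage from a notebook paragraph while the query runs. The sketch below is only a diagnostic idea, not anything Zeppelin ships: it assumes a HotSpot JVM on Linux, where com.sun.management.UnixOperatingSystemMXBean exposes fd counts and /proc/self/fd lists every open descriptor (a socket stuck in CLOSE_WAIT still holds one).

import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// The current JVM process: the Spark driver when run from a %spark paragraph.
val os = ManagementFactory.getOperatingSystemMXBean
  .asInstanceOf[UnixOperatingSystemMXBean]
println(s"open fds: ${os.getOpenFileDescriptorCount} / limit: ${os.getMaxFileDescriptorCount}")

// Each fd appears under /proc/self/fd as a symlink; socket fds resolve to
// "socket:[inode]". Compare this count before and after the paragraph runs
// to see whether sockets are being leaked.
val socketFds = new java.io.File("/proc/self/fd").listFiles.count { f =>
  try java.nio.file.Files.readSymbolicLink(f.toPath).toString.startsWith("socket:")
  catch { case _: Exception => false } // fd can vanish between list and readlink
}
println(s"socket fds: $socketFds")

Since the stack trace above comes from an executor task, the executors may be the processes actually hitting the limit; running lsof -p <executor pid> | grep CLOSE_WAIT on a worker host gives the same view from outside the JVM.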