The solution is to set 'spark.shuffle.io.preferDirectBufs' to 'false'. Then it works.
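For anyone hitting the same failure, one way to apply the setting is at submit time (a sketch; the script name and master are placeholders for your own job):

```shell
# Disable Netty's direct (off-heap) buffers for shuffle I/O,
# which works around the executor-lost failures described below.
spark-submit \
  --master yarn \
  --conf spark.shuffle.io.preferDirectBufs=false \
  my_job.py
```

The same key can also be set on a SparkConf before creating the context, or in spark-defaults.conf.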
Cheers!

On Fri, Oct 9, 2015 at 3:13 PM, Ivan Héda <ivan.h...@gmail.com> wrote:
> Hi,
>
> I'm facing an issue with PySpark (1.5.1, 1.6.0-SNAPSHOT) running over Yarn
> (2.6.0-cdh5.4.4). Everything seems fine when working with dataframes, but
> when I need an RDD the workers start to fail, as in the following code:
>
> table1 = sqlContext.table('someTable')
> table1.count()  ## OK ## approx. 500 million rows
>
> table1.groupBy(table1.field).count().show()  ## no problem
>
> table1.rdd.count()  ## fails; the driver logs:
>
> # Py4JJavaError: An error occurred while calling
> # z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> # : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> # 23 in stage 117.0 failed 4 times, most recent failure: Lost task 23.3 in
> # stage 117.0 (TID 23836, some_host): ExecutorLostFailure (executor 2446 lost)
>
> The affected workers fail with this log:
>
> 15/10/09 14:56:59 WARN TransportChannelHandler: Exception in connection from host/ip:port
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
>
> The RDD works as expected if I use
> conf.set("spark.shuffle.blockTransferService", "nio").
>
> Since "nio" is deprecated, I'm looking for a better solution. Any ideas?
>
> Thanks in advance
>
> ih