Hi,

I'm just calling the standard SVMWithSGD implementation in Spark's MLlib. I'm not using any action like "collect".
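Even without a "collect", the driver can still receive large objects. This is a rough estimate under stated assumptions. Assume SVMWithSGD keeps its weights as a dense vector with one double per feature, and that each gradient-descent iteration sums dense gradient vectors of the same length across partitions before the result reaches the driver, as the MLlib releases of that period appear to do. Then the raw payload can be sized as below. The number of copies alive at once is a hypothetical parameter that the thread does not report, and JVM object headers and serialization buffers are ignored.

```python
# Rough size of the dense vectors SVMWithSGD is assumed to move around.
# Assumption: one 8-byte double per feature, stored densely; JVM object
# headers, serialization buffers and compression are ignored.

BYTES_PER_DOUBLE = 8


def dense_vector_bytes(num_features: int) -> int:
    """Raw payload of one dense vector of doubles with num_features entries."""
    return num_features * BYTES_PER_DOUBLE


def aggregate_bytes(num_features: int, num_copies: int) -> int:
    """Raw payload if num_copies such vectors are alive at the same time,
    e.g. partial gradient sums waiting to be merged (hypothetical count)."""
    return dense_vector_bytes(num_features) * num_copies


if __name__ == "__main__":
    num_features = 20_000_000  # feature count from the original message
    copies = 64                # hypothetical number of partial results held at once
    print(f"one dense vector: {dense_vector_bytes(num_features) / 1e6:.0f} MB")
    print(f"{copies} copies: {aggregate_bytes(num_features, copies) / 1e9:.2f} GB")
```

With 20 million features, each vector is about 160 MB before any overhead. A few dozen partial results held at once already reach the tens of gigabytes, whatever the number of rows. That would fit the 25K-row run still exhausting the driver, though the thread does not confirm it.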
Thanks,
Sarath

On Tue, Apr 28, 2015 at 4:35 PM, ai he <heai0...@gmail.com> wrote:
> Hi Sarath,
>
> It might be questionable to set num-executors to 64 if you only have 8
> nodes. Do you use any action like "collect", which would overwhelm the
> driver given how large your dataset is?
>
> Thanks
>
> On Tue, Apr 28, 2015 at 10:50 AM, sarath <sarathkrishn...@gmail.com> wrote:
> >
> > I am trying to train on a large dataset of 8 million data points and
> > 20 million features using SVMWithSGD, but it fails after running for
> > some time. I tried increasing num-partitions, driver-memory,
> > executor-memory and driver-max-resultSize. I also tried reducing the
> > dataset from 8 million to 25K points (keeping the number of features at
> > 20 M), but it still failed after using the entire 64GB of driver memory
> > for 20 to 30 minutes.
> >
> > I'm using a cluster of 8 nodes (each with 8 cores and 64G RAM):
> > executor-memory - 60G
> > driver-memory - 60G
> > num-executors - 64
> > All other settings are left at their defaults.
> >
> > This is the error log:
> >
> > 15/04/20 11:51:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 15/04/20 11:51:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
> > 15/04/20 11:51:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
> > 15/04/20 11:56:11 WARN TransportChannelHandler: Exception in connection from xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> > java.io.IOException: Connection reset by peer
> >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >     .......
> > 15/04/20 11:56:11 ERROR TransportResponseHandler: Still have 7 requests outstanding when connection from xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029 is closed
> > 15/04/20 11:56:11 ERROR OneForOneBlockFetcher: Failed while starting block fetches
> > java.io.IOException: Connection reset by peer
> >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >     .......
> > 15/04/20 11:56:11 ERROR OneForOneBlockFetcher: Failed while starting block fetches
> > java.io.IOException: Connection reset by peer
> >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >     ...........
> > 15/04/20 11:56:12 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
> > java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> >     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
> >     at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:87)
> >     at org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:149)
> >     at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:290)
> >     at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
> >     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> >     at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> >     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> >     at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:91)
> >     at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
> >     at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >     at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> >     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >     ... 1 more
> > 15/04/20 11:56:15 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
> > java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> >
> > Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> >     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >     ... 1 more
> > 15/04/20 11:56:27 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from xxx.xxx.xxx.net:41029
> > java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> >     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> >     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >     ... 1 more
> > 15/04/20 11:56:30 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from xxx.xxx.xxx.net:41029
> > java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> >     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> >     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> >     at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> >     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >     ... 1 more
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-SVMWithSGD-is-failing-for-large-dataset-tp22694.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> --
> Best
> Ai

--
Sarath Krishna S
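Ai's concern about 64 executors on 8 nodes can be checked with simple arithmetic. The sketch below uses the node size and settings quoted in the thread. The per-executor overhead fraction is an assumption. YARN-based deployments reserve extra off-heap memory per executor (spark.yarn.executor.memoryOverhead), and its default varies by Spark version, so the 7% used here is only illustrative. The sketch also ignores memory taken by the OS, the cluster manager, and the 60G driver.

```python
import math

# Cluster and settings quoted in the original message.
NODES = 8
NODE_MEM_GB = 64
EXECUTOR_MEM_GB = 60
NUM_EXECUTORS = 64


def executors_per_node(node_mem_gb: float, executor_mem_gb: float,
                       overhead_fraction: float = 0.0) -> int:
    """Number of executors whose heap plus an assumed overhead fits in one node.

    overhead_fraction is a hypothetical extra share of memory reserved per
    executor on top of its heap; 0.0 ignores overhead entirely.
    """
    footprint_gb = executor_mem_gb * (1.0 + overhead_fraction)
    return math.floor(node_mem_gb / footprint_gb)


requested_gb = NUM_EXECUTORS * EXECUTOR_MEM_GB   # executor memory the settings ask for
available_gb = NODES * NODE_MEM_GB               # physical memory across all nodes
max_executors = NODES * executors_per_node(NODE_MEM_GB, EXECUTOR_MEM_GB)

print(f"requested {requested_gb} GB of executor memory, {available_gb} GB exist")
print(f"at most {max_executors} executors of {EXECUTOR_MEM_GB} GB fit (no overhead)")
print("with an assumed 7% overhead:",
      NODES * executors_per_node(NODE_MEM_GB, EXECUTOR_MEM_GB, 0.07))
```

Under these assumptions, the settings request 3840 GB of executor memory on a cluster with 512 GB in total. At most one 60G executor fits per node, so at most 8 could run at once. With the assumed overhead added, not even one 60G executor fits in a 64G node.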