Hello all,

I'm running out of disk space when trying to restart nodes to get a cluster back up fully operational after a node ran out of space during an optimize.

It appears to be trying to do a full sync from another node, but it doesn't check available space before starting downloads, and it doesn't delete the out-of-date segment files before attempting the full sync.

If the segments are out of date and we are pulling from another node before coming "online", why aren't the old segments deleted first? Is this something that can be enabled in the master solrconfig.xml file?

It seems to know the size of each segment before it is transferred, so is there a reason a basic disk space check isn't done on the target partition, with an immediate abort if the destination's space would go negative, before attempting the sync? Again, is this something that can be enabled in the master solrconfig.xml file? That would be a lot more useful (IMHO) than waiting for a full sync to run out of space after several hundred gigs of data have been transferred, with automatic cluster recovery failing as a result.
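For illustration, here is roughly the kind of pre-flight guard I have in mind, sketched as a shell check. This is not anything Solr does today; the index size is just the total from the log below, and the data directory is an assumed placeholder:

```shell
# Hypothetical pre-flight space check before allowing a full sync.
# INDEX_SIZE_BYTES would ideally come from the leader's file list;
# here it is the file size reported in the log below. DATA_DIR is an
# assumed/placeholder data directory -- adjust for a real install.
INDEX_SIZE_BYTES=5257809205
DATA_DIR=${DATA_DIR:-/tmp}

# Free bytes on the partition holding the index directory:
AVAIL_BYTES=$(df -B1 --output=avail "$DATA_DIR" | tail -n 1 | tr -d ' ')

if [ "$AVAIL_BYTES" -lt "$INDEX_SIZE_BYTES" ]; then
    echo "ABORT sync: only $AVAIL_BYTES bytes free, need $INDEX_SIZE_BYTES"
else
    echo "OK to sync: $AVAIL_BYTES bytes free"
fi
```

Something this simple, run once per file (or once against the whole file list) before downloads start, would fail fast instead of failing after hundreds of gigs.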

This happens when doing a 'sudo service solr restart'.

(Workaround: shut down the offending node, manually delete the segment index folders and tlog files, then start the node.)
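For reference, the workaround scripted out. The core data path here is an assumption; substitute the offending replica's actual data directory, and stop Solr first (e.g. 'sudo service solr stop'):

```shell
# CORE_DATA is an assumed path -- point it at the offending replica's
# data directory on your install before running, with Solr stopped.
CORE_DATA=${CORE_DATA:-/var/solr/data/collection1_shard1_replica1/data}

# Delete the stale segments and transaction logs so the subsequent full
# sync starts into empty directories with maximum free space:
rm -rf "$CORE_DATA/index" "$CORE_DATA/tlog"

# Then start the node again ('sudo service solr start'); it will do a
# full sync of the index from another replica.
```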

Exception:

WARN - 2016-11-28 16:15:16.291; org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching file: _2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)
java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
    at java.nio.channels.Channels.writeFully(Channels.java:101)
    at java.nio.channels.Channels.access$000(Channels.java:61)
    at java.nio.channels.Channels$1.write(Channels.java:174)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
    at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
    at org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
    at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

-Mike