Hello all,
I'm running out of space when trying to restart nodes to get a cluster
back to fully operational after a node ran out of space during an optimize.
It appears to be trying to do a full sync from another node, but it doesn't
check available space before starting downloads, and it doesn't delete the
out-of-date segment files before attempting the full sync.
If the segments are out of date and we are pulling from another node
before coming "online", why aren't the old segments deleted first? Is this
something that can be enabled in the master solrconfig.xml file?
It seems to know the size of the segments before they are transferred, so
is there a reason a basic disk space check isn't done on the target
partition, with an immediate abort if the destination's free space would
go negative, before attempting the sync? Could that also be enabled in
solrconfig.xml? This would be a lot more useful (IMHO) than waiting for a
full sync to run out of space after several hundred gigs of data have been
transferred, with automatic cluster recovery failing as a result.
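To illustrate the kind of pre-flight check I mean, here's a rough sketch.
This is not an existing Solr feature; the index directory and the
expected-bytes figure are placeholder arguments standing in for the data
dir and the total segment size the leader reports before the fetch:

```shell
# Sketch only: abort before fetching if the target partition can't
# hold the transfer. Arguments are hypothetical, not Solr-provided.
check_free_space() {
    index_dir="$1"      # directory on the partition holding the index
    expected_bytes="$2" # total bytes the leader says it will send

    # df -P prints POSIX-format output in 1K blocks; column 4 is "Available".
    avail_kb=$(df -P "$index_dir" | awk 'NR==2 {print $4}')
    avail_bytes=$((avail_kb * 1024))

    if [ "$avail_bytes" -lt "$expected_bytes" ]; then
        echo "ABORT: need $expected_bytes bytes, only $avail_bytes free" >&2
        return 1
    fi
    return 0
}

# Example: refuse to start pulling a ~5 GB .cfs file onto a full disk.
check_free_space / 5257809205 || echo "would skip full sync here"
```

Doing this once per file (or once for the whole fileList total) before the
first download starts would fail fast instead of after hundreds of gigs.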
This happens when doing a 'sudo service solr restart'.
(Workaround: shut down the offending node, manually delete the segment
index folders and tlog files, then start the node.)
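For anyone hitting the same thing, the workaround steps can be sketched as
below. The /var/solr/data/<core>/data layout is an assumption from a
default install; check your own solr home before deleting anything:

```shell
# Workaround sketch: drop the stale index and tlog so the node has room
# to full-sync a fresh copy on startup. Paths are assumptions.
clean_stale_index() {
    data_dir="$1"   # e.g. /var/solr/data/mycore/data (hypothetical path)
    rm -rf "$data_dir/index"   # stale/partial segment files
    rm -rf "$data_dir/tlog"    # transaction logs
}

# Intended sequence (run by hand on the offending node):
# sudo service solr stop                          # 1. shut the node down
# clean_stale_index /var/solr/data/mycore/data   # 2. free the space
# sudo service solr start                        # 3. node full-syncs clean
```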
Exception:
WARN - 2016-11-28 16:15:16.291; org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching file: _2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)
java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
at org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
-Mike