This is great information. Thanks for sharing, Steven!

On Tue, Jun 27, 2017 at 7:05 AM, Steven Schlansker <sschlans...@opentable.com> wrote:
> > On Jun 25, 2017, at 11:24 PM, Benjamin Mahler <bmah...@apache.org> wrote:
> >
> > As a data point, as far as I'm aware, most users are using a local work
> > directory, not an NFS mounted one. Would love to hear from anyone on the
> > list if they are doing this, and if there are any subtleties that should be
> > documented.
>
> We don't run NFS in particular but we did originally use a SAN -- two
> observations:
>
> NFS (historically, maybe it's better now, but doubtful...) has really bad
> failure modes. Network failures can cause serious hangs both in user-space
> and kernel-space. Such hangs can be impossible to clear without rebooting
> the machine, and in some edge cases can even make it difficult or impossible
> to reboot the machine via normal means.
>
> Network attached drives (our SAN) are less reliable, slower, and more complex
> (read: more failure modes) than local disk. It's also a really big single
> point of failure. So far our only true cluster outages have been due to
> failure of the SAN, since it took down all nodes at once -- once we removed
> the SAN, future failures had islands of availability and any properly written
> application could continue running (obviously without network resources)
> through the incident.
>
> Maybe this isn't a huge deal for your use case, which might differ from ours.
> For us, it was enough of a problem that we now purchase local SSD scratch
> space for every node just so that we have some storage we can depend on a bit
> more than network attached storage.
>
> >
> > On Thu, Jun 22, 2017 at 11:13 PM, <thomas.kurm...@artorg.unibe.ch> wrote:
> > Hi,
> >
> > We have a couple of server nodes mainly used for computational tasks in
> > our Mesos cluster. These servers have beefy CPUs, GPUs, etc. but only
> > limited SSD space. We also have a 40GbE network and a decently fast
> > file server.
> >
> > My question is simple, but I didn't find an answer anywhere: What are the
> > best practices for the working directory on mesos-agent nodes? Should
> > we keep the working directory local, or is it reasonable to use an
> > NFS-mounted folder? We implemented both and they seem to work fine, but I
> > would rather follow "best practices".
> >
> > Thanks and cheers
> >
> > Tom