Re: Mesos task ordering guarantees

2014-09-14 Thread Vinod Kone
Yes. The order is guaranteed. @vinodkone > On Sep 14, 2014, at 5:28 AM, Tom Arnfeld wrote: > > Hey, > > I couldn't seem to find any documentation on this.. > > If a framework responds to an offer with two tasks and they share the same > executor (therefore leading to two invocations of laun

Re: Untaring Framework tgzs: Can we customize?

2014-09-13 Thread Vinod Kone
t as I am traveling) :) > On Sep 12, 2014 9:06 PM, "Vinod Kone" wrote: > >> Having a "skip chown" option sounds good to me. We'll add the option to >> CommandInfo.URI so that frameworks can override the default if desired. >> Mind filing a ticket? >

Re: Untaring Framework tgzs: Can we customize?

2014-09-12 Thread Vinod Kone
adoop > run the framework. In this model, the LinuxTaskController should work. > > Thanks for looking into this, I welcome more thoughts on the subject. > > John > > > > On Wed, Sep 10, 2014 at 4:39 PM, Vinod Kone wrote: > >> IanD: Mind helping John out

Re: Sandbox GC fails

2014-09-11 Thread Vinod Kone
nts of the directory and that is being > swallowed, so files still remain by the time the whole sandbox removal is > attempted, causing a "Directory is not empty". > > Appreciate any input! > > >> On 8 September 2014 07:26, Tom Arnfeld wrote: >> That'

Re: Untaring Framework tgzs: Can we customize?

2014-09-10 Thread Vinod Kone
IanD: Mind helping John out here? My hunch here is that this is because the slave does "chown()" after extracting ( https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L258)? From the POSIX standard, it looks like chown() when invoked by root doesn't clear the setuid bit for ordinar

Re: Mesos Driver aborted silently?

2014-09-10 Thread Vinod Kone
My guess is that your driver threw an exception while handling the offerRescinded() callback which was detected by the JNI binding (IIRC Mantis is a JVM framework?) causing it to abort the driver. Note that when a driver aborts, it will send a DeactivateFrameworkMessage to the master causing the ma

Mesos webcast

2014-09-09 Thread Vinod Kone
Hi folks, I'm doing a webcast on Mesos this thursday (h/t Mesosphere) where I will talk about some of the core features of Mesos (slave recovery, authentication and authorization). At the end, we will have time for Q&A for any and all questions related to Mesos. More details: https://attendee.got

Re: Mesos on Gentoo

2014-09-07 Thread Vinod Kone
Hi James, Great to see a Gentoo package for Mesos! Regarding HDFS requirement, any shared storage (even just a http/ftp server works) that the Mesos slaves can pull the executor from is enough.

Re: Sandbox GC fails

2014-09-07 Thread Vinod Kone
On Sat, Sep 6, 2014 at 8:23 AM, Tom Arnfeld wrote: > If I try and manually remove the directory mentioned, it works fine. Is > this a known issue, or should I do a little more debugging? I've not tried > to reproduce it under specific conditions yet. > > This is surprising. GC does a recursive di

Re: Introducing Portainer

2014-09-03 Thread Vinod Kone
This is great Tom. Thanks for sharing. We do list Mesos frameworks on the website (http://mesos.apache.org/documentation/latest/mesos-frameworks/). Please send a PR or RB request. On Wed, Sep 3, 2014 at 3:50 PM, Tom Arnfeld wrote: > @Ankur Wups! That's silly of me... http://github.com/duedil-lt

Re: MongoDB on mesos

2014-09-02 Thread Vinod Kone
Bill, just to clarify, that only works in Aurora if state is written outside the sandbox, correct?

Re: MongoDB on mesos

2014-09-02 Thread Vinod Kone
I'm not aware of any ports of MongoDB on Mesos, but the one gotcha to keep in mind when porting database frameworks is that the task/executor sandbox in Mesos is ephemeral. IOW, when an executor exits the sandbox gets cleaned up (not immediately but after certain time based on the garbage collectio

Re: Docker Example Mesos 0.20?

2014-08-27 Thread Vinod Kone
On Wed, Aug 27, 2014 at 9:14 AM, Connor Doyle wrote: > The order they are listed is significant Why is the order important? Is it a Marathon restriction? IIUC, Mesos will pick the right* containerizer based on whether TaskInfo.ContainerInfo or ExecutorInfo.ContainerInfo is set. * there is curr

Re: Migration from mesos 0.19 to mesos 0.20

2014-08-27 Thread Vinod Kone
See docs/upgrades.md. @vinodkone > On Aug 27, 2014, at 5:48 AM, Giulio Eulisse wrote: > > Hi, > > is there any best practices / recommendation when updating from mesos 0.19 to > mesos 0.20? > > -- > Ciao, > Giulio

Re: Issue with Multinode Cluster

2014-08-25 Thread Vinod Kone
robably has a file (/etc/defaults/mesos-master?) to set these flags. On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek wrote: > Logs attached from master, slave, and zookeeper after a reboot of both > nodes. > > > > > On August 25, 2014 at 1:14:07 PM, Vinod Kone (vinodk...@gmail.

Re: Storm on Mesos

2014-08-25 Thread Vinod Kone
On Mon, Aug 25, 2014 at 4:25 PM, Eran Chinthaka Withana < eran.chinth...@gmail.com> wrote: > What does "Invalid user: nonexistent" means? Any idea? > Looks like the unix user that the slave is trying to run the executor as doesn't exist. Do you know what user storm is trying to run the executor a

Re: Storm on Mesos

2014-08-25 Thread Vinod Kone
I don't know enough about storm, but on the mesos side you can run the master with more logging (by setting GLOG_v=1 in the environment). That will show you how many resources are being offered to the framework. FWICT, it looks like the storm framework is declining the offer(s) possibly because the

Re: URI of Executor is not recognized in mesos-0.18.1

2014-08-25 Thread Vinod Kone
>> framework 20140820-102346-3281103040-5050-9694-0023 has exited with status >> 127 >> >> >> Thanks and Regards, >> Sai >> >> J. Sai Sagar >> Software Engineer, >> Innovation Labs >> Impetus - Bangalore >> >> https://www.linked

Re: Issue with Multinode Cluster

2014-08-25 Thread Vinod Kone
what do the master and slave logs say? On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek wrote: > I was able to get a single node environment setup on Ubuntu 14.04.1 > following this guide: http://mesosphere.io/learn/install_ubuntu_debian/ > > The single slave registered with the master via the loca

Re: cgroup per executor or task ?

2014-08-19 Thread Vinod Kone
On Tue, Aug 19, 2014 at 12:06 PM, mohit soni wrote: > If slave doesn't directly use task id or executor id, and instead use > the random UUID for cgroup, then my assumption is that it maintains a > mapping from this UUID to either task or executor id, internally. > that's correct.

Re: cgroup per executor or task ?

2014-08-19 Thread Vinod Kone
Are you sure "71d35ca2-caa5-420d-9161-a42b750555cd" is your task id? The id of the cgroup is a random UUID generated by mesos slave. It has nothing to do with executor/task ids IIRC. On Tue, Aug 19, 2014 at 11:22 AM, mohit soni wrote: > I would like to know if cgroups are applied at executor le

Re: [VOTE] Release Apache Mesos 0.20.0 (rc2)

2014-08-19 Thread Vinod Kone
+1 make check passes on OSX Mavericks and CentOS 5.5 On Mon, Aug 18, 2014 at 11:26 PM, Jie Yu wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 0.20.0. > > NOTE: 0.20.0-rc1 has a bug on Mac (MESOS-1713) which is fixed in > 0.20.0-rc2. > > > 0.20.0 includes

Re: error in make check

2014-08-19 Thread Vinod Kone
Is this repeatable? If yes, mind filing a ticket at https://issues.apache.org/jira/browse/MESOS? On Mon, Aug 18, 2014 at 11:47 PM, Giovanni Colapinto < gcolapi...@innovazionedigitale.it> wrote: > Hello. > > I've compiled mesos from source. All fine with make, but make check gives > me this error

Re: URI of Executor is not recognized in mesos-0.18.1

2014-08-19 Thread Vinod Kone
what is the error? On Mon, Aug 18, 2014 at 11:54 PM, Sai Sagar wrote: > Hi, > > I compiled my executor with the following command > > g++ executor.cpp -Lmesos-0.18.1/src/.libs/ -lmesos -I/usr/local/include > -Imesos-0.18.1/src/ > -Imesos-0.18.1/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/ >

Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-18 Thread Vinod Kone
On Sat, Aug 16, 2014 at 4:26 AM, John Omernik wrote: > I've confirmed on the package I am using that when I untar it using tar > zxf as root, that the task-controller does NOT lose the setuid bit. But on > the lost tasks in Mesos I get the error below. What's interesting is that > if "drill dow

Re: [VOTE] Release Apache Mesos 0.20.0 (rc1)

2014-08-18 Thread Vinod Kone
make check succeeded on CentOS 5.5 but failed on the Python framework on OSX Mavericks. Environment details: ➜ mesos-0.20.0 gcc --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 5.1 (clang-503.0.40) (b

Re: Mesos + storm on top of Docker

2014-08-18 Thread Vinod Kone
Can you paste the slave/executor log related to the executor failure? @vinodkone > On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum wrote: > > Hi > > I have created a Docker based Mesos setup, including chronos, marathon, and > storm. > Following advice I saw previously on this mailing list, I ha

Re: Slave disconnecting after I run the task

2014-08-15 Thread Vinod Kone
it is likely a networking issue. http://stackoverflow.com/questions/24559616/mesos-scheduler-slave-continuously-gets-disconnected On Thu, Aug 14, 2014 at 12:13 AM, Sai Sagar wrote: > Hi all, > > When I am running an example in "src/example", the slave is disconnecting > from the Master. From ma

Re: MesosCon attendee introduction thread

2014-08-14 Thread Vinod Kone
Heya. This is Vinod Kone. I work for Twitter. Been hacking on this Mesos thing for a while. Hope you all like what you see :) Looking forward to putting faces to names at MesosCon. @vinodkone > On Aug 14, 2014, at 7:40 PM, Tim St Clair wrote: > > I'm Timothy St. Clair (@timothy

Re: Exposing executor container

2014-08-13 Thread Vinod Kone
On Tue, Aug 12, 2014 at 1:17 PM, Thomas Petr wrote: > That solution would likely cause us more pain -- we'd still need to figure > out an appropriate amount of resources to request for artifact downloads / > extractions, our scheduler would need to be sophisticated enough to only > accept offers

Fixing a buggy key/value pair in slave's /stats.json

2014-08-12 Thread Vinod Kone
Hi, As BenM pointed out in https://issues.apache.org/jira/browse/MESOS-1695 a tiny bug was introduced ~year ago in slave's /stats.json endpoint. Specifically, the "registered" key in the JSON has a value "1"/"0" instead of 1/0 (i.e., string instead of number). I plan to fix it soon (likely 0.21.0
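
A client that reads the slave's /stats.json on both sides of this fix could accept either encoding. A minimal sketch, assuming only what the message states (the "registered" key currently holds "1"/"0" and is meant to hold 1/0 afterwards); the helper name is_registered and the sample payloads are hypothetical:

```python
import json

def is_registered(stats_json: str) -> bool:
    """Return True when the 'registered' key is 1, encoded as a string or a number."""
    stats = json.loads(stats_json)
    # int() accepts both the string "1" and the number 1, so the check
    # behaves the same before and after the endpoint is fixed.
    return int(stats["registered"]) == 1

# Hypothetical payloads: string form (current behaviour) and numeric form (after the fix).
print(is_registered('{"registered": "1"}'))
print(is_registered('{"registered": 0}'))
```

Normalising at the point of parsing keeps the rest of the client unaware of which slave version it is talking to.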

Re: Exposing executor container

2014-08-12 Thread Vinod Kone
n alternative fetcher > executable (perhaps in CommandInfo?). > > Thanks, > Tom > > > On Tue, Aug 12, 2014 at 1:09 PM, Vinod Kone wrote: > >> Hi Whitney, >> >> While we could conceivably set the container id in the environment of the >> executor, I would like

Re: Exposing executor container

2014-08-12 Thread Vinod Kone
Hi Whitney, While we could conceivably set the container id in the environment of the executor, I would like to understand the problem you are facing. The fetching and extracting of the executor is done by mesos-fetcher, a process forked by slave and run under slave's cgroup. AFAICT, this shou

Re: stale framework registrations

2014-08-05 Thread Vinod Kone
On Tue, Aug 5, 2014 at 4:58 PM, David Palaitis wrote: > It’s still registered after a few hours… > > > How did you "stop" marathon? Also, any log messages on the master pertaining to this event would be useful to diagnose. > I don’t see a shutdown in the list of endpoints for /master. What >

Re: stale framework registrations

2014-08-05 Thread Vinod Kone
On Tue, Aug 5, 2014 at 9:48 AM, David Palaitis wrote: > I recently stopped Marathon but it is still registered with the Mesos > Masters. I started a new instance of Marathon and it has re-registered > successfully with a new framework Id. > > > > I’d like to understand how to force deregistration

Disallowing completed frameworks from re-registering with the same framework id

2014-08-04 Thread Vinod Kone
Hi, Currently, there is a bug in Mesos, which allows a completed framework (e.g., removed by master due to being disconnected for longer than failover timeout) to re-register with the same framework id. This causes issues in the WebUI because the same framework id exists in "active" and "terminate

Re: spark and mesos issue

2014-07-15 Thread Vinod Kone
On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone wrote: > > On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh < > gurvinder.si...@uninett.no> wrote: > >> ERROR storage.BlockManagerMasterActor: Got two different block manager >> registrations on 201407031041-1227224054-505

Re: spark and mesos issue

2014-07-15 Thread Vinod Kone
On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh wrote: > ERROR storage.BlockManagerMasterActor: Got two different block manager > registrations on 201407031041-1227224054-5050-24004-0 > > Googling about it seems that mesos is starting slaves at the same time > and giving them the same id. So may

Re: [VOTE] Release Apache Mesos 0.19.1 (rc1)

2014-07-14 Thread Vinod Kone
+1 (binding) Tested on OSX Mavericks w/ gcc-4.8 On Mon, Jul 14, 2014 at 2:35 PM, Timothy Chen wrote: > +1 (non-binding). > > Tim > > On Mon, Jul 14, 2014 at 2:32 PM, Benjamin Mahler > wrote: > > Hi all, > > > > Please vote on releasing the following candidate as Apache Mesos 0.19.1. > > > > >

Re: Controlling Resources Allocated to a Given Task

2014-07-14 Thread Vinod Kone
How are you launching the slaves? By default the slave doesn't do any resource isolation. You should enable cgroups (only available on linux) for this to work. ./bin/mesos-slave.sh --isolation='cgroups/cpu,cgroups/mem' Note that 'cpu' isolation by default is a lower bound. To set it as an upper

Re: Framework capable of launching multiple tasks on same offer?

2014-07-14 Thread Vinod Kone
You can ignore that warning message. It was logged by mistake due to a regression. It's fixed on HEAD and will be included in 0.20.0. commit dd94a1fe9aff281f49d61bd8c214f41fcb340b04 Author: Vinod Kone Date: Thu May 29 15:32:03 2014 -0700 Fixed a bug in scheduler driver to pro

Re: Framework capable of launching multiple tasks on same offer?

2014-07-14 Thread Vinod Kone
Yes. You can definitely launch multiple tasks within the same offer (launchTasks() takes multiple TaskInfos) as long as the sum total of resources required by the tasks (and their executors) can fit in the offered resources. In fact, if you are hoarding offers (not recommended if you are running mu
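
The "sum total must fit" condition can be sketched as a check a scheduler might run before calling launchTasks(). This is an illustration only, not Mesos code: resources are modelled as plain dicts of scalar amounts, and the function name fits_in_offer and the sample numbers are hypothetical. Each task's dict is assumed to already include whatever its executor needs.

```python
def fits_in_offer(offered, tasks):
    """Check that the summed resource needs of all tasks fit in the offered resources.

    offered: dict mapping resource name to offered amount, e.g. {"cpus": 4, "mem": 4096}
    tasks:   list of dicts in the same shape, one per task (executor needs included)
    """
    totals = {}
    for task in tasks:
        for name, amount in task.items():
            totals[name] = totals.get(name, 0) + amount
    # A resource that is not in the offer at all counts as 0 offered.
    return all(amount <= offered.get(name, 0) for name, amount in totals.items())

offer = {"cpus": 4, "mem": 4096}
tasks = [{"cpus": 1, "mem": 1024}, {"cpus": 2, "mem": 2048}]
print(fits_in_offer(offer, tasks))
```

Real offers also carry non-scalar resources such as port ranges, which this sketch does not model.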

Re: 0.19.1

2014-07-03 Thread Vinod Kone
correct url: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1 On Thu, Jul 3, 2014 at 1:40 PM, Vinod Kone wrote: > Hi, > > We are planning to release 0.19.1 (likely next week) which will be a bug > fix release.

Re: Running test-executor

2014-07-03 Thread Vinod Kone
0 ( > hotbox-32.Stanford.EDU) > I0703 13:48:04.988618 14236 hierarchical_allocator_process.hpp:636] > Recovered cpus(*):1; mem(*):32 (total allocatable: cpus(*):8; mem(*):15024; > disk(*):448079; ports(*):[31000-32000]) on slave > 20140703-110217-1174818570-5050-11997-0 from framewo

0.19.1

2014-07-03 Thread Vinod Kone
Hi, We are planning to release 0.19.1 (likely next week) which will be a bug fix release. Specifically, these are the fixes that we are planning to cherry pick. https://issues.apache.org/jira/issues/?filter=12326191&jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1 If t

Re: Running test-executor

2014-07-03 Thread Vinod Kone
Sammy, You need to run a framework to be able to run an executor. See http://mesos.apache.org/gettingstarted/ to see how to run the example python framework. On Thu, Jul 3, 2014 at 11:29 AM, Sammy Steele wrote: > I am trying to figure out how to run the python test-executor given in the > meso

Re: Hadoop on Mesos instantly terminates after registering

2014-07-03 Thread Vinod Kone
On Thu, Jul 3, 2014 at 7:00 AM, Andrew Jones wrote: > I0703 13:57:26.040679 51675 master.cpp:1059] Registering framework > 20140620-174222-1209730570-5050-51658-0666 at > scheduler(1)@127.0.1.1:53662 > The hadoop scheduler is registering with master but using a local ip address (127.0.0.1). Sett

Re: Task serialization per machine?

2014-07-01 Thread Vinod Kone
What Sharma said. Both the scheduler and executor drivers are single threaded i.e., you will only get one call back at a time. IOW, unless you return from one callback you won't get the next callback. On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila wrote: > Hi Asim, > > I am using (developing)
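
The one-callback-at-a-time behaviour can be pictured with a small analogy. This is not how the Mesos drivers are implemented; it is a generic sketch of a single dispatch thread, and the class name SerialDispatcher and the callbacks are hypothetical:

```python
import queue
import threading

class SerialDispatcher:
    """Runs posted callbacks one at a time on a single worker thread."""

    def __init__(self):
        self._events = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def post(self, callback, *args):
        self._events.put((callback, args))

    def stop(self):
        # None is a sentinel: the worker exits after draining earlier events.
        self._events.put(None)
        self._thread.join()

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:
                return
            callback, args = event
            # The next event is not picked up until this call returns.
            callback(*args)

log = []

def on_first(name):
    log.append(f"{name}: begin")
    log.append(f"{name}: end")

def on_second(name):
    log.append(f"{name}: begin")

dispatcher = SerialDispatcher()
dispatcher.post(on_first, "A")
dispatcher.post(on_second, "B")
dispatcher.stop()
print(log)
```

The practical consequence is the same as in the message: a callback that blocks delays every callback queued behind it, so long-running work is better handed off to another thread.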

Re: cgroups OOM handler causing lockups?

2014-07-01 Thread Vinod Kone
Hey Whitney, I'll let Ian Downes comment on the specific patches you linked, but at a high level the bug in MESOS-662 was due to Mesos trying to handle OOM situations in user space instead of letting kernel handle it. We have since then changed the behavior to let Kernel handle the OOM. You can co

Re: Multiple Slaves on Mesos Cluster

2014-06-27 Thread Vinod Kone
It looks like the framework and slave are not able to properly register with the master due to networking issues. There should be log messages indicating whether master received registration requests or not. > "I0627 16:02:42.431401 10059 slave.cpp:2873] Current usage 0.81%. Max allowed age: 6.24

Re: Framework unregistered

2014-06-27 Thread Vinod Kone
Perhaps we should call this out explicitly when we back port and do bug fix releases (0.18.0 and 0.19.0) and urge people to upgrade lest this gets drowned out in the noise. On Fri, Jun 27, 2014 at 11:40 AM, Benjamin Hindman < benjamin.hind...@gmail.com> wrote: > Thanks for the bug report Whitney

Re: HDFS on Mesos

2014-06-25 Thread Vinod Kone
Thanks for listing this out Adam. Data Residency: > - Should we destroy the sandbox/hdfs-data when shutting down a DN? > - If starting DN on node that was previously running a DN, can/should we > try to revive the existing data? > I think this is one of the key challenges for a production quality

Re: cgroups memory isolation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 11:33 AM, Sharma Podila wrote: > Yeah, having soft-limit for memory seems like the right thing to do > immediately. The only problem left to solve being that it would be nicer to > throttle I/O instead of OOM for high rate I/O jobs. Hopefully the soft > limits on memory pu

Re: Framework Starvation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 10:46 AM, Vinod Kone wrote: > Waiting to see your blog post :) > > That said, what baffles me is that in the very beginning when only two > frameworks are present and no tasks have been launched, one framework is > getting more allocations than other (see

Re: Framework Starvation

2014-06-19 Thread Vinod Kone
running tasks with a share > and allocation > 0. > > Thanks, > Claudiu > > From: Vinod Kone > Reply-To: "user@mesos.apache.org" > Date: Wednesday, June 18, 2014 at 4:54 AM > > To: "user@mesos.apache.org" > Subject: Re: Framework Starvation

Re: "Failed to perform recovery: Incompatible slave info detected"

2014-06-19 Thread Vinod Kone
lining :) > > back to the metadata feature though - do you know why just the 'id' of > the slaves isn't used? > As it stands adding disk storage, cores or RAM to a slave will cause > it to drop out of cluster - > does checking the whole metadata provide any benefi

Re: "Failed to perform recovery: Incompatible slave info detected"

2014-06-18 Thread Vinod Kone
cgroups) - definitely wasn't the case until cfs was enabled. > > > On 18 June 2014 18:34, Vinod Kone wrote: > > Hey Dick, > > > > Regarding slave recovery, any changes in the SlaveInfo (see mesos.proto) > are > > considered as a new slave and hence recovery do

Re: "Failed to perform recovery: Incompatible slave info detected"

2014-06-18 Thread Vinod Kone
Hey Dick, Regarding slave recovery, any changes in the SlaveInfo (see mesos.proto) are considered as a new slave and hence recovery doesn't proceed forward. This is because Master caches SlaveInfo and it is quite complex to reconcile the differences in SlaveInfo. So we decided to fail on any Slave

Re: Framework Starvation

2014-06-17 Thread Vinod Kone
master log after adding more logging to the sorter code. > I believe the problem lies somewhere else however … > in HierarchicalAllocatorProcess::allocate() > > I will continue to investigate in the meantime. > > Thanks, > Claudiu > > From: Vinod Kone > Reply-To: "

Re: Framework Starvation

2014-06-13 Thread Vinod Kone
In case you didn't receive my email from @twitter domain. > On Thu, Jun 12, 2014 at 8:20 AM, Claudiu Barbura < > claudiu.barb...@atigeo.com> wrote: > >> We had to change the drf_sorter.cpp/hpp and >> hierarchical_allocator_process.cpp files. >> > > Hey Claudiu. Can you share the patch? > > > @vin

Re: Framework Starvation

2014-06-12 Thread Vinod Kone
On Thu, Jun 12, 2014 at 8:20 AM, Claudiu Barbura wrote: > We had to change the drf_sorter.cpp/hpp and > hierarchical_allocator_process.cpp files. > Hey Claudiu. Can you share the patch? @vinodkone

Re: Error while running Mesos slave on Mac OSX 10.9.3

2014-06-09 Thread Vinod Kone
ar Mesos master. Looks like IP address is being used as an > identifier for the slave. > > Thanks! > prakhar > > > On Mon, Jun 9, 2014 at 1:56 PM, Vinod Kone wrote: > >> Looks like "gethostbyname2" call is returning an error. I've seen this >>

Re: Error while running Mesos slave on Mac OSX 10.9.3

2014-06-09 Thread Vinod Kone
Looks like "gethostbyname2" call is returning an error. I've seen this before on my mac when i have vpn software running (or incorrectly stopped). im surprised though that master on the same box is fine but only slave has this issue? what happens if you specify --ip on the slave too? On Mon, Jun

Re: [VOTE] Release Apache Mesos 0.19.0 (rc3)

2014-06-09 Thread Vinod Kone
+1 make check on OSX 10.9 On Sun, Jun 8, 2014 at 9:27 PM, Benjamin Hindman wrote: > +1, make check on OS X 10.7 with clang 3.3 (and it's also been running at > Twitter, gcc 4.8). > > > On Sat, Jun 7, 2014 at 10:58 AM, Tom Arnfeld wrote: > >> +1 from me. Tested on OSX Mavericks with gcc 4.8 an

Re: Dealing with "run away" task processes after executor terminates

2014-06-03 Thread Vinod Kone
+Jie,Ian Not sure if you've talked to Ian Downes and/or Jie Yu regarding this but they were discussing the same issue (offline) today. Just to be sure, if you are using cgroups, the mesos slave will cleanup the container (and all its processes) when an executor exits. Now there is definitely a ra

Re: Framework Starvation

2014-06-03 Thread Vinod Kone
Either should be fine. I don't think there are any changes in allocator since 0.18.0-rc1. On Tue, Jun 3, 2014 at 4:08 PM, Claudiu Barbura wrote: > Hi Vinod, > > Should we use the same 0-18.1-rc1 branch or trunk code? > > Thanks, > Claudiu > > From:

Re: Framework Starvation

2014-06-03 Thread Vinod Kone
nate 2 of the shark-cli instances, the starved ones are > receiving offers and are able to run queries again (see attached > log_after_starvation file). > > Let me know if you need the slave logs. > > Thank you! > Claudiu > > From: Vinod Kone > Reply-To: "user@m

Re: SLAVE LOST messages

2014-06-03 Thread Vinod Kone
The framework should receive a slave lost message though it is not reliably retried by the master in case it doesn't make it to the framework (master failover, framework failover etc). On Tue, Jun 3, 2014 at 3:09 PM, Diptanu Choudhury wrote: > Hi, > > When a mesos slave process get's killed or t

Re: Mesos master behind NAT

2014-05-30 Thread Vinod Kone
2014 at 9:11 AM, Tomas Barton > >wrote: >> > >> > > ok, I was using zookeer URL with zk01.example.com when I replaced it >> by >> > > an IP address it started to work. Thanks >> > > >> > > >> > > On 23 May 2014 17:58, V

Re: Mesos with non clustered environment.

2014-05-30 Thread Vinod Kone
Hey Raymond, Glad to hear that you are interested in Mesos. Please see my answers inline. It specifically is talking about resource requirements at the framework > level. > What if some tasks in the one framework require a GPU and others do not ? > The kind of resources that tasks from Beaker re

Re: Framework Starvation

2014-05-30 Thread Vinod Kone
Hey Claudiu, Mind posting some master logs with the simple setup that you described (3 shark cli instances)? That would help us better diagnose the problem. On Fri, May 30, 2014 at 1:59 AM, Claudiu Barbura wrote: > This is a critical issue for us as we have to shut down frameworks for > vario

Re: How to kill stuck frameworks in mesos

2014-05-28 Thread Vinod Kone
work state from zookeeper? > > Tomas > > > On 28 May 2014 05:56, Manivannan wrote: > >> Hi Vinod, >> >> Thanks for your reply. Please see inline. >> >> Thanks, >> Mani >> >> >> On Wed, May 28, 2014 at 3:57 AM, Vinod Kone wrot

Re: How to kill stuck frameworks in mesos

2014-05-28 Thread Vinod Kone
On Tue, May 27, 2014 at 8:56 PM, Manivannan wrote: > *What is the default fail over timeout ? * >> > The default failover timeout is 0s. You can confirm this by grepping master log for lines that look like "Giving framework to failover". I'm surprised that master doesn't move these frameworks t

Re: ExecutorDriver

2014-05-27 Thread Vinod Kone
On Fri, May 16, 2014 at 12:30 PM, Diptanu Choudhury wrote: > Is the ExecutorDriver that one gets in a launchTask callback in a Mesos > Executor singleton? I am currently caching the instance of the > ExecutorDriver when a launchTask is called in an Akka Actor which monitors > a linux container and

Re: How to kill stuck frameworks in mesos

2014-05-27 Thread Vinod Kone
Hi Mani, What do you mean by "stuck" framework? If the framework disconnects from master and the failover timeout (configurable) has passed master should remove the framework. Also, there is currently work in progress to give operators the ability to force remove a framework. See : https://issues

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
55fc process::schedule() > @ 0x7fdb5f394b50 start_thread > @ 0x7fdb5f0df0ed (unknown) > > I guess I have to use directly IP address, right? > > > On 23 May 2014 17:38, Vinod Kone wrote: > >> 0.18.0 https://issues.apache.org/jira/browse/MESOS-672 >

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
0.18.0 https://issues.apache.org/jira/browse/MESOS-672 On Fri, May 23, 2014 at 8:11 AM, Tomas Barton wrote: > Hey Vinod, > > thanks! That's exactly what I was looking for. I haven't noticed that > flag, since which version is it available? > > Tomas > > > On

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
You can use --hostname to tell master to publish a different address in zk. @vinodkone Sent from my mobile > On May 23, 2014, at 12:40 AM, Tomas Barton wrote: > > Hi, > > is it possible to run a Mesos master behind NAT? With the --ip flag I can set > IP address of an actual interface. When

Re: Q on master state.json

2014-05-21 Thread Vinod Kone
Master stores a cache of completed tasks. Currently the cache capacity (MAX_COMPLETED_TASKS_PER_FRAMEWORK) is set to 1000 though we could make it configurable. On Wed, May 21, 2014 at 2:18 PM, Sharma Podila wrote: > I see that master/state.json has state information on frameworks,
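
What a fixed-capacity cache implies for anyone reading completed tasks out of state.json can be shown with a small sketch. The constant name and the value 1000 come from the message above; the use of a deque and the task names are assumptions for illustration and say nothing about the master's actual data structure or eviction order:

```python
from collections import deque

# Name and value taken from the message; everything else is illustrative.
MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000

# A deque with maxlen silently drops the oldest entry once it is full.
completed = deque(maxlen=MAX_COMPLETED_TASKS_PER_FRAMEWORK)

for i in range(1200):
    completed.append(f"task-{i}")

print(len(completed))
print(completed[0], completed[-1])
```

A framework that needs the full history of its tasks would therefore have to record it itself rather than rely on the master's cache.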

Re: Mesos / Libprocess ENETUNREACH

2014-05-21 Thread Vinod Kone
-mesos-user@incubator (this mailing list is deprecated) Tom, Both the framework (and slaves) and master need to be able to talk to each other. IOW, if one of the end points uses a private IP (presumably thats the case with framework behind a VPN) then it wouldn't work. If you want the master/slav

Re: callback port

2014-05-19 Thread Vinod Kone
ar set. > > > On Mon, May 19, 2014 at 10:19 AM, Vinod Kone wrote: > >> Probably. How are you setting the LIBPROCESS_PORT in Marathon? It has to >> be set via CommandInfo.Environment() of the task/executor for this to take >> effect. >> >> >> On Fri, Ma

Re: callback port

2014-05-19 Thread Vinod Kone
Probably. How are you setting the LIBPROCESS_PORT in Marathon? It has to be set via CommandInfo.Environment() of the task/executor for this to take effect. On Fri, May 16, 2014 at 9:41 AM, Scott Clasen wrote: > Aha, thanks! I am still having an issue. I am executing the process via > marathon,

Re: [VOTE] Release Apache Mesos 0.18.2 (rc1)

2014-05-16 Thread Vinod Kone
+1 make check passed. CentOS 6 w/ gcc 4.8 On Wed, May 14, 2014 at 8:33 PM, Iven Hsu wrote: > +1 > make check succeeded in Arch Linux + clang 3.4.1 > > > 2014-05-15 3:06 GMT+08:00 Niklas Nielsen : > > Hi all, >> >> Please vote on releasing the following candidate as Apache Mesos 0.18.2. >> >>

Re: How can I ask mesos cluster to reload configuration?

2014-05-13 Thread Vinod Kone
Hey Chengwei, Mesos doesn't allow online update of its configuration. The only exception, so far, has been the VLOG level. To update resources, you should roll the slave with new flags. On Sun, May 11, 2014 at 12:02 AM, Chengwei Yang wrote: > Hi List, > > Generally I have a question: does mes

Re: Where did 0.18.1 go? Suggesting 0.18.2

2014-05-13 Thread Vinod Kone
+1 On Tue, May 13, 2014 at 10:54 AM, Benjamin Hindman wrote: > +1! > > > On Tue, May 13, 2014 at 9:51 AM, Niklas Nielsen wrote: > >> Hey everyone, >> >> First and foremost, I apologize for the radio silence on my part with >> regards to the 0.18.1 release. We didn't announce it or make it publi

Re: protecting mesos from fat fingers

2014-05-06 Thread Vinod Kone
On Tue, May 6, 2014 at 2:01 PM, David Greenberg wrote: > We are actually working on solving #2, by adding mutual authentication > between masters and slaves, and ensure that each group knows in advance > what the valid masters/slaves are. This allows us to ensure that no > malicious masters/slaves

Re: protecting mesos from fat fingers

2014-05-02 Thread Vinod Kone
r any better (I doubt it very > much, my impression > was that's on the order of days). Think it's more the deploy could be > cancelled better while the > system was still functioning (speculation - i'm still in early stages > of learning the internals of this). > &g

Re: [VOTE] Release Apache Mesos 0.18.1 (rc2)

2014-05-01 Thread Vinod Kone
+1 make check passes on OSX 10.9 w/ gcc-4.8 On Wed, Apr 30, 2014 at 11:18 PM, Niklas Nielsen wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 0.18.1. > > > 0.18.1 includes the following: > > --

Re: protecting mesos from fat fingers

2014-04-30 Thread Vinod Kone
Dick, I've also briefly skimmed at your original email to marathon mailing list and it sounded like executor sandboxes were not getting garbage collected (a mesos feature) when the slave work directory was rooted in /tmp vs /var? Did I understand that right? If yes, I would love to see some logs.

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-21 Thread Vinod Kone
On Mon, Apr 21, 2014 at 3:10 PM, Sharma Podila wrote: > On a related note, what if framework scheduler is up while Mesos master > goes down. Then, if Mesos master restarts after a time interval greater > than framework failover timeout, what is the expected behavior? Would the > framework success

Re: Establishing a process for featuring Mesos blog contributions

2014-04-18 Thread Vinod Kone
Thanks for seeding this discussion Dave. The points you make sound great to me. +1 for the outlined process. > > On Fri, Apr 18, 2014 at 2:33 PM, Dave Lester wrote: > >> Hi All, >> >> tl;dr: Following discussion with PMC members, I'd like to kick off this >> thread on the user list to discuss

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-18 Thread Vinod Kone
> down for the whole failover grave period, the framework still wants to > register, since it's state never gets invalidated. > > Thanks, > David > > > On Thursday, April 17, 2014, Vinod Kone wrote: > >> >> On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg

Re: Trying to get task reconciliation to work

2014-04-18 Thread Vinod Kone
If a framework asks to reconcile a task that doesn't belong to it there would be no response from the master. This is nice because it avoids information leak between frameworks. On Fri, Apr 18, 2014 at 5:04 AM, David Greenberg wrote: > Piggybacking onto this thread with a follow up question: wha

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Vinod Kone
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg wrote: > My follow-up question is this--is there a way to tell whether I'm outside > of the timeout window? I'd like to have my framework check ZK and determine > whether it's w/in the framework timeout or not, so that it can make the > correct call

Re: 0.18.1

2014-04-15 Thread Vinod Kone
Niklas has kindly agreed to be the release manager for 0.18.1. I will let him make the call, but this patch lgtm. On Tue, Apr 15, 2014 at 1:15 PM, Adam Bordelon wrote: > Perhaps commit db3b5ed86b7f5a8d10fd8fc3fd59eb6faec2fe20 for MESOS-979? > > > On Tue, Apr 15, 2014 at 11:19 AM

Re: Mesos slaves disconnecting because of Zookeeper?

2014-04-15 Thread Vinod Kone
mess 0.17.0 had a major refactor around interaction with ZooKeeper. So I would definitely recommend giving it a try and see if the problem persists. On Tue, Apr 15, 2014 at 11:59 AM, Ted Young wrote: > Anyone have any suggestions? I'm still seeing these problems and it's > causing our slaves t

Re: 0.18.1

2014-04-15 Thread Vinod Kone
On Mon, Apr 14, 2014 at 10:10 PM, Vinod Kone wrote: > Looks like I missed cherry-picking the fix for > https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. > > So I would like to cut 0.18.1 with the cherry-pick. If there is any other > important fix that belongs to 0.1

0.18.1

2014-04-14 Thread Vinod Kone
Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any other important fix that belongs to 0.18.* release but didn't make it into 0.18.0 please reply to this thread and I'll se

Re: Marathon does not register with mesos

2014-04-13 Thread Vinod Kone
Hey Mukesh, Mind pasting the master and marathon logs? That would help us diagnose. Vinod On Sun, Apr 13, 2014 at 11:56 AM, Mukesh G wrote: >Using marathon 0.4.1 and mesos 0.18 on Centos 6.4 platform, I am able > to successfully bring up mesos master, zookeeper and mesos slaves. The > meso

Re: marathon not connected to mesos master

2014-04-10 Thread Vinod Kone
What do master logs say? @vinodkone Sent from my mobile > On Apr 10, 2014, at 8:33 AM, "David J. Palaitis" > wrote: > > starting marathon ... > > ./bin/start --http_port 5150 --https_port 5151 --master > "zk://abc.xxx:2181/mesos,def.xxx:2181/mesos,ghi.xxx:2181/mesos,ghi.xxx:2181/mesos,hij.x
