Re: Is mesos spamming me?

2015-02-01 Thread Vinod Kone
On Sun, Feb 1, 2015 at 8:58 PM, Vinod Kone vinodk...@gmail.com wrote: By default mesos slave leaves some RAM and CPU for system processes. You can override this behavior by --resources flag. On Sun, Feb 1, 2015 at 6:05 PM, Hepple, Robert rhep...@tnsi.com wrote: On Fri, 2015-01-30 at 10:00

Re: Mesos 0.22.0

2015-01-20 Thread Vinod Kone
+1 @vinodkone On Jan 20, 2015, at 12:03 PM, Chris Aniszczyk z...@twitter.com wrote: definite +1, lets keep the release rhythm going! maybe some space on the wiki for release planning / release managers would be a step forward On Tue, Jan 20, 2015 at 1:59 PM, Joe Stein

Re: Mesos Community Meetings

2015-01-20 Thread Vinod Kone
On Tue, Jan 20, 2015 at 4:10 PM, Vinod Kone vinodk...@gmail.com wrote: Thanks for the interest. The next meeting will be on 5th February, 3-5 pm PST. The hangout link: https://plus.google.com/hangouts/_/twitter.com/mesos-sync On Tue, Jan 6, 2015 at 9:13 AM, Tim St Clair tstcl...@redhat.com

Re: Architecture question

2015-01-09 Thread Vinod Kone
Have you looked at Aurora or Marathon? They have some (most?) of the features you are looking for. On Fri, Jan 9, 2015 at 10:59 AM, Srinivas Murthy srinimur...@gmail.com wrote: We have a legacy system with home-brewed workflows defined in XPDL, running across multiple dozens of nodes. Resources

Re: master shutting down slave, how to restart

2015-01-08 Thread Vinod Kone
Did you figure out the issue here? I expected to see a "New master detected at.." line in the slave log. On Wed, Dec 10, 2014 at 7:16 AM, James Miller jamesmille...@gmail.com wrote: I'm having an issue restarting a downed slave. I see this message in the Master Logs about shutting down the

Re: Task Checkpointing with Mesos, Marathon and Docker containers

2014-11-25 Thread Vinod Kone
Mesos considers a slave (and its tasks) lost if the slave is down for 75s. @vinodkone On Nov 25, 2014, at 7:43 AM, Geoffroy Jabouley geoffroy.jabou...@gmail.com wrote: Hello, I am currently trying to activate checkpointing for my Mesos cloud. Starting from an application running

Re: [VOTE] Release Apache Mesos 0.21.0 (rc3)

2014-11-14 Thread Vinod Kone
+1 Deployed to our test clusters. On Fri, Nov 14, 2014 at 2:29 PM, Vinod Kone vinodk...@gmail.com wrote: +1 Deployed to our test clusters. On Thu, Nov 13, 2014 at 2:56 PM, Ian Downes idow...@twitter.com.invalid wrote: Cosmin, No, to my knowledge rpms are not available for release

Re: [VOTE] Release Apache Mesos 0.21.0 (rc2)

2014-11-11 Thread Vinod Kone
+1 Successfully deployed to our test clusters. On Mon, Nov 10, 2014 at 2:23 PM, Ian Downes idow...@twitter.com.invalid wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos 0.21.0. 0.21.0-rc2 added the following commits to 0.21.0-rc1: $ git cherry -v 0.21.0-rc1

Re: [VOTE] Release Apache Mesos 0.21.0 (rc2)

2014-11-11 Thread Vinod Kone
wrote: +1 Mac OS X 10.9.5 and Ubuntu trusty Niklas On 11 November 2014 10:11, Vinod Kone vinodk...@gmail.com wrote: +1 Successfully deployed to our test clusters. On Mon, Nov 10, 2014 at 2:23 PM, Ian Downes idow...@twitter.com.invalid wrote: Hi all, Please vote on releasing

Re: Struts wiki docs in Mesos wiki

2014-11-11 Thread Vinod Kone
Thanks for the report and offer to help! Granted you perms. Let me know if you are still having issues. On Mon, Nov 10, 2014 at 3:27 AM, Dharmesh Kakadia dhkaka...@gmail.com wrote: It looks like I don't have enough permissions to remove them. Can someone grant me? Also, is there a way we can

Re: [ANN] Mesos resources searchable

2014-11-10 Thread Vinod Kone
Cool. Just curious, why you decided to add mesos under hadoop. It has nothing to do with hadoop :) On Mon, Nov 10, 2014 at 3:45 PM, Ankur Chauhan an...@malloc64.com wrote: WOW! That is amazing. All these things in a single place is awesome. -- Ankur On 10 Nov 2014, at 15:41, Otis

Re: [VOTE] Release Apache Mesos 0.21.0 (rc1)

2014-11-06 Thread Vinod Kone
-1 There is a SEGFAULT issue in Authenticator. https://issues.apache.org/jira/browse/MESOS-2050 On Thu, Nov 6, 2014 at 3:59 PM, Niklas Nielsen nik...@mesosphere.io wrote: +1 for Ubuntu 14.04.1 LTS and Mac OS X 10.9.5. Niklas On 6 November 2014 13:00, Tom Arnfeld t...@duedil.com wrote:

Re: Multiple schedulers on one machine?

2014-10-28 Thread Vinod Kone
Sorry for the delay in response. It looks like master received the registration request from the scheduler at 20:13, at which point it sent a registration ack back to the scheduler. What is not clear is why no registration requests were sent/received from 19:47 when the framework apparently

New mailing list for Jenkins Mesos plugin

2014-10-20 Thread Vinod Kone
Hi folks, I created a new google group / mailing list ( https://groups.google.com/forum/#!forum/jenkins-mesos) for Jenkins Mesos plugin http://jenkinsci.github.io/mesos-plugin/. Please join this list for questions/discussions regarding the plugin itself. Thanks, Vinod

Re: CGroup Per-Task Isolation

2014-10-12 Thread Vinod Kone
No. It wasn't. On Sun, Oct 12, 2014 at 10:07 PM, Sammy Steele sammy_ste...@stanford.edu wrote: I found this post from last year discussing implementing per task cgroup isolation: https://issues.apache.org/jira/browse/MESOS-539. Has this ever been implemented? Thanks!

Re: Multiple schedulers on one machine?

2014-10-05 Thread Vinod Kone
On Sat, Oct 4, 2014 at 1:19 PM, Colleen Lee c...@coursera.org wrote: This normally occurs as expected, but after some time running multiple jobs, there will be approximately a 30-minute delay between the call to driver.run() and the registered() method being called (based on logs). This

Re: Mesos Slave gets registered with lower memory than available

2014-10-01 Thread Vinod Kone
Stefan, mind filing a ticket http://VirtualChannel about the documentation gap? On Wed, Oct 1, 2014 at 9:27 AM, Tim Chen t...@mesosphere.io wrote: Hi Stefan, Yes it's a feature where we leave some space on each slave and not fully allocate all the memory and cpu. You can override how much

Re: Mesos 0.20.1 still using -net=host when launching Docker containers

2014-10-01 Thread Vinod Kone
On Wed, Oct 1, 2014 at 12:54 PM, Andy Grove andy.gr...@codefutures.com wrote: I'd recommend adding this to the documentation as most people will probably expect this to be the default behavior. I'd also be happy to contribute this example code to be shipped with Mesos if there's any interest

Re: Docker executor issue

2014-09-30 Thread Vinod Kone
do you run the mesos slave in a docker container as well? Will be great if you can share the slave log as Vinod suggested too. Tim On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone vinodk...@gmail.com wrote: I'll let Tim Chen help you out here since he has more context. Some slave logs

Re: Mesos master failure question

2014-09-29 Thread Vinod Kone
Executors/tasks will keep running when the master is down. Any status updates sent by executors will be cached by the slaves and retried upon a new master being elected as leader. -- Vinod On Mon, Sep 29, 2014 at 10:11 AM, Andy Grove andy.gr...@codefutures.com wrote: Hi, I'd like to clarify
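The caching-and-retry behaviour described in this reply can be pictured with a small sketch. This is not Mesos code: `UpdateBuffer`, `enqueue`, and `flush` are hypothetical names, and the `send` callable stands in for whatever delivers an update to the currently elected master.

```python
from collections import deque


class UpdateBuffer:
    """Hypothetical stand-in for a slave-side cache of status updates."""

    def __init__(self):
        # Updates wait here, oldest first, until a master accepts them.
        self._pending = deque()

    def enqueue(self, update):
        self._pending.append(update)

    def flush(self, send):
        """Try to deliver pending updates in order.

        `send` returns True when the master accepted the update. Delivery
        stops at the first failure so ordering is preserved for the retry.
        """
        delivered = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            delivered += 1
        return delivered
```

While no master is reachable, `flush` delivers nothing and the updates stay queued; once a new leader is elected, a later `flush` drains them in their original order.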

Mesos OPW project

2014-09-29 Thread Vinod Kone
Hi prospective OPW interns, I've been contacted by quite a few of you regarding Mesos Getting Started Documentation OPW project. So, wanted to send out a list of starter tickets that you can work on. https://issues.apache.org/jira/browse/MESOS-1647

Re: Docker executor issue

2014-09-29 Thread Vinod Kone
Try increasing the executor registration timeout on the slave (--executor_registration_timeout) to give docker more time to do a pull of the image. On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove andy.gr...@codefutures.com wrote: Hi, I've been working on a prototype Mesos framework to launch docker

Re: Docker executor issue

2014-09-29 Thread Vinod Kone
some option so that the docker executor passed the -d flag to the docker run command? I guess I should start looking through the mesos source so I can see how this works. Thanks, Andy. -- Andy Grove VP Engineering CodeFutures Corporation On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone

Re: Mesos task ordering guarantees

2014-09-14 Thread Vinod Kone
Yes. The order is guaranteed. @vinodkone On Sep 14, 2014, at 5:28 AM, Tom Arnfeld t...@duedil.com wrote: Hey, I couldn't seem to find any documentation on this.. If a framework responds to an offer with two tasks and they share the same executor (therefore leading to two invocations

Re: Untaring Framework tgzs: Can we customize?

2014-09-12 Thread Vinod Kone
On Wed, Sep 10, 2014 at 4:39 PM, Vinod Kone vinodk...@gmail.com wrote: IanD: Mind helping John out here? My hunch here is that this is because the slave does chown() after extracting ( https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L258 )? From POSIX standard

Re: Sandbox GC fails

2014-09-11 Thread Vinod Kone
is attempted, causing a Directory is not empty. Appreciate any input! On 8 September 2014 07:26, Tom Arnfeld t...@duedil.com wrote: That's useful to know, thanks Vinod. I'll try and dig deeper. On Mon, Sep 8, 2014 at 5:33 AM, Vinod Kone vinodk...@gmail.com wrote: On Sat, Sep 6, 2014

Re: Mesos Driver aborted silently?

2014-09-10 Thread Vinod Kone
My guess is that your driver threw an exception while handling the offerRescinded() callback which was detected by the JNI binding (IIRC Mantis is a JVM framework?) causing it to abort the driver. Note that when a driver aborts, it will send a DeactivateFrameworkMessage to the master causing the
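If an uncaught exception in a callback is indeed what aborts the driver, one defensive pattern is to wrap each callback so that errors are logged instead of escaping. The sketch below is plain Python under that assumption, not the Mesos bindings: `log_exceptions` and the `offer_rescinded` handler are hypothetical names chosen for illustration.

```python
import functools
import logging


def log_exceptions(callback):
    """Wrap a scheduler callback so an unexpected error is logged, not raised."""

    @functools.wraps(callback)
    def wrapper(*args, **kwargs):
        try:
            return callback(*args, **kwargs)
        except Exception:
            # Record the full traceback; the caller sees None instead of an error.
            logging.exception("unhandled error in %s", callback.__name__)
            return None

    return wrapper


@log_exceptions
def offer_rescinded(driver, offer_id):
    # Hypothetical handler that fails, to show the wrapper in action.
    raise RuntimeError("failed to handle rescinded offer " + offer_id)
```

Whether swallowing the error is appropriate depends on the framework; the point is only that the failure becomes visible in the logs rather than silently ending the driver.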

Re: Untaring Framework tgzs: Can we customize?

2014-09-10 Thread Vinod Kone
IanD: Mind helping John out here? My hunch here is that this is because the slave does chown() after extracting ( https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L258)? From POSIX standard, it looks like chown() when invoked by root doesn't clear the setuid bit for ordinary

Re: Sandbox GC fails

2014-09-07 Thread Vinod Kone
On Sat, Sep 6, 2014 at 8:23 AM, Tom Arnfeld t...@duedil.com wrote: If I try and manually remove the directory mentioned, it works fine. Is this a known issue, or should I do a little more debugging? I've not tried to reproduce it under specific conditions yet. This is surprising. GC does a

Re: Mesos on Gentoo

2014-09-07 Thread Vinod Kone
Hi James, Great to see a Gentoo package for Mesos! Regarding HDFS requirement, any shared storage (even just a http/ftp server works) that the Mesos slaves can pull the executor from is enough.

Re: MongoDB on mesos

2014-09-02 Thread Vinod Kone
Bill, just to clarify, that only works in Aurora if state is written outside the sandbox, correct?

Re: Migration from mesos 0.19 to mesos 0.20

2014-08-27 Thread Vinod Kone
See docs/upgrades.md. @vinodkone On Aug 27, 2014, at 5:48 AM, Giulio Eulisse giulio.euli...@cern.ch wrote: Hi, is there any best practices / recommendation when updating from mesos 0.19 to mesos 0.20? -- Ciao, Giulio

Re: Docker Example Mesos 0.20?

2014-08-27 Thread Vinod Kone
On Wed, Aug 27, 2014 at 9:14 AM, Connor Doyle connor@gmail.com wrote: The order they are listed is significant Why is the order important? Is it a Marathon restriction? IIUC, Mesos will pick the right* containerizer based on whether TaskInfo.ContainerInfo or ExecutorInfo.ContainerInfo is

Re: Issue with Multinode Cluster

2014-08-25 Thread Vinod Kone
what do the master and slave logs say? On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek frank.hi...@gmail.com wrote: I was able to get a single node environment setup on Ubuntu 14.04.1 following this guide: http://mesosphere.io/learn/install_ubuntu_debian/ The single slave registered with the

Re: Storm on Mesos

2014-08-25 Thread Vinod Kone
On Mon, Aug 25, 2014 at 4:25 PM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote: What does "Invalid user: nonexistent" mean? Any idea? Looks like the unix user that the slave is trying to run the executor as doesn't exist. Do you know what user storm is trying to run the executor as? If

Re: Issue with Multinode Cluster

2014-08-25 Thread Vinod Kone
a file (/etc/defaults/mesos-master?) to set these flags. On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek frank.hi...@gmail.com wrote: Logs attached from master, slave, and zookeeper after a reboot of both nodes. On August 25, 2014 at 1:14:07 PM, Vinod Kone (vinodk...@gmail.com) wrote: what do

Re: URI of Executor is not recognized in mesos-0.18.1

2014-08-19 Thread Vinod Kone
what is the error? On Mon, Aug 18, 2014 at 11:54 PM, Sai Sagar jsaisa...@gmail.com wrote: Hi, I compiled my executor with the following command g++ executor.cpp -Lmesos-0.18.1/src/.libs/ -lmesos -I/usr/local/include -Imesos-0.18.1/src/

Re: error in make check

2014-08-19 Thread Vinod Kone
Is this repeatable? If yes, mind filing a ticket at https://issues.apache.org/jira/browse/MESOS? On Mon, Aug 18, 2014 at 11:47 PM, Giovanni Colapinto gcolapi...@innovazionedigitale.it wrote: Hello. I've compiled mesos from source. All fine with make, but make check gives me this error:

Re: cgroup per executor or task ?

2014-08-19 Thread Vinod Kone
On Tue, Aug 19, 2014 at 12:06 PM, mohit soni mohitsoni1...@gmail.com wrote: If slave doesn't directly use task id or executor id, and instead use the random UUID for cgroup, then my assumption is that it maintains a mapping from this UUID to either task or executor id, internally. that's

Re: Mesos + storm on top of Docker

2014-08-18 Thread Vinod Kone
Can you paste the slave/executor log related to the executor failure? @vinodkone On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com wrote: Hi I have created a Docker based Mesos setup, including chronos, marathon, and storm. Following advice I saw previously on this

Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-18 Thread Vinod Kone
On Sat, Aug 16, 2014 at 4:26 AM, John Omernik j...@omernik.com wrote: I've confirmed on the package I am using that when I untar it using tar zxf as root, that the task-controller does NOT lose the setuid bit. But on the lost tasks in Mesos I get the error below. What's interesting is that

Re: Slave disconnecting after I run the task

2014-08-15 Thread Vinod Kone
it is likely a networking issue. http://stackoverflow.com/questions/24559616/mesos-scheduler-slave-continuously-gets-disconnected On Thu, Aug 14, 2014 at 12:13 AM, Sai Sagar jsaisa...@gmail.com wrote: Hi all, When I am running an example in src/example, the slave is disconnecting from the

Re: Exposing executor container

2014-08-13 Thread Vinod Kone
On Tue, Aug 12, 2014 at 1:17 PM, Thomas Petr tp...@hubspot.com wrote: That solution would likely cause us more pain -- we'd still need to figure out an appropriate amount of resources to request for artifact downloads / extractions, our scheduler would need to be sophisticated enough to only

Re: Exposing executor container

2014-08-12 Thread Vinod Kone
Hi Whitney, While we could conceivably set the container id in the environment of the executor, I would like to understand the problem you are facing. The fetching and extracting of the executor is done by mesos-fetcher, a process forked by the slave and run under the slave's cgroup. AFAICT, this

Re: Exposing executor container

2014-08-12 Thread Vinod Kone
?). Thanks, Tom On Tue, Aug 12, 2014 at 1:09 PM, Vinod Kone vinodk...@gmail.com wrote: Hi Whitney, While we could conceivably set the container id in the environment of the executor, I would like to understand the problem you are facing. The fetching and extracting of the executor is done

Re: stale framework registrations

2014-08-05 Thread Vinod Kone
On Tue, Aug 5, 2014 at 4:58 PM, David Palaitis david.palai...@twosigma.com wrote: It’s still registered after a few hours… How did you stop marathon? Also, any log messages on the master pertaining to this event would be useful to diagnose. I don’t see a shutdown in the list of endpoints

Disallowing completed frameworks from re-registering with the same framework id

2014-08-04 Thread Vinod Kone
Hi, Currently, there is a bug in Mesos, which allows a completed framework (e.g., removed by master due to being disconnected for longer than failover timeout) to re-register with the same framework id. This causes issues in the WebUI because the same framework id exists in active and terminated

Re: spark and mesos issue

2014-07-16 Thread Vinod Kone
On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh gurvinder.si...@uninett.no wrote: ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0 Googling about it seems that mesos is starting slaves at the same time and giving

Re: spark and mesos issue

2014-07-16 Thread Vinod Kone
On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone vi...@twitter.com wrote: On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh gurvinder.si...@uninett.no wrote: ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0 Googling

Re: Framework capable of launching multiple tasks on same offer?

2014-07-14 Thread Vinod Kone
Yes. You can definitely launch multiple tasks within the same offer (launchTasks() takes multiple TaskInfos) as long as the sum total of resources required by the tasks (and their executors) can fit in the offered resources. In fact, if you are hoarding offers (not recommended if you are running
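The constraint stated here, that the summed requirements of the tasks must fit within the offer, can be checked before launching. The following is a minimal sketch with a simplified resource model: `Resources` and `fits_in_offer` are hypothetical names, only CPU and memory are modelled, and executor overhead is assumed to be already folded into each task's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_mb: float


def fits_in_offer(offer, tasks):
    """Return True if the combined needs of `tasks` fit inside `offer`."""
    total_cpus = sum(task.cpus for task in tasks)
    total_mem = sum(task.mem_mb for task in tasks)
    return total_cpus <= offer.cpus and total_mem <= offer.mem_mb


offer = Resources(cpus=4.0, mem_mb=8192)
tasks = [Resources(cpus=1.0, mem_mb=2048), Resources(cpus=2.0, mem_mb=4096)]
```

With these numbers the two tasks need 3 CPUs and 6144 MB in total, so they fit; adding a third task that needs 2 more CPUs would exceed the offer.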

Re: Framework capable of launching multiple tasks on same offer?

2014-07-14 Thread Vinod Kone
You can ignore that warning message. It was logged by mistake due to a regression. It's fixed on HEAD and will be included in 0.20.0. commit dd94a1fe9aff281f49d61bd8c214f41fcb340b04 Author: Vinod Kone vi...@twitter.com Date: Thu May 29 15:32:03 2014 -0700 Fixed a bug in scheduler driver

Re: Controlling Resources Allocated to a Given Task

2014-07-14 Thread Vinod Kone
How are you launching the slaves? By default the slave doesn't do any resource isolation. You should enable cgroups (only available on linux) for this to work. ./bin/mesos-slave.sh --isolation='cgroups/cpu,cgroups/mem' Note that 'cpu' isolation by default is a lower bound. To set it as an upper

Re: [VOTE] Release Apache Mesos 0.19.1 (rc1)

2014-07-14 Thread Vinod Kone
+1 (binding) Tested on OSX Mavericks w/ gcc-4.8 On Mon, Jul 14, 2014 at 2:35 PM, Timothy Chen tnac...@gmail.com wrote: +1 (non-binding). Tim On Mon, Jul 14, 2014 at 2:32 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following candidate as

Re: Running test-executor

2014-07-03 Thread Vinod Kone
Sammy, You need to run a framework to be able to run an executor. See http://mesos.apache.org/gettingstarted/ to see how to run the example python framework. On Thu, Jul 3, 2014 at 11:29 AM, Sammy Steele sammy_ste...@stanford.edu wrote: I am trying to figure out how to run the python

Re: 0.19.1

2014-07-03 Thread Vinod Kone
correct url: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1 On Thu, Jul 3, 2014 at 1:40 PM, Vinod Kone vinodk...@gmail.com wrote: Hi, We are planning to release 0.19.1 (likely next week) which will be a bug fix release

Re: cgroups OOM handler causing lockups?

2014-07-01 Thread Vinod Kone
Hey Whitney, I'll let Ian Downes comment on the specific patches you linked, but at a high level the bug in MESOS-662 was due to Mesos trying to handle OOM situations in user space instead of letting kernel handle it. We have since then changed the behavior to let Kernel handle the OOM. You can

Re: Task serialization per machine?

2014-07-01 Thread Vinod Kone
What Sharma said. Both the scheduler and executor drivers are single threaded i.e., you will only get one call back at a time. IOW, unless you return from one callback you won't get the next callback. On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila spod...@netflix.com wrote: Hi Asim, I am
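The one-callback-at-a-time behaviour can be illustrated with a single worker thread draining a queue. This is only an analogy in plain Python, not how the drivers are implemented: `SerialDispatcher`, `submit`, and `drain` are hypothetical names.

```python
import queue
import threading


class SerialDispatcher:
    """Runs submitted callbacks one at a time on a single worker thread."""

    def __init__(self):
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, callback, *args):
        # Callers never run the callback themselves; they only enqueue it.
        self._events.put((callback, args))

    def _run(self):
        while True:
            callback, args = self._events.get()
            try:
                # The next callback cannot start until this one returns.
                callback(*args)
            finally:
                self._events.task_done()

    def drain(self):
        """Block until every submitted callback has returned."""
        self._events.join()
```

A callback that blocks for a long time therefore delays every callback queued behind it, which is why long-running work is usually handed off to another thread.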

Re: Framework unregistered

2014-06-27 Thread Vinod Kone
Perhaps we should call this out explicitly when we back port and do bug fix releases (0.18.0 and 0.19.0) and urge people to upgrade lest this gets drowned out in the noise. On Fri, Jun 27, 2014 at 11:40 AM, Benjamin Hindman benjamin.hind...@gmail.com wrote: Thanks for the bug report Whitney,

Re: HDFS on Mesos

2014-06-25 Thread Vinod Kone
Thanks for listing this out Adam. Data Residency: - Should we destroy the sandbox/hdfs-data when shutting down a DN? - If starting DN on node that was previously running a DN, can/should we try to revive the existing data? I think this is one of the key challenges for a production quality

Re: Failed to perform recovery: Incompatible slave info detected

2014-06-19 Thread Vinod Kone
to the metadata feature though - do you know why just the 'id' of the slaves isn't used? As it stands adding disk storage, cores or RAM to a slave will cause it to drop out of cluster - does checking the whole metadata provide any benefit vs. checking the id? On 18 June 2014 19:46, Vinod Kone vinodk

Re: Framework Starvation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 10:46 AM, Vinod Kone vi...@twitter.com wrote: Waiting to see your blog post :) That said, what baffles me is that in the very beginning when only two frameworks are present and no tasks have been launched, one framework is getting more allocations than other (see

Re: cgroups memory isolation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 11:33 AM, Sharma Podila spod...@netflix.com wrote: Yeah, having soft-limit for memory seems like the right thing to do immediately. The only problem left to solve being that it would be nicer to throttle I/O instead of OOM for high rate I/O jobs. Hopefully the soft

Re: Failed to perform recovery: Incompatible slave info detected

2014-06-18 Thread Vinod Kone
the case until cfs was enabled. On 18 June 2014 18:34, Vinod Kone vinodk...@gmail.com wrote: Hey Dick, Regarding slave recovery, any changes in the SlaveInfo (see mesos.proto) are considered as a new slave and hence recovery doesn't proceed forward. This is because Master caches SlaveInfo
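The recovery rule quoted here, where any change in the slave's info makes it count as a new slave, amounts to an equality check between the checkpointed info and the current one. The sketch below uses plain dictionaries; `is_compatible` and the field names are hypothetical and do not mirror the actual SlaveInfo message in mesos.proto.

```python
def is_compatible(checkpointed_info, current_info):
    """Recovery proceeds only if nothing in the slave's info has changed."""
    return checkpointed_info == current_info


# Hypothetical info recorded before the restart.
checkpointed = {"hostname": "node-1", "cpus": 8, "mem_mb": 16384}

# The same machine after a RAM upgrade: one field differs.
upgraded = dict(checkpointed, mem_mb=32768)
```

Under this rule the upgraded machine no longer matches its checkpoint, which is consistent with the thread's title error, "Incompatible slave info detected".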

Re: Framework Starvation

2014-06-13 Thread Vinod Kone
In case you didn't receive my email from @twitter domain. On Thu, Jun 12, 2014 at 8:20 AM, Claudiu Barbura claudiu.barb...@atigeo.com wrote: We had to change the drf_sorter.cpp/hpp and hierarchical_allocator_process.cpp files. Hey Claudiu. Can you share the patch? @vinodkone

Re: Error while running Mesos slave on Mac OSX 10.9.3

2014-06-09 Thread Vinod Kone
as an identifier for the slave. Thanks! prakhar On Mon, Jun 9, 2014 at 1:56 PM, Vinod Kone vinodk...@gmail.com wrote: Looks like the gethostbyname2 call is returning an error. I've seen this before on my Mac when I have VPN software running (or incorrectly stopped). I'm surprised though that master

Re: Framework Starvation

2014-06-03 Thread Vinod Kone
offers and are able to run queries again (see attached log_after_starvation file). Let me know if you need the slave logs. Thank you! Claudiu From: Vinod Kone vinodk...@gmail.com Reply-To: user@mesos.apache.org user@mesos.apache.org Date: Friday, May 30, 2014 at 10:13 AM To: user

Re: Framework Starvation

2014-05-30 Thread Vinod Kone
Hey Claudiu, Mind posting some master logs with the simple setup that you described (3 shark cli instances)? That would help us better diagnose the problem. On Fri, May 30, 2014 at 1:59 AM, Claudiu Barbura claudiu.barb...@atigeo.com wrote: This is a critical issue for us as we have to shut

Re: Mesos with non clustered environment.

2014-05-30 Thread Vinod Kone
Hey Raymond, Glad to hear that you are interested in Mesos. Please see my answers inline. It specifically is talking about resource requirements at the framework level. What if some tasks in the one framework require a GPU and others do not ? The kind of resources that tasks from Beaker

Re: Mesos master behind NAT

2014-05-30 Thread Vinod Kone
process::schedule() @ 0x7fdb5f394b50 start_thread @ 0x7fdb5f0df0ed (unknown) I guess I have to use directly IP address, right? On 23 May 2014 17:38, Vinod Kone vinodk...@gmail.com wrote: 0.18.0 https://issues.apache.org/jira/browse/MESOS-672 On Fri

Re: How to kill stuck frameworks in mesos

2014-05-28 Thread Vinod Kone
On Tue, May 27, 2014 at 8:56 PM, Manivannan citizenm...@gmail.com wrote: *What is the default fail over timeout ? * The default failover timeout is 0s. You can confirm this by grepping master log for lines that look like Giving framework framework-id time to failover. I'm surprised that master

Re: ExecutorDriver

2014-05-27 Thread Vinod Kone
On Fri, May 16, 2014 at 12:30 PM, Diptanu Choudhury dipta...@gmail.com wrote: Is the ExecutorDriver that one gets in a launchTask callback in a Mesos Executor singleton? I am currently caching the instance of the ExecutorDriver when a launchTask is called in an Akka Actor which monitors a

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
You can use --hostname to tell master to publish a different address in zk. @vinodkone Sent from my mobile On May 23, 2014, at 12:40 AM, Tomas Barton barton.to...@gmail.com wrote: Hi, is it possible to run a Mesos master behind NAT? With the --ip flag I can set IP address of an actual

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
directly IP address, right? On 23 May 2014 17:38, Vinod Kone vinodk...@gmail.com wrote: 0.18.0 https://issues.apache.org/jira/browse/MESOS-672 On Fri, May 23, 2014 at 8:11 AM, Tomas Barton barton.to...@gmail.com wrote: Hey Vinod, thanks! That's exactly what I was looking for. I haven't

Re: Mesos / Libprocess ENETUNREACH

2014-05-21 Thread Vinod Kone
-mesos-user@incubator (this mailing list is deprecated) Tom, Both the framework (and slaves) and master need to be able to talk to each other. IOW, if one of the end points uses a private IP (presumably that's the case with a framework behind a VPN) then it wouldn't work. If you want the

Re: callback port

2014-05-19 Thread Vinod Kone
the var set. On Mon, May 19, 2014 at 10:19 AM, Vinod Kone vinodk...@gmail.com wrote: Probably. How are you setting the LIBPROCESS_PORT in Marathon? It has to be set via CommandInfo.Environment() of the task/executor for this to take effect. On Fri, May 16, 2014 at 9:41 AM, Scott Clasen sc

Re: [VOTE] Release Apache Mesos 0.18.2 (rc1)

2014-05-16 Thread Vinod Kone
+1 make check passed. CentOS 6 w/ gcc 4.8 On Wed, May 14, 2014 at 8:33 PM, Iven Hsu ive...@gmail.com wrote: +1 make check succeeded in Arch Linux + clang 3.4.1 2014-05-15 3:06 GMT+08:00 Niklas Nielsen n...@qni.dk: Hi all, Please vote on releasing the following candidate as Apache

Re: Where did 0.18.1 go? Suggesting 0.18.2

2014-05-13 Thread Vinod Kone
+1 On Tue, May 13, 2014 at 10:54 AM, Benjamin Hindman b...@eecs.berkeley.edu wrote: +1! On Tue, May 13, 2014 at 9:51 AM, Niklas Nielsen n...@qni.dk wrote: Hey everyone, First and foremost, I apologize for the radio silence on my part with regards to the 0.18.1 release. We didn't

Re: protecting mesos from fat fingers

2014-05-06 Thread Vinod Kone
On Tue, May 6, 2014 at 2:01 PM, David Greenberg dsg123456...@gmail.com wrote: We are actually working on solving #2, by adding mutual authentication between masters and slaves, and ensure that each group knows in advance what the valid masters/slaves are. This allows us to ensure that no

Re: [VOTE] Release Apache Mesos 0.18.1 (rc2)

2014-05-02 Thread Vinod Kone
+1 make check passes on OSX 10.9 w/ gcc-4.8 On Wed, Apr 30, 2014 at 11:18 PM, Niklas Nielsen n...@qni.dk wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos 0.18.1. 0.18.1 includes the following:

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-21 Thread Vinod Kone
On Mon, Apr 21, 2014 at 3:10 PM, Sharma Podila spod...@netflix.com wrote: On a related note, what if framework scheduler is up while Mesos master goes down. Then, if Mesos master restarts after a time interval greater than framework failover timeout, what is the expected behavior? Would the

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Vinod Kone
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg dsg123456...@gmail.com wrote: My follow-up question is this--is there a way to tell whether I'm outside of the timeout window? I'd like to have my framework check ZK and determine whether it's w/in the framework timeout or not, so that it can
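The check being asked about reduces to comparing elapsed time against the failover timeout. The sketch below assumes the framework itself records when it was last connected (for example in ZooKeeper, as the question suggests); `within_failover_window` and its parameters are hypothetical names, and all times are in seconds.

```python
def within_failover_window(last_connected_s, now_s, failover_timeout_s):
    """Return True if the scheduler is still inside its failover window."""
    return (now_s - last_connected_s) <= failover_timeout_s


# Example: a 48-hour failover timeout, expressed in seconds.
FORTY_EIGHT_HOURS_S = 48 * 60 * 60
```

A scheduler could use the result to decide between re-registering with its old framework ID and starting fresh; note that with a timeout of 0 the window closes as soon as any time has elapsed.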

Re: 0.18.1

2014-04-15 Thread Vinod Kone
On Mon, Apr 14, 2014 at 10:10 PM, Vinod Kone vi...@twitter.com wrote: Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any other important fix that belongs to 0.18

Re: Mesos slaves disconnecting because of Zookeeper?

2014-04-15 Thread Vinod Kone
Mesos 0.17.0 had a major refactor around interaction with ZooKeeper. So I would definitely recommend giving it a try and see if the problem persists. On Tue, Apr 15, 2014 at 11:59 AM, Ted Young tyo...@guidewire.com wrote: Anyone have any suggestions? I'm still seeing these problems and it's

Re: 0.18.1

2014-04-15 Thread Vinod Kone
, Vinod Kone vinodk...@gmail.com wrote: On Mon, Apr 14, 2014 at 10:10 PM, Vinod Kone vi...@twitter.com wrote: Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any

0.18.1

2014-04-14 Thread Vinod Kone
Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any other important fix that belongs to 0.18.* release but didn't make it into 0.18.0 please reply to this thread and I'll

Re: Marathon does not register with mesos

2014-04-13 Thread Vinod Kone
Hey Mukesh, Mind pasting the master and marathon logs? That would help us diagnose. Vinod On Sun, Apr 13, 2014 at 11:56 AM, Mukesh G muk...@gmail.com wrote: Using marathon 0.4.1 and mesos 0.18 on Centos 6.4 platform, I am able to successfully bring up mesos master, zookeeper and mesos

[RESULT] [VOTE] Release Apache Mesos 0.18.0 (rc6)

2014-04-09 Thread Vinod Kone
-940f-0cdd6148d66b' sh: line 0: cd: spark-0.9.tar.gz: Not a directory sh: ./sbin/spark-executor: No such file or directory -- Cheers, Tim Freedom, Features, Friends, First - Fedora https://fedoraproject.org/wiki/SIGs/bigdata -- *From: *Vinod Kone

Re: Using mesos with storm

2014-02-26 Thread Vinod Kone
On Wed, Feb 26, 2014 at 11:37 AM, Andrew Milkowski amgm2...@gmail.com wrote: I0226 14:30:12.982986 45829 slave.cpp:536] Successfully attached file

[RESULT][VOTE] Release Apache Mesos 0.16.0 (rc5)

2014-02-06 Thread Vinod Kone
Hi all, The vote for Mesos 0.16.0 (rc5) has passed with the following votes. +1 (Binding) -- Niklas Nielsen Benjamin Hindman Benjamin Mahler There were no 0 or -1 votes. Please find the release at: https://dist.apache.org/repos/dist/release/mesos/0.16.0 It is

Fwd: [VOTE] Release Apache Mesos 0.16.0 (rc5)

2014-02-06 Thread Vinod Kone
, Feb 2, 2014 at 1:23 PM Subject: Re: [VOTE] Release Apache Mesos 0.16.0 (rc5) To: Vinod Kone vinodk...@gmail.com, user@mesos.apache.org user@mesos.apache.org, d...@mesos.apache.org +1, Tested on Ubuntu 13.10 GCC 4.8.1 and Mac OS X Mavericks GCC 4.8.1 On January 31, 2014 at 11:27:28 AM, Vinod Kone

Re: Please Help me about hadoop on Mesos

2014-01-27 Thread Vinod Kone
I have some questions about running hadoop on top of Mesos, please help me. 1. when a tasktracker is launched, if n cpu core are allocated to it, it can only launch n-1 map tasks. Could someone tell me why? And, if I want to run map-only job, what should I do to run n map tasks on a n cpu

Re: Re: Please Help me about hadoop on Mesos

2014-01-27 Thread Vinod Kone
On Mon, Jan 27, 2014 at 10:07 AM, HUO Jing huoj...@ihep.ac.cn wrote: So, at the very beginning, if all the resource are assigned to hadoop, and after that, there are always enough jobs in jobtracker, is that meanning that the other framework will never get resource? Is it fair to do so ?

Re: How Mesos limits resources used by the executors (OSX)

2014-01-23 Thread Vinod Kone
Hey David. Mesos doesn't enforce resource limits when run on OSX. @vinodkone On Thu, Jan 23, 2014 at 11:57 AM, David Richardson pudnik...@gmail.com wrote: Hello, Mesos is supported on OSX. However, OSX doesn't have cgroups. How does Mesos enforce resource limits on executors in OSX?

Re: How Mesos limits resources used by the executors

2014-01-22 Thread Vinod Kone
. On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone vi...@twitter.com wrote: The way you set task resources looks correct. Can you paste what the slave logs say regarding the task/executor, esp. the lines that are from the cgroups isolator? Also, what is the command line of the slave? @vinodkone

Re: Mesos logging configuration questions

2014-01-21 Thread Vinod Kone
If --log_dir is not specified nothing is written to disk. ➜ build git:(master) ✗ ./bin/mesos-master.sh --help ... ... --log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified;

Re: How Mesos limits resources used by the executors

2014-01-21 Thread Vinod Kone
:02 PM, Vinod Kone vi...@twitter.com wrote: Mesos uses cgroups (https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt) to limit cpu and memory. It is indeed surprising that your executor is not OOMing when using more memory than requested. Can you tell us what the following values look

Re: What happens after registering framework

2014-01-17 Thread Vinod Kone
Yes that is correct, assuming there are slave(s) registered with the master. @vinodkone On Thu, Jan 16, 2014 at 11:05 PM, Sai Sagar jsaisa...@gmail.com wrote: Hi all, If a framework is able to register successfully, what happens in the next step? Will the master send resource offers

unsubscribe

2013-12-31 Thread Vinod Kone
@vinodkone

Re: Porting an app

2013-12-27 Thread Vinod Kone
I can't really find an example that is an end-to-end use case. By that I mean, I would like to know how to put the scheduler and the executor in the correct places. Right now I have a single jar with can be run from the command line: java -jar target/collector.jar and that would take care of

Re: Mesos slave GC clarification

2013-12-27 Thread Vinod Kone
I'm still not sure what exactly the issue is here, but we have had a couple of GC-related fixes included in 0.15.0-rc5. Are you willing to try that out? On Thu, Dec 26, 2013 at 10:56 AM, Thomas Petr tp...@hubspot.com wrote: Hi, We're running Mesos 0.14.0-rc4 on CentOS from the mesosphere
