Re: Volunteers needed

2018-09-18 Thread Bill Farner
I’m happy to pitch in for periodic review. Anyone is welcome to email me requesting a review. I don’t monitor incoming reviews, so unfortunately I will need to be contacted out-of-band. On Tue, Sep 18, 2018 at 10:45 AM Renan DelValle wrote: > All, > > We are in dire need of folks who would be

Re: tiers.info question

2018-02-05 Thread Bill Farner
> > If a tier is removed, for example, will the Aurora code fail to read the > replicated log on scheduler restart Yes. A task in storage with an unknown tier will fail during recovery here

Re: Detecting Flapping Tasks in Aurora

2018-02-01 Thread Bill Farner
You could scan for tasks that are in, or have been in the THROTTLED state. You can adjust the time intervals for throttled tasks with these schedule
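A minimal sketch of the scan suggested above, assuming the task records have already been fetched and flattened into plain dictionaries with a `task_id` and an ordered list of status names in `history`. That record shape and the function name are hypothetical, not the scheduler's API.

```python
def flapping_task_ids(tasks):
    """Return IDs of tasks that are in, or have passed through, THROTTLED."""
    # The current state is the last history entry, so one membership test
    # covers both "is in" and "has been in".
    return [t["task_id"] for t in tasks if "THROTTLED" in t["history"]]


# Hypothetical sample data for illustration only.
tasks = [
    {"task_id": "a", "history": ["PENDING", "ASSIGNED", "RUNNING"]},
    {"task_id": "b", "history": ["PENDING", "THROTTLED", "PENDING", "RUNNING"]},
    {"task_id": "c", "history": ["RUNNING", "FAILED", "THROTTLED"]},
]
print(flapping_task_ids(tasks))
```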

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Bill Farner
the command line option? One can use >> an older Driver if KILL is preferred for some reason. >> >> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner wrote: >> >>> This situation is much simpler if task ID == executor ID. I can't come >>> up with a goo

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Bill Farner
This situation is much simpler if task ID == executor ID. I can't come up with a good reason why this is not the case today. Our executor IDs originally included a static prefix, though i do not recall any rationale for this. When Renan added custom executor support, this static prefix was made co

Re: explain these replication logs?

2017-12-13 Thread Bill Farner
I'm unfamiliar. The mesos dev list may be able to give more insight. I'd be interested in your findings! On Tue, Dec 12, 2017 at 4:32 PM, Mohit Jaggi wrote: > For the same position I see two bursts of writes, one around 00:12:36 and > another 12 min earlier. Any idea what this means? > > ~/a/a

Re: shutdown vs kill API is Mesos

2017-12-09 Thread Bill Farner
es still exist so we can't make it the default. > > On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner wrote: > >> Aurora pre-dates SHUTDOWN by several years, so the option was not >> present. Additionally, the SHUTDOWN call is not available in the API used >> by Aurora.

Re: shutdown vs kill API is Mesos

2017-12-09 Thread Bill Farner
Aurora pre-dates SHUTDOWN by several years, so the option was not present. Additionally, the SHUTDOWN call is not available in the API used by Aurora. Last i knew, Aurora could not use the "new" API because of performance issues in the implementation, but i do not know where that stands today. ht

Re: [ANNOUNCE] 0.19.0 release

2017-12-07 Thread Bill Farner
enan > > On Sat, Nov 11, 2017 at 8:50 AM, Bill Farner wrote: > >> Hello folks, >> >> Aurora 0.19.0 has been released! Please see the blog post for more >> details: https://aurora.apache.org/blog/aurora-0-19-0-released/ >> >> >> Cheers, >> >> Bill >> > >

Re: sliding stats testing

2017-12-02 Thread Bill Farner
The underlying Rate stats used here are only updated when sampled, so the value you have sent to accumulate() is not reflected in rates and ratios until doSample() is called on them. For the purposes of this test, it may be easiest to integrate with TimeSeriesRepositoryImpl and manually induce sam
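The sampling behavior described above can be illustrated with a toy model. This is a sketch only: `ToyRate` and its method names are stand-ins written for this example, not the stats library's classes, and the "rate" here is simply the change in the running total between two samples.

```python
class ToyRate:
    """Toy stand-in for a sampled rate stat (not the real implementation)."""

    def __init__(self):
        self._total = 0       # running total fed by accumulate()
        self._last_total = 0  # total observed at the previous sample
        self.value = 0        # what readers see; changes only on do_sample()

    def accumulate(self, n):
        # Updates the input only; the exposed value is left untouched.
        self._total += n

    def do_sample(self):
        # The exposed value catches up with the input only here.
        self.value = self._total - self._last_total
        self._last_total = self._total


rate = ToyRate()
rate.accumulate(5)
before = rate.value  # accumulate() alone does not change the exposed value
rate.do_sample()
after = rate.value   # the accumulated amount shows up after sampling
```

A test that asserts on the rate right after `accumulate()` therefore sees the stale value; it has to induce a sample first.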

Re: Aurora pauses adding offers

2017-11-29 Thread Bill Farner
"Lost leadership, committing suicide" during > outage. I do see it at other times. > > On Wed, Nov 29, 2017 at 1:52 PM, Bill Farner wrote: > >> - Does log replication "maintain" ZK connections and suffer when a NIC >>> flaps? >> >>

Re: Aurora pauses adding offers

2017-11-29 Thread Bill Farner
when a NIC > flaps? > - If only 1 of 5 ZK's have this issue, could there still be a problem? > > On Wed, Nov 29, 2017 at 11:08 AM, Bill Farner wrote: > >> is there a place I can inject a pre-processor for the API calls >> >> >> There is no off-the-shelf

Re: Aurora taking really long to reschedule a full cluster

2017-11-29 Thread Bill Farner
That works out to scheduling about 1 task/sec, which is at least one order of magnitude lower than i would expect. Are you sure tasks were scheduling and continuing to run, rather than exiting/failing and triggering more scheduling work? What build is this from? Can you share (scrubbed) schedule

Re: Aurora pauses adding offers

2017-11-29 Thread Bill Farner
ifying Aurora code > "inline", is there a place I can inject a pre-processor for the API calls? > > On Mon, Nov 27, 2017 at 4:59 PM, Bill Farner wrote: > >> I'd also suggest focusing on the source of the congestion. Aurora should >> offer quite high scheduling

Re: Aurora pauses adding offers

2017-11-27 Thread Bill Farner
of load on Aurora. Usually that is fine but if Aurora slowed down due to >>> transient problems, it can signal that to upstream software in the same way >>> that busy web servers do during cyber Monday sales :-) >>> >>> On Mon, Nov 27, 2017 at 12:06 PM, Bil

Re: Aurora pauses adding offers

2017-11-27 Thread Bill Farner
me) indicate heavy load. Perhaps, this > "defense" already exists? > > > On Mon, Nov 13, 2017 at 8:38 AM, Bill Farner wrote: > >> The next level is to determine why the storage lock is being held. >> Common causes include: >> >> 1. storage snapshot slo

Re: HTTP API examples

2017-11-27 Thread Bill Farner
Mon, Nov 27, 2017 at 11:39 AM, Mohit Jaggi wrote: > I see. There is no JSON interface then, clients have to use thrift? > > On Mon, Nov 27, 2017 at 10:50 AM, Bill Farner wrote: > >> I suspect you are looking at /apibeta, which serves a javadoc-style doc >> from a GET reque

Re: HTTP API examples

2017-11-27 Thread Bill Farner
I suspect you are looking at /apibeta, which serves a javadoc-style doc from a GET request. There is no support for this interface, however, and it is subject to removal in the future. That being said, you can see an example of querying by task status in this test case

Re: Apache Aurora holding resources which makes other framework starve

2017-11-26 Thread Bill Farner
The file to edit should indeed be /etc/default/aurora-scheduler, specifically by populating EXTRA_SCHEDULER_ARGS: EXTRA_SCHEDULER_ARGS="-min_offer_hold_time=30secs" On Sun, Nov 26, 2017 at 6:57 AM, bigggyan wrote: > Thanks Mohit for the information. > I have installed Apache Aurora as described

Re: reverting logback dependency update

2017-11-20 Thread Bill Farner
> > Sent from my iPhone > > On Nov 20, 2017, at 6:42 PM, Bill Farner wrote: > > I don't think it is fair to the community or practical to hold back > library versions because of conflicts in proprietary custom builds of > Aurora. So in general, i am -1 on the preced

Re: reverting logback dependency update

2017-11-20 Thread Bill Farner
I don't think it is fair to the community or practical to hold back library versions because of conflicts in proprietary custom builds of Aurora. So in general, i am -1 on the precedent this would set. On Mon, Nov 20, 2017 at 5:53 PM, Mohit Jaggi wrote: > Folks, > Due to a conflict with another

Re: Aurora pauses adding offers

2017-11-13 Thread Bill Farner
schedulers, schedulers to zookeeper, or between zookeepers As an immediate (partial) remedy, i suggest you upgrade to eliminate the use of SQL/mybatis in the scheduler. This helped twitter improve (1) and (1a). commit f2755e1 Author: Bill Farner Date: Tue Oct 24 23:34:09 2017 -0700 Exclusively

[ANNOUNCE] 0.19.0 release

2017-11-11 Thread Bill Farner
Hello folks, Aurora 0.19.0 has been released! Please see the blog post for more details: https://aurora.apache.org/blog/aurora-0-19-0-released/ Cheers, Bill

Re: Aurora pauses adding offers

2017-11-10 Thread Bill Farner
> > I suspect they are getting enqueued Just to be sure - the offers do eventually get through though? The most likely culprit is contention for the storage write lock, observable via spikes in stat log_storage_write_lock_wait_ns_total. I see that a lot of getJobUpdateDetails() and getTasksWit
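One way to look for the spikes mentioned above, as a sketch: it assumes the stat is a cumulative counter (suggested by the `_total` suffix, but an assumption here) that has been sampled at a fixed interval into a list. The function names and the threshold are hypothetical.

```python
def wait_deltas(samples):
    """Per-interval increase of a cumulative counter."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def spike_indices(samples, threshold_ns):
    """Indices of intervals whose lock-wait increase exceeds the threshold."""
    return [i for i, delta in enumerate(wait_deltas(samples)) if delta > threshold_ns]


# Hypothetical samples of log_storage_write_lock_wait_ns_total.
samples = [0, 1_000, 2_000, 900_002_000, 900_003_000]
print(spike_indices(samples, threshold_ns=100_000_000))
```

Intervals flagged this way can then be lined up against API call logs for the same time window.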

[CVE-2016-4437] Apache Aurora information disclosure vulnerability (amended)

2017-11-01 Thread Bill Farner
Please see below for the amended notice. The prior announcement indicated that releases prior to 0.10.0 were unaffected, which is incorrect. Versions 0.8.0 - 0.18.0 included vulnerable Shiro versions. Versions Affected: Aurora 0.8.0 - 0.18.0 Description: The affected versions of the scheduler re

[CVE-2016-4437] Apache Aurora information disclosure vulnerability

2017-11-01 Thread Bill Farner
Versions Affected: Aurora 0.10.0 to 0.18.0 Description: The affected versions of the scheduler rely on a version of Apache Shiro which is vulnerable to CVE-2016-4437. Under certain conditions, the vulnerability allows remote attackers to execute arbitrary code or bypass intended access restrictio

[ANNOUNCE] 0.18.1 release

2017-11-01 Thread Bill Farner
Hello folks, I'm pleased to announce that Apache Aurora 0.18.1 has been released! More details can be found in the blog post: https://aurora.apache.org/blog/aurora-0-18-1-released/ Cheers, Bill

Re: distinguishing failure types during upgrade

2017-11-01 Thread Bill Farner
out the rollback logic > also into this external system. > > On Wed, Nov 1, 2017 at 8:39 AM, Bill Farner wrote: > >> Can Aurora distinguish between failures caused by the upgrade itself or >>> other transient systemic issues >> >> >> There isn't an

Re: distinguishing failure types during upgrade

2017-11-01 Thread Bill Farner
> > Can Aurora distinguish between failures caused by the upgrade itself or > other transient systemic issues There isn't any signal i know of that would allow Aurora to independently determine the cause of task failures in a generic way. Two options come to mind: 1. Human intervention - aurora

Re: updateconfig doc

2017-10-30 Thread Bill Farner
Correct! On Mon, Oct 30, 2017 at 2:32 PM, Mohit Jaggi wrote: > Got it...and wait_for_batch_completion changes this from a "sliding" to a > "rolling" window ? > > On Mon, Oct 30, 2017 at 2:28 PM, Bill Farner wrote: > >> Joshua beat me to the rep

Re: updateconfig doc

2017-10-30 Thread Bill Farner
Joshua beat me to the reply, so now you have corroboration for his correction :-) On Mon, Oct 30, 2017 at 2:26 PM, Bill Farner wrote: > Clarification - shard and instance are (unfortunately) used > interchangeably in some of our docs, despite the fact that shard can have a > differen

Re: updateconfig doc

2017-10-30 Thread Bill Farner
Clarification - shard and instance are (unfortunately) used interchangeably in some of our docs, despite the fact that shard can have a different meaning in other contexts. The meaning of batch_size doesn't match either rephrasing you offer; perhaps the docs need work! batch_size effectively tell

Re: Lost framework registered event [Was Re: leader election issues]

2017-10-26 Thread Bill Farner
info. > > Re: code, we have a fork which is very close to master. > > On Wed, Sep 27, 2017 at 10:03 PM, Bill Farner wrote: > >> What commit/release was this with? From the looks of the log contents, >> it's not master. I'd like to make sure i'm look

Re: orphaned thermos

2017-10-26 Thread Bill Farner
If the executor runs out of memory, i think it should be assumed that it will no longer be well-behaved. It seems most sensible for the mesos agent to clean up in this case. On Thu, Oct 26, 2017 at 11:56 AM, Mohit Jaggi wrote: > We found several zombie executors on a cluster. Thermos logs indic

Re: gorealis is now officially a PayPal Open Source Project

2017-10-16 Thread Bill Farner
Congrats on releasing! On Oct 16, 2017, 4:15 PM -0500, Renan DelValle , wrote: > Hi all, > > Just wanted to drop a note about a recent update for gorealis[1]. For those > who aren't familiar with it, gorealis is a library that aims to enable > users to programmatically interact with the Aurora sc

Re: fix for aurora-1945

2017-10-02 Thread Bill Farner
ool increase performance? Or > is there a lock somewhere (like DB updates) that makes that useless? > > On Mon, Oct 2, 2017 at 5:07 PM, Bill Farner wrote: > >> Is there a place where we store "used" offers >> >> >> Once the scheduler has decided to

Re: fix for aurora-1945

2017-10-02 Thread Bill Farner
in job/task state but that will be too > expensive to check. > > On Mon, Oct 2, 2017 at 4:44 PM, Bill Farner wrote: > >> That's true, but it doesn't appear the comment is trying to lay out all >> possible scenarios. Instead, it is attempting to explain the ratio

Re: fix for aurora-1945

2017-10-02 Thread Bill Farner
That's true, but it doesn't appear the comment is trying to lay out all possible scenarios. Instead, it is attempting to explain the rationale for offerManager.banOffer(offerId) a few lines later. On Mon, Oct 2, 2017 at 4:30 PM, Mohit Jaggi wrote: > Folks, > In the code below, isn't there a 3rd

Re: aurora crash in PendingTaskProcessor

2017-09-29 Thread Bill Farner
the fix involves making the map of offers by agent id a > concurrent map...I can contribute that. > > On Fri, Sep 29, 2017 at 9:09 AM, Bill Farner wrote: > >> This is due to multiple offers for the same agent, rather than duplicate >> offers. I don't see a specific b

Re: aurora crash in PendingTaskProcessor

2017-09-29 Thread Bill Farner
This is due to multiple offers for the same agent, rather than duplicate offers. I don't see a specific bug in the suspect code (OfferManager.java), but it does stand out as subject to races. Specifically, there is a lack of synchronization when checking whether an offer exists for a given agent ID an
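The kind of race described above is a check-then-act on shared state. The sketch below is a generic illustration of making the check and the update one atomic step; it is not the code in OfferManager.java nor the fix that was applied, and the class and method names are hypothetical.

```python
import threading


class OffersByAgent:
    """Keeps at most one offer per agent ID, with an atomic check-and-replace."""

    def __init__(self):
        self._lock = threading.Lock()
        self._offers = {}  # agent_id -> offer_id

    def add(self, agent_id, offer_id):
        """Store the offer and return the one it displaced, if any."""
        # The lookup and the write happen under one lock, so two threads
        # adding offers for the same agent cannot both see "no existing offer".
        with self._lock:
            previous = self._offers.get(agent_id)
            self._offers[agent_id] = offer_id
            return previous


book = OffersByAgent()
first = book.add("agent-1", "offer-1")
second = book.add("agent-1", "offer-2")
```

Without the lock, two threads could each check, find nothing, and both proceed as if they held the only offer for that agent.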

Re: Lost framework registered event [Was Re: leader election issues]

2017-09-27 Thread Bill Farner
7.205 [Lifecycle-0, >>>>>>>>> StateMachine$Builder:389] SchedulerLifecycle state machine transition >>>>>>>>> LEADER_AWAITING_REGISTRATION -> DEAD >>>>>>>>> aurora-scheduler.log:Sep 26 18:21:37 machine62 >>

Re: leader election issues

2017-09-26 Thread Bill Farner
> > Is there a reason a non-leading scheduler will talk to Mesos No, there is not a legitimate reason. Did this occur for an extended period of time? Do you have logs from the scheduler indicating that it lost ZK leadership and subsequently interacted with mesos? On Tue, Sep 26, 2017 at 1:02 P

Re: Way to kill failed instances during a unsuccessful job update

2017-09-19 Thread Bill Farner
Aurora doesn't currently offer a way to do what you describe. A job in the scheduler describes a provisioning goal (number of instances), and we assume the scheduler shouldn't choose to modify that goal over time. To that end, the scheduler doesn't consider it a problem to infinitely restart the

Re: Why doesn't announcer delay until task indicates it's ready?

2017-03-21 Thread Bill Farner
Announcement is done immediately to announce presence of an instance for other services to determine what to do from there. A use case we considered was allowing monitoring of a service via HTTP before the service is ready for traffic. This is useful, for example, if the application has a long b

Re: Prevent service Job moved from one machine to another periodically

2016-06-25 Thread Bill Farner
Sun, Jun 26, 2016 at 1:09 AM, Bill Farner > wrote: > >> Entering the KILLING state suggests that a user issued a kill command for >> the service. Does that sound plausible? >> >> >> On Saturday, June 25, 2016, Ziliang Chen > > wrote: >> >>>

Re: Prevent service Job moved from one machine to another periodically

2016-06-25 Thread Bill Farner
Entering the KILLING state suggests that a user issued a kill command for the service. Does that sound plausible? On Saturday, June 25, 2016, Ziliang Chen wrote: > Instructed KILL. > > 4 minutes ago - KILLED : Instructed to kill task. > >- 06/25 22:32:23 LOCAL • PENDING >- 06/25 22:33:

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

2016-06-15 Thread Bill Farner
> > Assuming we don't run into any roadblocks: How about changing the > default of `-zk_use_curator` from False to True for the next release? +1 I believe that was the plan of action, though i can't recall if it was recorded anywhere more official than the dev list. On Wed, Jun 15, 2016 at 9:44

Re: Delete cron/service job in Aurora

2016-06-11 Thread Bill Farner
> > May i ask if there is a way to completely remove cron/service job from > Aurora ? > For cron job, when i deschedule them, they become adhoc jobs. > For service job, what i can do is killall them. > Still on the Aurora UI, i can still see all of the tasks. Dead tasks (the remnants of a killed

Re: Get executors task name, or managing secrets for tasks

2016-06-07 Thread Bill Farner
The job key is indeed available to the executor, as it is in TaskConfig, first used here: https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/aurora_executor.py#L257 The most common approach I've seen for secrets is out-of-band management using file system permis

Re: Inverse diversity constraints?

2016-04-04 Thread Bill Farner
There is no support for affinity-based scheduling today, but it has come up several times and i'm open to the idea. Without great care, this type of constraint could become difficult to reason about and implement in a way that does not suffer from pathological cases. Today, scheduling is entirely

Re: Get active/running instance IDs of a job.

2016-03-27 Thread Bill Farner
You can get that data from the getConfigSummary API call: https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L957-L958 which populates Result.configSummaryResult: https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/a

RE: CentOS 6.7 and Apache Aurora

2016-03-25 Thread Bill Farner
Please consider pushing forward, that patch seems to have gone stale. -=Bill On Fri, Mar 25, 2016 at 6:18 AM -0700, "Rice, Ben" wrote: Thanks. I ended up building my own and it looks identical to what’s on that reviewboard. :) Luckily, it ended up being not too much wo

Re: Aurora Thrift API Info

2016-03-23 Thread Bill Farner
"my_job" >> //taskQuery.TaskIds = nil >> //taskQuery.Statuses = nil >> //taskQuery.InstanceIds = nil >> //taskQuery.SlaveHosts = nil >> //taskQuery.Environment = "" >> //taskQuery.JobKeys = nil >> //taskQuery.Offset

Re: Aurora Thrift API Info

2016-03-21 Thread Bill Farner
Agent: Go-http-client/1.1 > Accept-Encoding: gzip > > I can attach the complete capture if that is needed. > > Link to my code on github: > https://github.com/krish7919/aurora_thrift_api > > > > > > -- > κρισhναν > > On Sun, Mar 20, 2016 at 2:29 AM, Bill Farner

Re: Aurora Thrift API Info

2016-03-20 Thread Bill Farner
d '=' >>>> WARNING - Running 'gofmt -w ./gen-go//api/ttypes.go' failed. >>>> >>>> I can modify the golang code by hand, but I would like to play it safe >>>> and use the working compiler from the debian repos. >>>> >

Re: Aurora Thrift API Info

2016-03-20 Thread Bill Farner
Regarding documentation - Maxim is correct that there isn't much in the way of independent/holistic docs for the thrift API. There is, however, scant javadoc-style documentation within the IDL spec itself: https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.t

Re: Aurora Thrift API Info

2016-03-19 Thread Bill Farner
> *From:* Krish [mailto:krishnan.k.i...@gmail.com] > *Sent:* Wednesday, March 16, 2016 2:36 PM > *To:* user@aurora.apache.org; Bill Farner ; > ma...@apache.org > *Subject:* Re: Aurora Thrift API Info > > > > Thanks, Maxim & Bill! > > > > I would love so

Re: [Thermos] TaskRunnker (Thermos) any child process SIGTERM Problem

2016-03-19 Thread Bill Farner
I believe the first ticket you posted captures the behavior we currently target - thermos only signals processes that it directly launches and monitors. It is the responsibility of those processes to signal any children they fork. As for finalization, it appears we have a bug on master...but i'm no

Re: Aurora Thrift API Info

2016-03-19 Thread Bill Farner
.9.1] > Mar 19 18:12:23 aurora-3 start.bash[21316]: at > org.apache.thrift.protocol.TJSONProtocol.readMessageBegin(TJSONProtocol.java:795) > ~[libthrift-0.9.1.jar:0.9.1] > Mar 19 18:12:23 aurora-3 start.bash[21316]: at > org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)

Re: Aurora Thrift API Info

2016-03-18 Thread Bill Farner
>>>>>> >>>>>>> Thanks, Bill. >>>>>>> >>>>>>> Well I have started my foray into the thrift API today. And I >>>>>>> think I am stuck with some thrift configs. >>>>>>>

Re: VCS for the backing store for definitions ?

2016-03-07 Thread Bill Farner
That is currently an exercise left to the user. Given that job definitions can get quite involved (multiple files, machine-local state, different external DSL), i imagine that a one-size-fits-all solution would be quite a challenge to define. On Mon, Mar 7, 2016 at 9:31 AM, Paul Hammant wrote:

Re: Stacktrace when running Apache Aurora

2016-03-03 Thread Bill Farner
obs in production. > I had to refer to a dozen and more sites and blogs and manuals and source > to get so far; and got help from engineers in various mailing lists. > A unified guide should be helpful, imho. > > > On Thursday 3 March 2016, Bill Farner > wrote: > >> Wow!

Re: Stacktrace when running Apache Aurora

2016-03-03 Thread Bill Farner
ally trigger a docker pull and docker run without issues from >>>> the slave (which is also reflected properly outside the slave container >>>> with docker images and docker ps). >>>> >>>> However, when I try to run an aurora job with hello-docker container, >>

Re: Explicit job execution order

2016-02-02 Thread Bill Farner
the resolution of > https://issues.apache.org/jira/browse/AURORA-735 is Later :) > > On 02 Feb 2016, at 21:44, Bill Farner wrote: > > In general, i've assumed that job dependencies create more problems than > they solve (e.g. scheduling behavior when a parent job is removed,

Re: Explicit job execution order

2016-02-02 Thread Bill Farner
In general, i've assumed that job dependencies create more problems than they solve (e.g. scheduling behavior when a parent job is removed, parent/child relationships that span auth groups, etc). Dependencies seem handy for setting up and tearing down groups of jobs for things like development env

Re: Announcer problem

2016-01-25 Thread Bill Farner
fice. > > On Mon, Jan 25, 2016 at 9:28 AM, Bill Farner wrote: > >> There's also 2 flags you need to pass to the executor via the scheduler: >> --announcer-enable, --announcer-ensemble. See here for example: >> https://github.com/apache/aurora/blob/master/exampl

Re: Announcer problem

2016-01-25 Thread Bill Farner
There's also 2 flags you need to pass to the executor via the scheduler: --announcer-enable, --announcer-ensemble. See here for example: https://github.com/apache/aurora/blob/master/examples/vagrant/upstart/aurora-scheduler.conf#L43 On that topic, does anyone else think --announcer-enable is redu

Re: Aurora blocks other frameworks

2016-01-21 Thread Bill Farner
Additionally, you will want to disable the preemptor (-enable_preemptor=false), as preemption in multi-framework mesos is currently futile. On Thu, Jan 21, 2016 at 6:53 PM, Maxim Khutornenko wrote: > Make sure you are running from master as we just had a new feature > committed that allows Auror

Re: Pre-checking if job can be scheduled?

2016-01-12 Thread Bill Farner
Quick pointers for after you read the contributing doc: 1. Skim the doc for developing on the scheduler https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md 2. Add the new API method https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen

Re: Pre-checking if job can be scheduled?

2016-01-12 Thread Bill Farner
I think that would be a cool addition to the API, and relatively easy to implement. I'd be happy to shepherd if you are willing to take a crack at a patch! On Tue, Jan 12, 2016 at 2:56 PM, Brian Hatfield wrote: > Hi, > > We currently run a (relatively) small Mesos/Aurora cluster, and don't > al

Re: Aurora Config Generation

2016-01-11 Thread Bill Farner
Please do share! I'm curious - what motivates you to continue using the client if you're bypassing the DSL? On Mon, Jan 11, 2016 at 9:30 PM, ben...@gmail.com wrote: > My motivating use case was generating configs for aurproxy ( > https://github.com/tellapart/aurproxy), which takes all of its >

[ANNOUNCE] Apache Aurora 0.11.0 debian packages

2016-01-10 Thread Bill Farner
I'm pleased to announce that official debian packages for Aurora are now available! You can find the files here: https://bintray.com/apache/aurora/debian-ubuntu-trusty/0.11.0 Cheers, Bill

Re: Documentation Structure

2015-12-29 Thread Bill Farner
I agree with your points about going into high levels of detail too quickly, and lacking introductory materials (including examples). Having all of these would be wonderful. This is the right discussion - let's figure out which docs we need, and what we need to carve out of existing docs. On Tue

[ANNOUNCE] 0.11.0 release

2015-12-23 Thread Bill Farner
Hello folks, I'm pleased to announce that Apache Aurora 0.11.0 has been released! More details can be found in the blog post: https://aurora.apache.org/blog/aurora-0-11-0-released/ Thanks to the many people who made this release possible! Cheers, Bill

[ANNOUNCE] 0.10.0 release

2015-12-12 Thread Bill Farner
Hello folks, We are late in making this announcement, but Apache Aurora 0.10.0 has been released! More details can be found in the blog post: https://aurora.apache.org/blog/aurora-0-10-0-released/ Thanks to the many people who made this release possible! Cheers, Bill

Re: "Running Job Pipelines in Aurora"

2015-12-10 Thread Bill Farner
Great stuff, thanks for pointing this out! On Wed, Dec 9, 2015 at 10:34 PM, Dave Lester wrote: > Paul Cavallaro at Oscar published a new blog post earlier today about > running job pipelines in Aurora: > http://dna.hioscar.com/post/134865566390/running-job-pipelines-in-aurora — > pretty awesome!

Re: Offers not being used

2015-12-09 Thread Bill Farner
It's GiB, but there is a slice of per-task resource overhead added for the executor. I believe the default is 128 MB RAM for that, which seems to line up with what you are seeing. On Wednesday, December 9, 2015, Andrew Jorgensen wrote: > I am currently using aurora version 0.9.0-rc0-2 and mesos
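As a worked example of the overhead described above: the 128 MB figure is the default the message recalls ("I believe"), so treat it as an assumption for your cluster, and the function name is hypothetical.

```python
EXECUTOR_OVERHEAD_MB = 128  # default recalled in the message; may differ per cluster


def ram_taken_from_offer_mb(task_ram_mb, overhead_mb=EXECUTOR_OVERHEAD_MB):
    """RAM consumed from an offer: the task's own request plus executor overhead."""
    return task_ram_mb + overhead_mb


# A task configured with 1 GiB (1024 MB) of RAM.
print(ram_taken_from_offer_mb(1024))
```

So an offer must carry the task's requested RAM plus the overhead before the task fits, which is why an offer exactly matching the task's configured RAM goes unused.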

Re: Stage multiple jobs/tasks on the same host

2015-11-21 Thread Bill Farner
You could do this in an ad-hoc way with constraints (i.e. constrain the tasks to the same host name or other attribute of your choice). An approach that would require more heavy lifting on your end is to build a custom executor that is responsible for launching the containers (likely interacting d

Re: getting docker to work with aurora

2015-11-13 Thread Bill Farner
Are there any additional logs that I should be examining? Yes. The stderr/stdout of the executor lives in the task sandbox, which you'll find in the mesos agent's log (it includes the task ID). I suspect you'll find an error similar to what's mentioned in the ticket i linked below. > Is there

Re: Throttling task kill rates per job?

2015-10-28 Thread Bill Farner
For some resources (like disk, or more acutely - RAM), there's not much we can do to provide assurances. Ultimately resource-driven task termination is managed at the node level, and may represent a real exhaustion of the resource. I'd be worried that trying to augment this might trade one proble

Re: Stacktrace when running Apache Aurora

2015-10-27 Thread Bill Farner
0.9.0 uses Mesos' task >>>> reconciliation >>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API >>>> instead. >>>> >>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish >>>> wrote: >>>> >>>

Re: Stacktrace when running Apache Aurora

2015-10-20 Thread Bill Farner
source code. It gives you a running scheduler to >> play with. Once you have understood how it works, you can start trying to >> install it on your own (by reverse-engineering the vagrant box). >> >> >> Hope this helps a little, >> >> Stephan >> >> >> >

Re: Stacktrace when running Apache Aurora

2015-10-19 Thread Bill Farner
The typical flow is that you keep your .aurora file checked into git, and commit every time you deploy/update. When you change your file, you will instruct Aurora to update the live job (have a look at aurora update -h). Aurora will perform a rolling upgrade of your job to the new config. You'll u