Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-22 Thread Saisai Shao
Nan, I think Meisam already had a PR about this; maybe you can discuss it with him on GitHub based on the proposed code. Sorry, I didn't follow the long discussion thread, but I think Paypal's solution sounds simpler. On Wed, Aug 23, 2017 at 12:07 AM, Nan Zhu wrote: > based on this result,

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-22 Thread Nan Zhu
Based on this result, I think we should follow the bulk operation pattern. Shall we move forward with the PR from Paypal? Best, Nan On Mon, Aug 21, 2017 at 12:21 PM, Meisam Fathi wrote: > Bottom line up front: > 1. The cost of calling 1 individual REST calls is about two order of > magnitu

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
Hi Marcelo, > I'm not really familiar with how multi-node HA was implemented (I > stopped at session recovery), but why isn't a single server doing the > update and storing the results in ZK? Unless it's actually doing > load-balancing, it seems like that would avoid multiple servers having > to

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
> Just an FYI, Apache mailing lists can't share attachments. If you could > please upload the files to another file sharing site and include links > instead. > Thanks for the information. I added the files to the JIRA ticket and put the contents of the previous email as a comment. Here are the links

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Alex Bozarth
Subject: Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design I forgot to attach the first chart. Sorry about that. Thanks, Meisam On Mon, Aug 21, 2017 at 12:21 PM Meisam Fathi wrote: Bottom line up front: 1. The cost of calling

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
I forgot to attach the first chart. Sorry about that. [image: transfer_time_bar_plot.png] Thanks, Meisam On Mon, Aug 21, 2017 at 12:21 PM Meisam Fathi wrote: > Bottom line up front: > 1. The cost of calling 1 individual REST calls is about two orders of > magnitude higher than calling a sin

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
Bottom line up front: 1. The cost of calling 1 individual REST calls is about two orders of magnitude higher than calling a single batch REST call (1 * 0.05 seconds vs. 1.4 seconds) 2. Time to complete a batch REST call plateaus at about 10,000 application reports per call. Full story: I ex

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
I like that approach on paper, although I currently don't have much time to actually be able to review the PR and provide decent feedback. I think that regardless of the approach, one goal should be to probably separate what is being monitored from how it's being monitored; that way you can later

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Prabhu Kasinathan
As Meisam highlighted, in our case, we have Livy Multi-Node HA, i.e., Livy running on 6 servers for each cluster, load-balanced, sharing Livy metadata on ZooKeeper and running thousands of applications. With the changes below, we are seeing good improvements due to batching the requests (one per livy node

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
On Wed, Aug 16, 2017 at 2:09 PM Nan Zhu wrote: > With time goes, the reply from YARN can only be larger and larger. Given > the consistent workload pattern, the cost of a large query can be > eventually larger than individual request > I am under the impression that there is a limit to the numbe

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 2:58 PM, Meisam Fathi wrote: > Livy can directly call org.apache.spark.deploy.SparkSubmit.main() with > proper arguments, which is what spark-submit ends up doing. > > I have at least three problems with this approach: > 1. It is a hack. > 2. Now that you pointed out, I see

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
Hi Marcelo, Thanks for your comments. I'd like to know your thoughts on a different approach. Livy can directly call org.apache.spark.deploy.SparkSubmit.main() with proper arguments, which is what spark-submit ends up doing. I have at least three problems with this approach: 1. It is a hack. 2. Now

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Thanks for the reply, Meisam. It looks good enough to meet our current scenarios. The REST API looks more powerful, but it requires replacing the current YarnClient with a self-made RestClient. On Wed, Aug 16, 2017 at 2:23 PM, Meisam Fathi wrote: > Hi Nan, > > In the highlighted line > > > > https://githu

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 2:09 PM, Nan Zhu wrote: > With time goes, the reply from YARN can only be larger and larger. Given > the consistent workload pattern, the cost of a large query can be > eventually larger than individual request That's where filtering would help, if it's possible to do it e

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
Hi Nan, In the highlighted line > > https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75 > > I assume that it will get the reports of all applications in YARN, even > they are finished? That's right. That line will return reports for all Spark Applicati

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
As time goes on, the reply from YARN can only get larger and larger. Given the consistent workload pattern, the cost of a large query can eventually exceed that of individual requests. I would say go with individual request + thread pool or large batch for all first, if any performance issue is obser
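The "individual request + thread pool" option mentioned above can be sketched with JDK classes only. This is a minimal illustration, not code from Livy or from the PR under discussion: `PooledPoller`, `pollAll`, and the `fetchOne` function (a stand-in for one per-application status lookup against YARN) are hypothetical names, and the 30-second timeout is an arbitrary assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: one status request per application, spread over a fixed pool. */
final class PooledPoller {
    /**
     * Issues one fetchOne call per application id on a bounded thread pool
     * and collects the returned states into a thread-safe map.
     */
    static Map<String, String> pollAll(List<String> appIds,
                                       Function<String, String> fetchOne,
                                       int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, String> states = new ConcurrentHashMap<>();
        for (String id : appIds) {
            // Each task is independent: it fetches and stores one application's state.
            pool.execute(() -> states.put(id, fetchOne.apply(id)));
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // give up on stragglers after the assumed timeout
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return states;
    }
}
```

The sketch creates a pool per call only to stay self-contained; a long-running service would more likely share one pool across polling rounds. The number of requests still grows linearly with the number of applications, which is the cost the thread weighs against a single bulk call.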

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 12:57 PM, Nan Zhu wrote: > yes, we finally converge on the idea > > how large the reply can be? if I have only one running applications and I > still need to fetch 1000 > > on the other side > > I have 1000 running apps, what's the cost of sending 1000 requests even the > t

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Yes, we finally converge on the idea. How large can the reply be? If I have only one running application and I still need to fetch 1000? On the other side, if I have 1000 running apps, what's the cost of sending 1000 requests even if the thread pool and YARN client are shared? On Wed, Aug 16, 2017 at

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 12:25 PM, Meisam Fathi wrote: > Out of these two problems, calling one spark-submit per application is the > biggest problem, but it can be solved by adding more Livy servers. Something like SPARK-11035 could also help here. Although the implementation of that particular s

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 12:27 PM, Nan Zhu wrote: > I am using your words *current*. What's the definition of "current" in > livy? I think that's all application which still keep some records in the > livy's process's memory space There are two views of what is current: Livy's and YARN's. They may

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Thanks for the answer, Meisam! > The time consuming parts in the code are calls to YARN and not filtering and updating the data structures. In the highlighted line https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75 I assume that it will get the repor

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
I am using your words *current*. What's the definition of "current" in livy? I think that's all application which still keep some records in the livy's process's memory space So: 1. How you express this "current" in a query to YARN? I think you have to use ApplicationID (maybe there are some othe

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
Hi Nan, > > my question related to the undergoing discussion is simply "have you seen > any performance issue in > > https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75 > ? >

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 12:02 PM, Nan Zhu wrote: > Then which API you would use for *current* Apps? I think you have to define > *current* with applicationIds? If that's true, you have to call > https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/api/YarnClient.html#li

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Then which API you would use for *current* Apps? I think you have to define *current* with applicationIds? If that's true, you have to call https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/api/YarnClient.html#line.181 , If I didn't miss anything, there is no API to

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 11:34 AM, Nan Zhu wrote: > Yes, I know there is such an API, what I don't understand is what I should > pass in the filtering API you mentioned, say we query YARN for every 5 > tickets > > 0: Query and get App A is running > > 4: App A is done > > 5: Query...so what I shoul

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Yes, I know there is such an API; what I don't understand is what I should pass to the filtering API you mentioned. Say we query YARN for every 5 tickets: 0: Query and get App A is running 4: App A is done 5: Query...so what should I fill in as filtering parameters at 5 to capture the changes of Ap
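One possible way to handle the scenario in this question, offered only as an illustration and not as what the thread settled on: if the bulk query is filtered to running states, an application that finished between two polls simply drops out of the reply, so the tracked ids that are absent can be followed up individually. `Reconciler` and `needsFollowUp` are hypothetical names, and the plain `Map<String, String>` stands in for whatever report type YARN actually returns.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: compare Livy's tracked ids with a state-filtered bulk reply. */
final class Reconciler {
    /**
     * Returns the tracked application ids that do not appear in a reply
     * filtered to running states. These are the candidates for a follow-up
     * lookup, because they may have finished since the previous poll.
     */
    static Set<String> needsFollowUp(Set<String> tracked, Map<String, String> runningReply) {
        Set<String> missing = new TreeSet<>(tracked); // copy; the caller's set is untouched
        missing.removeAll(runningReply.keySet());
        return missing;
    }
}
```

With App A tracked and done at step 4, the filtered reply at step 5 no longer contains A, so `needsFollowUp` returns A and only that one application needs an individual request.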

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 11:27 AM, Nan Zhu wrote: > yes, it is going to be Akka if moving forward (at least not going to > introduce an actor framework to livy) -1 on that. I don't see a reason to introduce a large and complex framework like Akka into Livy. What you propose can be achieved easily

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 11:17 AM, Nan Zhu wrote: > Looks like non-REST API also contains this https://hadoop.apache. > org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/ > api/YarnClient.html#line.225 > > my concern which was skipped in your last email (again) is that, how many > app stat

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
yes, it is going to be Akka if moving forward (at least not going to introduce an actor framework to livy) On Wed, Aug 16, 2017 at 11:24 AM, Meisam Fathi wrote: > That is true, but I was under the impression that this will be implemented > with Akka (maybe because it is mentioned in the design d

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Hi, Meisam. Many thanks for sending the PR. My question related to the ongoing discussion is simply "have you seen any performance issue in https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75 ?" We have several scenarios that a large volume of applica

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
That is true, but I was under the impression that this will be implemented with Akka (maybe because it is mentioned in the design doc). On Wed, Aug 16, 2017 at 11:21 AM Marcelo Vanzin wrote: > On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi > wrote: > > I do agree that actor based design is clea

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi wrote: > I do agree that actor based design is cleaner and more maintainable. But we > had to discard it because it adds more dependencies to Livy. I've been reading "actor system" as a design pattern, not as introducing a new dependency to Livy. If
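Reading "actor" as a design pattern rather than a framework dependency can be sketched with the JDK alone: a single-thread executor serves as the mailbox, and the state is only ever touched by that one thread, so no explicit locking is needed. This is a minimal sketch under those assumptions; `AppStateActor`, `tell`, and `ask` are hypothetical names and are not taken from the design document or from Livy.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical actor-style holder for one application's state, JDK only. */
final class AppStateActor {
    // The executor's queue plays the role of the mailbox; messages run in FIFO order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private String state = "STARTING"; // read and written only on the mailbox thread

    /** Fire-and-forget message: replace the current state. */
    void tell(String newState) {
        mailbox.execute(() -> state = newState);
    }

    /** Request-reply message: read the state after all earlier messages were processed. */
    String ask() {
        try {
            return mailbox.submit(() -> state).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("state query failed", e);
        }
    }

    /** Stops accepting messages; queued messages still run. */
    void stop() {
        mailbox.shutdown();
    }
}
```

One thread per actor would not scale to thousands of applications; a fuller version would multiplex many mailboxes over a shared pool, which is the part an actor framework normally provides.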

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
> The JDK has many implementations of concurrent lists and maps. You don't need to write anything. The code to deal with thread pool vs. the alternative approach would be different, yes, but you make it sound like you'd have to implement some really complicated data structure when that is definitel

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
Here are my two pennies on both designs (actor-based design vs. single-thread polling design) *Single-thread polling design* We implemented a single-thread polling mechanism for Yarn here at PayPal. Our solution is more involved because we added many new features to Livy that we had to consider wh

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
Hello, On Wed, Aug 16, 2017 at 10:35 AM, Arijit Tarafdar wrote: > 1. Additional copy of states in Livy which can be queried from YARN on > request. Not sure I follow. > 2. The design is not event driven and may waste querying YARN unnecessarily > when no actual user/external request is pendin

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 10:31 AM, Nan Zhu wrote: >> In the first case your thread pool is the "shared data structure", in the > second case this map of handles is the "shared data structure", so I don't > understand why you think there is any difference. > > I do not understand why there is no dif

RE: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Arijit Tarafdar
dev@livy.incubator.apache.org Subject: Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design > I really don't understand what you mean. You need somewhere to keep > the application handles you're monitoring regardless of the solution. The

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
> I really don't understand what you mean. You need somewhere to keep the application handles you're monitoring regardless of the solution. The code making the YARN request needs to somehow update those handles. Whether there's a task per handle that is submitted to a thread pool, or some map or lis

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 9:33 AM, Nan Zhu wrote: >> What I proposed is having a single request to YARN to get all applications' > statuses, if that's possible. You'd still have multiple application handles > that are independent of each other. They'd all be updated separately from > that one thread
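The proposal quoted here (a single request to YARN, with independent application handles updated from the one thread that talks to YARN) can be sketched as follows. The sketch is illustrative only: `AppHandle`, `BulkPoller`, and the `bulkFetch` supplier (a stand-in for one bulk YARN call returning application id to state) are hypothetical and do not mirror Livy's actual classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical per-application handle; each handle holds only its own state. */
final class AppHandle {
    private volatile String state = "UNKNOWN"; // volatile so readers on other threads see updates
    String state() { return state; }
    void update(String newState) { state = newState; }
}

/** Hypothetical poller: one bulk request per round, then per-handle updates. */
final class BulkPoller {
    private final Map<String, AppHandle> handles = new ConcurrentHashMap<>();
    private final Supplier<Map<String, String>> bulkFetch; // stand-in for a single YARN call

    BulkPoller(Supplier<Map<String, String>> bulkFetch) {
        this.bulkFetch = bulkFetch;
    }

    /** Registers an application and returns its handle (idempotent per id). */
    AppHandle register(String appId) {
        return handles.computeIfAbsent(appId, id -> new AppHandle());
    }

    /** One polling round: a single request, then each known handle is updated separately. */
    void pollOnce() {
        Map<String, String> report = bulkFetch.get();
        handles.forEach((appId, handle) -> {
            String s = report.get(appId);
            if (s != null) {
                handle.update(s); // applications absent from the reply keep their last state
            }
        });
    }
}
```

In a running service, `pollOnce` would typically be scheduled on a single thread, for example with `ScheduledExecutorService.scheduleWithFixedDelay`. The only structure shared between request threads and the poller is the JDK's `ConcurrentHashMap`.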

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
> What I proposed is having a single request to YARN to get all applications' statuses, if that's possible. You'd still have multiple application handles that are independent of each other. They'd all be updated separately from that one thread talking to YARN. This has nothing to do with a "shared

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 9:06 AM, Nan Zhu wrote: >> I'm not really sure what you're talking about here, since I did not > suggest a "shared data structure", and I'm not really sure what that > means in this context. > > What you claimed is just monitoring/updating the state with a single thread > *

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
> I'm not really sure what you're talking about here, since I did not suggest a "shared data structure", and I'm not really sure what that means in this context. What you claimed is just monitoring/updating the state with a single thread *given* all applications have been there. To implement this

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-15 Thread Marcelo Vanzin
On Tue, Aug 15, 2017 at 2:20 PM, Nan Zhu wrote: > The key design consideration here is that how you model the state of > applications, if in actor, then there will be no synchronization involved > and yielding a cleaner design; if in a shared data structure, you will have > to be careful about coo

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-15 Thread Nan Zhu
PDF version On Tue, Aug 15, 2017 at 2:22 PM, Nan Zhu wrote: > I also attached the discarded version of design here > > Best, > > Nan > > On Tue, Aug 15, 2017 at 2:20 PM, Nan Zhu wrote: > >> Hi, Marcelo, >> >> Yes, essentially it is using multiple threads talking with YARN. >> >> The key design

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-15 Thread Nan Zhu
Hi, Marcelo, Yes, essentially it is using multiple threads talking with YARN. The key design consideration here is that how you model the state of applications, if in actor, then there will be no synchronization involved and yielding a cleaner design; if in a shared data structure, you will have

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-15 Thread Marcelo Vanzin
Hmm, I remember this... it was left as a "todo" item when the app monitoring was added. The document you wrote seems to be a long way of saying you'll have a few threads talking to YARN and updating the state of application handles in Livy. Is that right? I would investigate whether there's any A

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-14 Thread Nan Zhu
To: dev@livy.incubator.apache.org > Date: 08/14/2017 02:35 PM > Subject: resolve the scalability problem caused by app monitoring in livy > with an actor-based design > > Hi, all > > In HDInsight, we (Microsoft) use Livy as the Spark j

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-14 Thread Alex Bozarth
To: dev@livy.incubator.apache.org Date: 08/14/2017 02:35 PM Subject: resolve the scalability problem caused by app monitoring in livy with an actor-based design Hi, all In HDInsight, we (Microsoft) use Livy as the Spark job submission service. We keep seeing the customers fall i

resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-14 Thread Nan Zhu
Hi, all In HDInsight, we (Microsoft) use Livy as the Spark job submission service. We keep seeing the customers fall into the problem when they submit many concurrent applications to the system, or recover livy from a state with many concurrent applications By looking at the code and the customer