+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
> I see in the Enterprise section that multi-tenancy will be included. Will
> this have user impersonation too? That way, the user executing will be
> the user owning the process.
>
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>
> +1
>
> Hi Tamas,
> Pluggable external visualization is really a GREAT feature to have.
> I'm looking forward to this :)
>
> Regards
> Shabeel
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
> wrote:
>
>> Hey,
>>
>> Really promising roadmap.
>>
>> I'd only push for more visualization options. I agree built-in visualization
>> with limited charting options is needed, but I think we also need some way
>> to 'inject' external JS visualizations.
>>
>> For scheduling Zeppelin notebooks we use
>> https://github.com/airbnb/airflow through
>> the job REST API. It's an enterprise-ready and very robust solution
>> right now.
>>
>> *Tamas*
>>
>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>
>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>> to think about which features we develop and which ones we integrate
>>> from external, preferably Apache, technology. We don't think about
>>> building our own storage services, so why build our own scheduler?
>>> Eran
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>
>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>> Now I can see a lot of demand around enterprise-level job scheduling.
>>>> Whether external or built-in, I completely agree with having
>>>> enterprise-level job scheduling support on the roadmap.
>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>> the related issues I can find in our JIRA.
>>>>
>>>> @Vinayak
>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>>> notebook storage layer (see related package
>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>> So, GitHub notebook sync can be implemented easily.
>>>>
>>>> @Shabeel
>>>> Right, we need better memory management to prevent such OOM.
>>>> And I think a table is one of the most frequently used ways of displaying
>>>> data. So definitely, we'll need more features like filter, sort, etc.
>>>> After this roadmap discussion, discussion of the next release will
>>>> follow. Then we'll get an idea of when those features will be available.
>>>>
>>>> @Prasad
>>>> Thanks for mentioning HA and DR. They're really important subjects for
>>>> enterprise use. Zeppelin will definitely need to address them.
>>>> And displaying meta information of notebooks on the top-level page is a
>>>> good idea.
>>>>
>>>> It's really great to hear many opinions and ideas.
>>>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For one, I know that there is rudimentary scheduling built into
>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>> feature a few months ago).
>>>>> But another point is that Zeppelin should also focus on quality,
>>>>> reproducibility and portability.
>>>>> Although this doesn't offer exciting new features, it would make
>>>>> development much easier.
>>>>>
>>>>> Cross-platform testability, tests that pass when run sequentially,
>>>>> compatibility with Firefox, and many more open issues that make it so much
>>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>>> preferably before more features are added.
>>>>> Already Zeppelin is suffering - in my opinion - from quite a lot of
>>>>> feature creep, and we should avoid putting in the kitchen sink at the
>>>>> cost of quality and maintainability.
>>>>> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>
>>>>> Oozie, in my opinion, is a dead end - it may de facto still be in use
>>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>>> bet on it when it comes to integrating scheduling. Instead, any external
>>>>> tool should be able to use the REST API to trigger executions, if you want
>>>>> external scheduling.
>>>>>
>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>> priorities, I fully agree, under the condition that code quality is
>>>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>> assignment on the notebook level. We probably also need Knox integration
>>>>> (ODP members looking at integrating Zeppelin should consider contributing
>>>>> this), and integration of something like Spree
>>>>> (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>
>>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>>> code, to drive this "necessary evil" forward ;)
>>>>>
>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>
>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>
>>>>>> Rather, one should be able to call it from any scheduler typically
>>>>>> used at the enterprise level. Maybe support for BPML.
>>>>>>
>>>>>> I believe the existing ability to call/execute a Zeppelin notebook or
>>>>>> a specific paragraph within a notebook using the REST API should take
>>>>>> care of this requirement to some extent.
>>>>>>
>>>>>> Regards,
>>>>>> Sourav
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>
>>>>>>> @Eran Witkon,
>>>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>> This would be promising for now.
>>>>>>> However, in the future Hadoop might not necessarily be installed in a
>>>>>>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>>>>>>> might not be available.
>>>>>>> So perhaps we should give some thought to this feature for the
>>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>>> scheduling?
>>>>>>>
>>>>>>> As Benjamin has iterated, Databricks notebook has this as a core
>>>>>>> notebook feature.
>>>>>>>
>>>>>>>
>>>>>>> Also, would anybody give any suggestions regarding the "sync with
>>>>>>> GitHub" feature?
>>>>>>> - Exporting notebook to GitHub
>>>>>>> - Importing notebook from GitHub
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vinayak
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>>> reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Moon,
>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>> security in the list.
>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>
>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one
>>>>>>>>> piece of the workflow. Can we look towards the functionality of
>>>>>>>>> scheduling notebooks based on other notebooks finishing their job
>>>>>>>>> successfully?
>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>> downstream users wait for the ETL notebook to finish successfully.
>>>>>>>>> Only after that can other business-oriented notebooks be executed.
>>>>>>>>>
>>>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>>>> plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Vinayak
>>>>>>>>>
>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Zhong Wang,
>>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>>> opinion.
>>>>>>>>>> Hope I can finish the work on pr-190
>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>
>>>>>>>>>> Sourav,
>>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have a limitation on
>>>>>>>>>> running paragraphs/queries concurrently. An interpreter can implement
>>>>>>>>>> its own scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>>
>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>>>>>>>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>>>>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook a
>>>>>>>>>> separate Scala compiler, so paragraphs run concurrently as long as
>>>>>>>>>> they're in different notebooks.
>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>> wangzhong....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>
>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>
>>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>
>>>>>>>>>>>> Right now, if more than one user tries to run paragraphs in
>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance
>>>>>>>>>>>> (and a single interpreter instance), the performance is very slow.
>>>>>>>>>>>> It is obvious that a queue builds up within the Zeppelin process
>>>>>>>>>>>> and the interpreter process in that scenario, as the time taken to
>>>>>>>>>>>> move the status from start to pending and from pending to running
>>>>>>>>>>>> is very high compared to the actual running time of a paragraph.
>>>>>>>>>>>>
>>>>>>>>>>>> Without this, the multi-tenancy support would be meaningless, as
>>>>>>>>>>>> no one can practically use it in a situation where multiple users
>>>>>>>>>>>> are trying to connect to the same instance of Zeppelin (and the
>>>>>>>>>>>> related interpreter). A possible solution would be to spawn a
>>>>>>>>>>>> separate instance of the same interpreter at every notebook/user
>>>>>>>>>>>> level.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Sourav
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>> is almost 9 months old, and it no longer reflects where the
>>>>>>>>>>>>> community is going. It's time to update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback
>>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major
>>>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise
>>>>>>>>>>>>> ready, Usability improvement, Pluggability, Documentation, Backend
>>>>>>>>>>>>> integration, Notebook storage, and Visualization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Enterprise ready
>>>>>>>>>>>>>    - Authentication
>>>>>>>>>>>>>       - Shiro authentication ZEPPELIN-548 <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>    - Authorization
>>>>>>>>>>>>>       - Notebook authorization PR-681 <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>    - Security
>>>>>>>>>>>>>    - Multi-tenancy
>>>>>>>>>>>>>    - Stability
>>>>>>>>>>>>> - Usability Improvement
>>>>>>>>>>>>>    - UX improvement
>>>>>>>>>>>>>    - Better Table data support
>>>>>>>>>>>>>       - Download data as csv, etc
>>>>>>>>>>>>>         PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>>>>>>>>         PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>>>>>>>>         PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>>>>>         PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>       - Featureful table data display (pagination, etc)
>>>>>>>>>>>>> - Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>    - Pluggable visualization
>>>>>>>>>>>>>    - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>    - Repository and registry for pluggable components
>>>>>>>>>>>>> - Improve documentation
>>>>>>>>>>>>>    - Improve contents and readability
>>>>>>>>>>>>>    - more tutorials, examples
>>>>>>>>>>>>> - Interpreter
>>>>>>>>>>>>>    - Generic JDBC Interpreter
>>>>>>>>>>>>>    - (spark)R Interpreter
>>>>>>>>>>>>>    - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>      <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>    - more interpreters
>>>>>>>>>>>>> - Notebook storage
>>>>>>>>>>>>>    - Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>    - more notebook storages
>>>>>>>>>>>>> - Visualization
>>>>>>>>>>>>>    - More visualizations
>>>>>>>>>>>>>      PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>>>      PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>>>      PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>>>      PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>    - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>
>>>>>>>>>>>>> It will help anyone quickly grasp the overall interests of the
>>>>>>>>>>>>> project and its direction. And based on this roadmap, we can
>>>>>>>>>>>>> discuss and redefine the scope of the next release, 0.6.0, and
>>>>>>>>>>>>> its schedule.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> moon
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Vinayak Agrawal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Vinayak Agrawal
>>>>>>> Big Data Analytics
>>>>>>> IBM
>>>>>>>
>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>> ~Lord Alfred Tennyson
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>
>
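
The dependency-based scheduling raised in this thread (run the business notebooks only after the ETL notebook has finished successfully) can be sketched outside Zeppelin, in whatever external scheduler calls its REST API. The following is only a rough illustration under stated assumptions, not how Zeppelin or Airflow implements it: the function name `run_in_dependency_order`, the `depends_on` mapping and the `run_note` callback are all hypothetical, and `run_note` stands in for whatever call actually triggers a notebook and reports whether it succeeded. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_in_dependency_order(
    depends_on: Dict[str, Iterable[str]],
    run_note: Callable[[str], bool],
) -> List[str]:
    """Run each notebook only after all of its upstream notebooks succeeded.

    depends_on maps a notebook id to the ids it must wait for.
    run_note is a hypothetical callback (an assumption, not a Zeppelin API)
    that triggers one notebook and returns True on success.
    Returns the ids that finished successfully, in execution order.
    """
    completed: List[str] = []
    failed = set()
    # static_order() yields every notebook after all of its predecessors
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for note_id in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(note_id, ())
        if any(dep in failed for dep in upstream):
            # An upstream notebook failed or was skipped, so skip this one too.
            failed.add(note_id)
            continue
        if run_note(note_id):
            completed.append(note_id)
        else:
            failed.add(note_id)
    return completed


if __name__ == "__main__":
    # Hypothetical workflow: two reports wait for one ETL notebook.
    workflow = {"etl": [], "sales_report": ["etl"], "kpi_dashboard": ["etl"]}
    print(run_in_dependency_order(workflow, lambda note_id: True))
```

Keeping the trigger behind a callback leaves the choice of scheduler and transport open, which matches the preference voiced above for letting any external tool drive executions through the REST API.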