Re: [DISCUSS] Update Roadmap

TEJA SRIVASTAV Tue, 01 Mar 2016 10:23:19 -0800

+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:


> I see in the Enterprise section that multi-tenancy will be included. Will
> this have user impersonation too? That way, the user executing will be
> the user owning the process.
>
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>
> +1
>
> Hi Tamas,
>    Pluggable external visualization is really a GREAT feature to have.
> I'm looking forward to this :)
>
> Regards
> Shabeel
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
> wrote:
>
>> Hey,
>>
>> Really promising roadmap.
>>
>> I'd only push for more visualization options. I agree that built-in
>> visualization with limited charting options is needed, but I think we also
>> need some way to 'inject' external JS visualizations.
>>
>>
>> For scheduling Zeppelin notebooks we use
>> https://github.com/airbnb/airflow through the job REST API. It's an
>> enterprise-ready and very robust solution right now.
>>
>>
>> *Tamas*
>>
>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>
>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>> us to think about which features we develop ourselves and which ones we
>>> get by integrating external (preferably Apache) technology. We don't think
>>> about building our own storage services, so why build our own scheduler?
>>> Eran
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>
>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>>> Whether external or built-in, I completely agree with having
>>>> enterprise-level job scheduling support on the roadmap.
>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>> the related issues I can find in our JIRA.
>>>>
>>>> @Vinayak
>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>>> notebook storage layer (see the related package
>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>> So GitHub notebook sync can be implemented easily.
>>>>
>>>> @Shabeel
>>>> Right, we need better memory management to prevent such OOMs.
>>>> And I think a table is one of the most frequently used ways of displaying
>>>> data. So we'll definitely need more features like filter, sort, etc.
>>>> After this roadmap discussion, a discussion about the next release will
>>>> follow. Then we'll get an idea of when those features will be available.
>>>>
>>>> @Prasad
>>>> Thanks for mentioning HA and DR. They're really important subjects for
>>>> enterprise use. Zeppelin will definitely need to address them.
>>>> And displaying meta information about notebooks on the top-level page is
>>>> a good idea.
>>>>
>>>> It's really great to hear so many opinions and ideas.
>>>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For one, I know that there is rudimentary scheduling built into
>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>> feature a few months ago).
>>>>> But another point is that Zeppelin should also focus on quality,
>>>>> reproducibility and portability.
>>>>> Although this doesn't offer exciting new features, it would make
>>>>> development much easier.
>>>>>
>>>>> Cross-platform testability, tests that pass when run sequentially,
>>>>> compatibility with Firefox, and many more open issues that make it so much
>>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>
>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use
>>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>>> external scheduling.
>>>>>
>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>> priorities, I fully agree, under the condition that code quality is
>>>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>> assignment on the notebook level. We probably also need Knox-integration
>>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>>> this), and integration of something like Spree (
>>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>
>>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>>> code, to drive this "necessary evil" forward ;)
>>>>>
>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>
>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>
>>>>>> Rather, one should be able to call it from any scheduler typically
>>>>>> used at the enterprise level, maybe with support for BPML.
>>>>>>
>>>>>> I believe the existing ability to call/execute a Zeppelin Notebook or
>>>>>> a specific paragraph within a notebook using REST API should take care of
>>>>>> this requirement to some extent.
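
To make the REST route above concrete, here is a minimal Python sketch of how an external scheduler (cron, Airflow, or anything else) might ask Zeppelin to run a whole notebook or a single paragraph. It assumes the notebook REST API exposes `POST /api/notebook/job/<noteId>` and `POST /api/notebook/job/<noteId>/<paragraphId>`; please check the REST API docs for your Zeppelin version. The helper names, host, port and ids are made up for illustration.

```python
import urllib.request


def build_run_request(base_url, note_id, paragraph_id=None):
    """Build (but do not send) a POST request asking Zeppelin to run a note.

    Assumed endpoint layout: /api/notebook/job/<noteId>[/<paragraphId>].
    """
    path = f"/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        path += f"/{paragraph_id}"
    # An empty body is enough for a plain "run" call in this sketch.
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=b"", method="POST"
    )


def trigger(base_url, note_id, paragraph_id=None, timeout=30):
    """Send the request and return the HTTP status code."""
    req = build_run_request(base_url, note_id, paragraph_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A scheduler would call something like `trigger("http://localhost:8080", "<noteId>")`. `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses, which the caller can treat as a failed run.
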
>>>>>>
>>>>>> Regards,
>>>>>> Sourav
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>
>>>>>>> @Eran Witkon,
>>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>> This would be promising for now.
>>>>>>> However, in the future Hadoop might not necessarily be installed in a
>>>>>>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>>>>>>> might not be available.
>>>>>>> So perhaps we should give some thought to this feature for the
>>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>>> scheduling?
>>>>>>>
>>>>>>> As Benjamin has reiterated, Databricks notebooks have this as a core
>>>>>>> notebook feature.
>>>>>>>
>>>>>>>
>>>>>>> Also, does anybody have suggestions regarding a "sync with
>>>>>>> GitHub" feature?
>>>>>>> - Exporting a notebook to GitHub
>>>>>>> - Importing a notebook from GitHub
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vinayak
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>>> reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Moon,
>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>> security in the list.
>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>
>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one
>>>>>>>>> piece of the workflow. Can we look towards the functionality of
>>>>>>>>> scheduling notebooks based on other notebooks finishing their job
>>>>>>>>> successfully?
>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>> downstream users wait for the ETL notebook to finish successfully.
>>>>>>>>> Only after that can other business-oriented notebooks be executed.
>>>>>>>>>
>>>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>>>> plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>> This would allow users to share notebooks seamlessly.
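
The dependency-based scheduling asked for in item 1 above can be sketched in a few lines of Python. This is only an illustration of the idea, not something Zeppelin provides: `run_workflow`, the note ids and the `run_note` hook are hypothetical, and `run_note` stands in for whatever actually executes a notebook and reports success (for example a REST call followed by status polling).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(depends_on, run_note):
    """Run notes in dependency order; skip a note if any upstream note did not succeed.

    depends_on: dict mapping note id -> iterable of upstream note ids.
    run_note:   callable(note_id) -> bool, True on success (hypothetical hook).
    Returns a dict mapping note id -> "ok" | "failed" | "skipped".
    """
    status = {}
    # static_order() yields every note after all of its upstream notes.
    for note in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(note, ())
        if any(status[u] != "ok" for u in upstream):
            status[note] = "skipped"
        else:
            status[note] = "ok" if run_note(note) else "failed"
    return status
```

With `{"report": ["etl"], "dashboard": ["etl"], "etl": []}`, the `etl` note always runs first, and a failed `etl` run marks both downstream notes as skipped instead of executing them against stale data.
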
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Vinayak
>>>>>>>>>
>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Zhong Wang,
>>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>>> opinion.
>>>>>>>>>> Hope I can finish the work on pr-190
>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>
>>>>>>>>>> Sourav,
>>>>>>>>>> Regarding concurrent running, Zeppelin itself doesn't limit running
>>>>>>>>>> paragraphs/queries concurrently. Each interpreter can implement its own
>>>>>>>>>> scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>>
>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler, considering
>>>>>>>>>> the nature of the Scala compiler. That's why users cannot run multiple
>>>>>>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>>>>>>>>> Scala compiler, so paragraphs run concurrently as long as they're
>>>>>>>>>> in different notebooks.
>>>>>>>>>> Thanks for the feedback!
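
The difference between a FIFO policy and a parallel policy described above can be illustrated with a small, language-neutral sketch in Python. This is not Zeppelin's Java scheduler code; `run_paragraphs` is a hypothetical helper, and a thread pool with one worker merely stands in for a FIFO scheduler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_paragraphs(paragraphs, max_workers):
    """Run callables on a pool and return the peak number running at once.

    max_workers=1 behaves like a FIFO scheduler (one paragraph at a time, in
    submission order); max_workers>1 lets paragraphs overlap.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrapped(job):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            job()
        finally:
            with lock:
                running -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every job and re-raises any exception.
        list(pool.map(wrapped, paragraphs))
    return peak
```

With `max_workers=1` the peak is always 1, which mirrors why paragraphs queue up behind each other under a FIFO policy, however many users submit them.
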
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>> wangzhong....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>
>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>
>>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>
>>>>>>>>>>>> Right now, if more than one user tries to run paragraphs in
>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance
>>>>>>>>>>>> (and a single interpreter instance), performance is very slow. It is
>>>>>>>>>>>> obvious that a queue builds up within the Zeppelin process and the
>>>>>>>>>>>> interpreter process in that scenario, as the time taken to move the
>>>>>>>>>>>> status from start to pending and from pending to running is very high
>>>>>>>>>>>> compared to the actual running time of a paragraph.
>>>>>>>>>>>>
>>>>>>>>>>>> Without this, multi-tenancy support would be meaningless, as
>>>>>>>>>>>> no one can practically use it in a situation where multiple users
>>>>>>>>>>>> are trying to connect to the same instance of Zeppelin (and the related
>>>>>>>>>>>> interpreter). A possible solution would be to spawn a separate
>>>>>>>>>>>> instance of the same interpreter for every notebook/user.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Sourav
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>> is almost 9 months old, and it no longer reflects where the
>>>>>>>>>>>>> community is going. It's time to update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback
>>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major
>>>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise ready,
>>>>>>>>>>>>> Usability improvement, Pluggability, Documentation, Backend
>>>>>>>>>>>>> integration, Notebook storage, and Visualization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>            <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>          - Download data as csv, etc
>>>>>>>>>>>>>            PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>>>>>>>>            PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>>>>>>>>            PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>>>>>            PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>       - More visualizations
>>>>>>>>>>>>>         PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>>>         PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>>>         PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>>>         PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>
>>>>>>>>>>>>> It will help anyone quickly grasp the overall interests and
>>>>>>>>>>>>> direction of the project. And based on this roadmap, we can discuss
>>>>>>>>>>>>> and re-define the scope of the next release, 0.6.0, and its schedule.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> moon
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Vinayak Agrawal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Vinayak Agrawal
>>>>>>> Big Data Analytics
>>>>>>> IBM
>>>>>>>
>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>> ~Lord Alfred Tennyson
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>
>
