Re: [DISCUSS] Update Roadmap

Shabeel Syed Wed, 02 Mar 2016 00:32:19 -0800

Also, we need better REST API support for creating and fetching notebooks and
paragraphs. For example, if I could set custom notebook IDs and paragraph IDs,
we could avoid multiple REST API calls.


http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
should return an error if the notebook or paragraph does not exist.

And while creating a notebook or paragraph, I should be able to specify my own
custom IDs.
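Until custom IDs and a proper error are supported, a client can check for the paragraph itself before embedding it. A minimal sketch, assuming that GET /api/notebook/<notebookid> returns JSON shaped like {"status": "OK", "body": {"id": ..., "paragraphs": [{"id": ...}]}}. That response shape and the helper names iframe_url and ParagraphNotFound are assumptions to verify against your Zeppelin version.

```python
import json


class ParagraphNotFound(LookupError):
    """Raised when the notebook or paragraph does not exist (hypothetical helper)."""


def iframe_url(base_url: str, note_response: str, paragraph_id: str) -> str:
    """Return the ?asIframe URL for a paragraph, or raise ParagraphNotFound.

    note_response is assumed to be the JSON text returned by
    GET <base_url>/api/notebook/<notebookid>; the response shape is an assumption.
    """
    payload = json.loads(note_response)
    note = payload.get("body") or {}
    note_id = note.get("id") if isinstance(note, dict) else None
    if payload.get("status") != "OK" or not note_id:
        raise ParagraphNotFound("notebook does not exist")
    # Collect the ids of every paragraph in the notebook.
    paragraph_ids = {p.get("id") for p in note.get("paragraphs", [])}
    if paragraph_id not in paragraph_ids:
        raise ParagraphNotFound(f"paragraph {paragraph_id} not found in {note_id}")
    base = base_url.rstrip("/")
    return f"{base}/#/notebook/{note_id}/paragraph/{paragraph_id}?asIframe"
```

The check costs one extra GET, which is exactly the round trip that server-side validation of the ?asIframe URL would remove.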

Regards
Shabeel

On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wangzhong....@gmail.com> wrote:

> +1 on @rick. Quality is really important... I am still consistently
> encountering bugs.
>
> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com>
> wrote:
>
>> +1 on @rick
>>
>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> I see in the Enterprise section that multi-tenancy will be included.
>>> Will this include user impersonation too? That way, the user executing
>>> code would be the user owning the process.
>>>
>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> Hi Tamas,
>>>    Pluggable external visualization is really a GREAT feature to have.
>>> I'm looking forward to this :)
>>>
>>> Regards
>>> Shabeel
>>>
>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> Really promising roadmap.
>>>>
>>>> I'd only push for more visualization options. I agree that built-in
>>>> visualization with limited charting options is needed, but I think we
>>>> also need some way to 'inject' external JS visualizations.
>>>>
>>>>
>>>> For scheduling Zeppelin notebooks we use
>>>> https://github.com/airbnb/airflow through the job REST API. It's an
>>>> enterprise-ready and very robust solution right now.
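An external scheduler such as Airflow usually needs to know when a triggered note has finished and whether it succeeded. A minimal polling sketch, assuming that GET /api/notebook/job/<noteId> returns a list of paragraphs, each with a "status" field taking values such as READY, PENDING, RUNNING, FINISHED, ERROR and ABORT. fetch_status is a hypothetical callable wrapping that HTTP call, injected so the logic can be tested without a server.

```python
import time
from typing import Callable, Dict, List

# Statuses treated as "done"; assumed to match Zeppelin's paragraph job statuses.
TERMINAL = {"FINISHED", "ERROR", "ABORT"}


def wait_for_note(fetch_status: Callable[[], List[Dict[str, str]]],
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every paragraph is in a terminal status.

    Returns True only if all paragraphs FINISHED, False if any ended in
    ERROR or ABORT, and raises TimeoutError if the note never settles.
    """
    waited = 0.0
    while True:
        statuses = [p.get("status") for p in fetch_status()]
        if statuses and all(s in TERMINAL for s in statuses):
            return all(s == "FINISHED" for s in statuses)
        if waited >= timeout_seconds:
            raise TimeoutError("note did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

Inside an Airflow task, a False result or a TimeoutError would be turned into a task failure so that downstream tasks do not run.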
>>>>
>>>>
>>>> *Tamas*
>>>>
>>>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>
>>>>> One point to clarify: I don't want to suggest Oozie specifically. I
>>>>> want us to think about which features we develop ourselves and which
>>>>> ones we integrate from external (preferably Apache) technology. We don't
>>>>> think about building our own storage services, so why build our own
>>>>> scheduler?
>>>>> Eran
>>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>>>
>>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>>>>> Whether external or built-in, I completely agree with having
>>>>>> enterprise-level job scheduling support on the roadmap.
>>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>>>> the related issues I can find in our JIRA.
>>>>>>
>>>>>> @Vinayak
>>>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>>>>> notebook storage layer (see the related package
>>>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>),
>>>>>> so GitHub notebook sync can be implemented easily.
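To illustrate why a pluggable storage layer makes GitHub sync straightforward, here is a minimal sketch of one-way sync between two storage backends. NoteStore, InMemoryStore and sync are hypothetical names for this example; they are not Zeppelin's Java NotebookRepo API. A real GitHub backend would implement the same three methods by reading, committing and pushing files.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class NoteStore(ABC):
    """Hypothetical storage interface: one implementation per backend."""

    @abstractmethod
    def list_ids(self) -> List[str]: ...

    @abstractmethod
    def get(self, note_id: str) -> str: ...

    @abstractmethod
    def save(self, note_id: str, note_json: str) -> None: ...


class InMemoryStore(NoteStore):
    """Stand-in backend; a Git/GitHub backend would implement the same methods."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def list_ids(self) -> List[str]:
        return sorted(self._notes)

    def get(self, note_id: str) -> str:
        return self._notes[note_id]

    def save(self, note_id: str, note_json: str) -> None:
        self._notes[note_id] = note_json


def sync(source: NoteStore, target: NoteStore) -> List[str]:
    """Copy notes that are missing or different in target; return the copied ids."""
    existing = set(target.list_ids())
    copied = []
    for note_id in source.list_ids():
        content = source.get(note_id)
        if note_id not in existing or target.get(note_id) != content:
            target.save(note_id, content)
            copied.append(note_id)
    return copied
```

Exporting would be sync(local, github) and importing the same call with the arguments swapped; conflict handling for notes edited on both sides is left out.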
>>>>>>
>>>>>> @Shabeel
>>>>>> Right, we need better memory management to prevent such OOM errors.
>>>>>> And I think tables are one of the most frequently used ways of
>>>>>> displaying data, so we'll definitely need more features like filter,
>>>>>> sort, etc.
>>>>>> After this roadmap discussion, a discussion of the next release will
>>>>>> follow. Then we'll get an idea of when those features will be available.
>>>>>>
>>>>>> @Prasad
>>>>>> Thanks for mentioning HA and DR. They're really important subjects for
>>>>>> enterprise use, and Zeppelin will definitely need to address them.
>>>>>> Displaying notebook meta information on the top-level page is also a
>>>>>> good idea.
>>>>>>
>>>>>> It's really great to hear so many opinions and ideas.
>>>>>> And thanks, @Rick, for sharing your valuable view of the Zeppelin project.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> For one, I know that there is rudimentary scheduling built into
>>>>>>> Zeppelin already (at least, I fixed a bug in the test for a scheduling
>>>>>>> feature a few months ago).
>>>>>>> But another point is that Zeppelin should also focus on quality,
>>>>>>> reproducibility and portability.
>>>>>>> Although this doesn't offer exciting new features, it would make
>>>>>>> development much easier.
>>>>>>>
>>>>>>> Cross-platform testability, tests that pass when run sequentially,
>>>>>>> compatibility with Firefox, and many more open issues that make it so
>>>>>>> much harder to enhance Zeppelin and add features should be addressed
>>>>>>> soon, preferably before more features are added. Zeppelin is already
>>>>>>> suffering, in my opinion, from quite a lot of feature creep, and we
>>>>>>> should avoid putting in the kitchen sink at the cost of quality and
>>>>>>> maintainability. Instead, modularity (ZEPPELIN-533 in particular) should
>>>>>>> be targeted.
>>>>>>>
>>>>>>> Oozie, in my opinion, is a dead end: it may de facto still be in use
>>>>>>> on many clusters, but it's not getting the love it needs, and I
>>>>>>> wouldn't bet on it when it comes to integrating scheduling. Instead, if
>>>>>>> you want external scheduling, any external tool should be able to use
>>>>>>> the REST API to trigger executions.
>>>>>>>
>>>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>>>> priorities, I fully agree, on the condition that code quality is
>>>>>>> included as a subset of enterprise-readiness. Auth* is paramount
>>>>>>> (Kerberos SPNEGO SSO support is what we really want), with user and
>>>>>>> group rights assignment at the notebook level. We probably also need
>>>>>>> Knox integration (ODP members looking at integrating Zeppelin should
>>>>>>> consider contributing this), and integration of something like Spree
>>>>>>> (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>>>
>>>>>>> I'm hopeful that soon I can resume contributing some
>>>>>>> quality-oriented code, to drive this "necessary evil" forward ;)
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>>
>>>>>>>> Rather, one should be able to call it from any scheduler typically
>>>>>>>> used at the enterprise level. Maybe support for BPML.
>>>>>>>>
>>>>>>>> I believe the existing ability to call/execute a Zeppelin notebook,
>>>>>>>> or a specific paragraph within a notebook, using the REST API should
>>>>>>>> take care of this requirement to some extent.
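For reference, a minimal sketch of building such calls with Python's standard library. It assumes the notebook job REST API paths POST /api/notebook/job/<noteId> (run every paragraph) and POST /api/notebook/job/<noteId>/<paragraphId> (run one paragraph); check them against the REST API documentation of the Zeppelin version you run. run_request is a hypothetical helper name.

```python
import urllib.request
from typing import Optional


def run_request(base_url: str, note_id: str,
                paragraph_id: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a POST asking Zeppelin to run a note or one paragraph."""
    path = f"/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        # Narrow the job to a single paragraph within the note.
        path += f"/{paragraph_id}"
    # data=b"" sends an empty body; method="POST" makes the verb explicit.
    return urllib.request.Request(base_url.rstrip("/") + path, data=b"", method="POST")
```

A scheduler would send it with urllib.request.urlopen(run_request(...)) and inspect the JSON response; keeping request construction separate makes it easy to test without a running server.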
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sourav
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @Eran Witkon,
>>>>>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>>>> This would be promising for now.
>>>>>>>>> However, in the future Hadoop might not necessarily be installed
>>>>>>>>> in a Spark cluster, and Oozie (since it installs with the Hadoop
>>>>>>>>> distribution) might not be available.
>>>>>>>>> So perhaps we should give some thought to this feature for the
>>>>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>>>>> scheduling?
>>>>>>>>>
>>>>>>>>> As Benjamin has pointed out, the Databricks notebook has this as a
>>>>>>>>> core feature.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, does anybody have suggestions regarding a "sync with GitHub"
>>>>>>>>> feature?
>>>>>>>>> - Exporting notebooks to GitHub
>>>>>>>>> - Importing notebooks from GitHub
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Vinayak
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>>>>> https://oozie.apache.org/. This requires better hooks and
>>>>>>>>>> status reporting, but doesn't make Zeppelin an ETL/scheduler tool by
>>>>>>>>>> itself.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Moon,
>>>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>>>> security in the list.
>>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>>
>>>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>>>> preset time. But in an enterprise solution, a notebook might be one
>>>>>>>>>>> piece of a workflow. Can we look towards scheduling notebooks based
>>>>>>>>>>> on other notebooks finishing their jobs successfully?
>>>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>>>> downstream users wait for the ETL notebook to finish successfully.
>>>>>>>>>>> Only after that can other business-oriented notebooks be executed.
>>>>>>>>>>>
>>>>>>>>>>> 2. Importing a notebook - Is there a current requirement or
>>>>>>>>>>> future plan to implement a feature that allows importing notebooks
>>>>>>>>>>> from GitHub?
>>>>>>>>>>> This would allow users to share notebooks seamlessly.
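The dependency-based scheduling asked for in item 1 amounts to running notebooks in topological order and skipping anything downstream of a failure. A minimal sequential sketch using Python's graphlib (3.9+). The notebook names and the run_note callable are hypothetical; run_note could, for example, trigger a note over the REST API and wait for its result.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_pipeline(deps: Dict[str, Set[str]],
                 run_note: Callable[[str], bool]) -> Dict[str, str]:
    """Run each notebook only after all of its upstream notebooks finished.

    deps maps a notebook id to the ids it depends on. run_note runs one
    notebook and returns True on success. Returns FINISHED, ERROR or
    SKIPPED for every notebook.
    """
    result: Dict[str, str] = {}
    for note in TopologicalSorter(deps).static_order():
        upstream = deps.get(note, set())
        if any(result.get(u) != "FINISHED" for u in upstream):
            # An upstream notebook failed or was skipped, so do not run this one.
            result[note] = "SKIPPED"
            continue
        result[note] = "FINISHED" if run_note(note) else "ERROR"
    return result
```

Running sequentially keeps the sketch short; TopologicalSorter's prepare(), get_ready() and done() methods could instead be used to start independent notebooks in parallel.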
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Vinayak
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>>>>> opinion.
>>>>>>>>>>>>
>>>>>>>>>>>> I hope I can finish the work in PR-190
>>>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>>>
>>>>>>>>>>>> Sourav,
>>>>>>>>>>>> Regarding concurrent runs, Zeppelin doesn't prevent running
>>>>>>>>>>>> paragraphs/queries concurrently. Each interpreter can implement its
>>>>>>>>>>>> own scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>>>>
>>>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler because of
>>>>>>>>>>>> the nature of the Scala compiler. That's why users cannot run
>>>>>>>>>>>> multiple paragraphs concurrently when they work with
>>>>>>>>>>>> SparkInterpreter.
>>>>>>>>>>>> But as Zhong Wang mentioned, PR-703 gives each notebook a separate
>>>>>>>>>>>> Scala compiler, so paragraphs can run concurrently as long as
>>>>>>>>>>>> they're in different notebooks.
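The scheduling behaviour described above can be modelled in a few lines: one FIFO queue per notebook, so paragraphs in the same notebook run strictly in order while different notebooks proceed concurrently. This is an illustrative Python model under that assumption, not Zeppelin's Java scheduler code; PerNoteFifoScheduler is a hypothetical name.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class PerNoteFifoScheduler:
    """Illustrative model: one single-worker (FIFO) queue per notebook.

    Paragraphs submitted for the same notebook run one at a time in submit
    order, like a FIFO scheduler guarding a single compiler; paragraphs from
    different notebooks use separate queues and can run concurrently.
    """

    def __init__(self) -> None:
        self._queues: Dict[str, ThreadPoolExecutor] = {}
        self._lock = threading.Lock()

    def submit(self, note_id: str, paragraph: Callable[[], str]) -> Future:
        with self._lock:
            queue = self._queues.get(note_id)
            if queue is None:
                # max_workers=1 means tasks run one after another in submit order.
                queue = ThreadPoolExecutor(max_workers=1)
                self._queues[note_id] = queue
        return queue.submit(paragraph)

    def shutdown(self) -> None:
        with self._lock:
            for queue in self._queues.values():
                queue.shutdown(wait=True)
```

With a single shared queue instead of one per notebook, every paragraph from every user would wait behind all earlier ones.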
>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> moon
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>>>> wangzhong....@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sourav: I think this newly merged PR can help you:
>>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My only suggestion would be to include a PR/feature: support
>>>>>>>>>>>>>> for running concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right now, if more than one user tries to run paragraphs in
>>>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance
>>>>>>>>>>>>>> (and a single interpreter instance), performance is very slow.
>>>>>>>>>>>>>> Evidently a queue builds up within the Zeppelin process and the
>>>>>>>>>>>>>> interpreter process in that scenario, as the time taken to move
>>>>>>>>>>>>>> the status from start to pending and from pending to running is
>>>>>>>>>>>>>> very high compared to the actual running time of a paragraph.
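A simplified model shows why the pending phase can dominate: if all requests arrive at once and a single FIFO queue serves them, each one waits for the combined run time of everything ahead of it. The model (one worker, simultaneous arrivals) is an assumption for illustration, not a measurement of Zeppelin.

```python
from typing import List


def fifo_waits(run_times: List[float]) -> List[float]:
    """Time each job waits in a single FIFO queue before it starts running.

    Assumes every job arrives at t=0 and one worker serves them in order,
    a simplified model of many users sharing one interpreter queue.
    """
    waits, elapsed = [], 0.0
    for t in run_times:
        waits.append(elapsed)  # this job starts once all earlier jobs are done
        elapsed += t
    return waits
```

For example, fifo_waits([2.0, 2.0, 2.0, 2.0]) gives [0.0, 2.0, 4.0, 6.0]: the fourth user waits three times as long as their own paragraph runs.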
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without this, multi-tenancy support would be meaningless, as no
>>>>>>>>>>>>>> one can practically use it in a situation where multiple users
>>>>>>>>>>>>>> are trying to connect to the same instance of Zeppelin (and the
>>>>>>>>>>>>>> related interpreter). A possible solution would be to spawn a
>>>>>>>>>>>>>> separate instance of the same interpreter per notebook/user.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <
>>>>>>>>>>>>>> m...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>>> is almost 9 months old, and it no longer reflects where the
>>>>>>>>>>>>>>> community is going. It's time to update it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback
>>>>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major
>>>>>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise
>>>>>>>>>>>>>>> ready, Usability improvement, Pluggability, Documentation, Backend
>>>>>>>>>>>>>>> integration, Notebook storage, and Visualization.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>>>          - Download data as csv, etc PR-725
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>>>>>>>>>>          PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>>>>>>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>>>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>>>       - More tutorials, examples
>>>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>>>       - More interpreters
>>>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>>>       - More notebook storages
>>>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>>>>>       PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>>>>>       PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>>>>>       PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
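As an illustration of the "download data as csv" item, converting table output to CSV is small once the format is known. This sketch assumes Zeppelin's %table display convention (a %table prefix, rows separated by newlines, columns by tabs, first row as the header). table_to_csv is a hypothetical helper, and it does not handle tabs or newlines inside cell values.

```python
import csv
import io


def table_to_csv(output: str) -> str:
    """Convert %table display text (tab-separated columns) into CSV text."""
    if not output.startswith("%table"):
        raise ValueError("not %table output")
    # Drop the prefix and any spaces/newlines before the header row.
    body = output[len("%table"):].lstrip(" \n")
    rows = [line.split("\t") for line in body.splitlines() if line]
    buf = io.StringIO()
    # The csv module quotes cells containing commas or quotes as needed.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Going through the csv module rather than joining with commas keeps values such as "bob, jr" intact as a single quoted cell.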
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It will help anyone quickly get an overall picture of the project's
>>>>>>>>>>>>>>> interests and direction. And based on this roadmap, we can discuss
>>>>>>>>>>>>>>> and redefine the scope and schedule of the next release, 0.6.0.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Vinayak Agrawal
>>>>>>>>> Big Data Analytics
>>>>>>>>> IBM
>>>>>>>>>
>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>>>
>>>
>
