Hey guys,

Great idea.
FYI, I have created a feature request for GitLab to render Zeppelin notebooks
once this issue is finalized and the format changes to .zpln.
https://gitlab.com/gitlab-org/gitlab-ce/issues/50244

Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on
Tue, 14 Aug 2018 at 10:29:

> I agree you’re inviting consistency issues if you maintain a separate
> note id-to-note name mapping file.
>
>
>
> But I’m still not comfortable with note ids in the name of the notebook
> itself.  Those names would look ugly if you shared your notebooks on GitHub,
> for example.  You don’t see Jupyter notebooks with names like that.  If you
> have to keep the note ids with the notebooks, could you not simply put the
> note id at the top of the notebook, as Ruslan suggested? Then you’d only
> have to read the first line of each notebook.
>
>
>
> Presumably if you copied the notebooks to another Zeppelin server they
> would be restored with the same note ids there too? And hopefully there
> would be no id clash with notebooks already on that server…
>
>
>
> *From:* Jeff Zhang <zjf...@gmail.com>
> *Sent:* 14 August 2018 03:49
> *To:* users@zeppelin.apache.org
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
>
>
> Thanks for the discussion.
>
> >>> I'm worried about non-Latin symbols in folder and note names. And what
> about hieroglyphs (e.g. Chinese characters)?
>
> AFAIK, Linux allows any character in a file name except `\0` and `/`.
> I can create file names with Chinese characters on Linux, so I guess you
> can use Russian as well.
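As a quick check of the claim above, this Python sketch creates files with Chinese and Russian names in a scratch directory and lists them back. It assumes a Linux host whose filesystem encoding is UTF-8 (Python's default on modern Linux); the note ids in the file names are made up for illustration.

```python
import os
import tempfile

# Hypothetical note files in the proposed "{note_name}_{note_id}.zpln" form.
NAMES = ["我的笔记_2A94M5J1Z.zpln", "Отчёт_2BQA35CJZ.zpln"]


def roundtrip_names(names):
    """Create empty files with the given names in a scratch directory and
    return the names the OS lists back, sorted."""
    with tempfile.TemporaryDirectory() as root:
        for name in names:
            # On Linux only '/' and NUL are forbidden inside a file name.
            with open(os.path.join(root, name), "w", encoding="utf-8"):
                pass
        return sorted(os.listdir(root))


if __name__ == "__main__":
    print(roundtrip_names(NAMES))
```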
>
>
>
> >>> If I understand correctly, this is being done solely to speed up
> loading the list of notebooks? What if the list of notebook names, their
> ids, folder structure, etc. were *cached* in a separate small JSON file? Or
> perhaps in a small embedded key-value store, like www.mapdb.org?
> Just thinking out loud. This would require a way to lazily re-sync the
> cache.
>
>
>
> This is not only to speed up loading but also to make the system
> architecture easier to maintain. Currently we have to build the folder
> structure of notes in memory, and a lot of code in Zeppelin does this
> (personally I don't think we need any code for this if we could get the
> folder structure from the note storage system itself). Using another
> store to keep the note name-to-note id mapping brings in a classic
> distributed-systems problem: consistency. How do we keep the real note
> files and this mapping component consistent? Whenever we create, rename
> or remove a note, we have to update both the notebook repo and the
> mapping store. In my experience, any bug in that code leads to
> inconsistency.
>
> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, 14 Aug 2018 at 3:58 AM:
>
> Thanks for bringing this up for discussion. My 2 cents below.
>
>
>
> I share Maksim's and Felix's concerns about special characters now allowed
> in notebook names, and also about different charsets. Russian text, for
> example, is most commonly encoded in ISO-8859-5, KOI8-R/KOI8-U or
> Windows-1251. This seems likely to bring a whole new set of localization
> issues.
>
>
>
> If I understand correctly, this is being done solely to speed up loading
> the list of notebooks? What if the list of notebook names, their ids,
> folder structure, etc. were *cached* in a separate small JSON file? Or
> perhaps in a small embedded key-value store, like www.mapdb.org? Just
> thinking out loud. This would require a way to lazily re-sync the cache.
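A minimal sketch of the cached-index idea, assuming the current {noteId}/note.json layout and a hypothetical cache file that maps each note id to its name and the note file's modification time. On each load only notes whose mtime has changed are re-parsed, which is one possible way to "lazily re-sync the cache". The function and file names are illustrative, not Zeppelin APIs.

```python
import json
import os


def load_note_index(notebook_dir, cache_path):
    """Return {note_id: note_name} for a {noteId}/note.json layout.

    Names come from a small JSON cache (hypothetical format:
    {note_id: {"name": ..., "mtime": ...}}); a note.json is parsed only
    when it is missing from the cache or its mtime has changed.
    """
    try:
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        cache = {}  # no usable cache: rebuild from scratch

    fresh = {}
    for note_id in os.listdir(notebook_dir):
        path = os.path.join(notebook_dir, note_id, "note.json")
        if not os.path.isfile(path):
            continue
        mtime = os.stat(path).st_mtime_ns
        entry = cache.get(note_id)
        if entry is None or entry.get("mtime") != mtime:
            # Cache miss or stale entry: fall back to a full parse.
            with open(path, encoding="utf-8") as f:
                entry = {"name": json.load(f)["name"], "mtime": mtime}
        fresh[note_id] = entry  # notes deleted on disk simply drop out

    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(fresh, f, ensure_ascii=False)
    return {note_id: e["name"] for note_id, e in fresh.items()}
```

This still lists every note directory and trusts mtime to detect edits; an edit that leaves the mtime unchanged (coarse timestamp resolution, a restored backup) would go unnoticed, which is the kind of inconsistency Jeff raises in his reply.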
>
>
>
> Another way to speed up JSON reads is to somehow force the "name" attribute
> to be at the top of the JSON document that's written to disk, and then
> re-implement the JSON file reader to read just the header of the file and
> do a partial JSON parse (or, lacking other options, grab the "name"
> attribute from the file header with a regex, for example).
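The header-only read could look roughly like this Python sketch. It assumes the writer emits "name" as the first key of the JSON document, so a bounded prefix of the file (here a hypothetical 4 KiB) is enough; the regex extracts that JSON string and `json.loads` decodes its escapes. If the key is not in the prefix, the function returns `None` and the caller would fall back to a full parse.

```python
import json
import re

# First "name": "<JSON string>" pair; the string may contain escapes.
NAME_RE = re.compile(r'"name"\s*:\s*("(?:[^"\\]|\\.)*")')


def read_note_name(path, max_bytes=4096):
    """Read at most max_bytes from the start of a note file and pull out
    the "name" attribute, assuming the writer emitted it first."""
    with open(path, "rb") as f:
        # errors="ignore" drops a multi-byte character cut off at the end.
        head = f.read(max_bytes).decode("utf-8", errors="ignore")
    match = NAME_RE.search(head)
    if match is None:
        return None  # not in the header: caller falls back to a full parse
    return json.loads(match.group(1))  # decodes escapes such as \" or \u0444
```

On the writing side, Python's `json.dump` keeps dict insertion order, so building the dict with `"name"` as its first key is enough to place it at the top of the file.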
>
>
>
> Back to file names and charsets: I think the issue may be more complicated
> if you store notebooks on a remote filesystem (NFS, Samba, etc.). What if
> the remote server and the local NFS client use different default filesystem
> charsets?
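To make the charset concern concrete, this Python sketch encodes one Russian note name in the charsets listed above and simulates bytes written as UTF-8 being read back as KOI8-R. The name "Отчёт" ("Report") is only an example.

```python
# Charsets mentioned in the thread, by their Python codec names.
CHARSETS = ("utf-8", "koi8_r", "cp1251", "iso8859_5")


def encodings_of(name, charsets=CHARSETS):
    """Return the on-disk bytes of a file name under each charset."""
    return {charset: name.encode(charset) for charset in charsets}


def misdecode(name, written_as="utf-8", read_as="koi8_r"):
    """Simulate a client writing a name in one charset and a server
    reading the same bytes in another."""
    return name.encode(written_as).decode(read_as)


if __name__ == "__main__":
    for charset, raw in encodings_of("Отчёт").items():
        print(f"{charset:10} {raw!r}")
    print(misdecode("Отчёт"))  # mojibake, not an exception
```

KOI8-R assigns a character to every byte value, so the wrong decoding does not raise an error; the note simply shows up under a garbled name.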
>
>
>
> Ideally all filesystems would use UTF-8, but I am not certain that's a
> safe assumption to make. Exposing notebook names as file names can also
> bring other issues; for example, I know some users occasionally add
> leading/trailing spaces.
>
>
>
>
>
> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> m.belou...@tinkoff.ru> wrote:
>
> The ability to use Russian and other non-Latin letters in note names is a
> big advantage of Zeppelin. I would not like to give up this functionality.
>
>
>
> I support the idea about `zpln` file extension.
>
> The folder structure also sounds good.
>
>
>
> I'm worried about non-Latin symbols in folder and note names. And what
> about hieroglyphs (e.g. Chinese characters)?
>
>
>
> Apache Zeppelin may be the first application in our company to use Russian
> letters in the file system.
>
> I see a lot of risks in using non-Latin symbols and a lot of work to make
> the new folder structure stable.
>
>
>
>
>
>
> ------------------------------
>
> *From:* Jeff Zhang <zjf...@gmail.com>
> *Sent:* 13 August 2018 12:50
> *To:* users@zeppelin.apache.org
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of
> [NOTEID]/note.json
>
>
>
> >>> Do we need the note id in the file name at all? What’s wrong with
> just note_name.zpln?
>
> I keep the note id because we currently use noteId to identify a note,
> e.g. in both the WebSocket API and the REST API. It is almost impossible
> to remove noteId from the current architecture. If we put the note id
> inside the content of note_name.zpln, we have to read every note file
> again, and we run into the issues I mentioned above.
>
>
>
> >>> If the file content is json then why not use note_name.json instead of
> .zpln? That would make it easier for editors to know how to load/highlight
> the file contents.
>
> I am not strongly attached to *.zpln, but one purpose is to help third
> parties identify Zeppelin notes properly; e.g. GitHub can identify Jupyter
> notebooks (*.ipynb) and render them properly.
>
>
>
> >>> Is there any reason for not using *real* folders or directories for
> organising the notebooks rather than embedding the folder hierarchy in the
> names of the notebooks?  If someone wants to ‘move’ the notebooks to
> another folder they’d have to manually rename all the files/notebooks at
> present.  That’s not very user-friendly.
>
>
>
> Actually, my proposal is to use real folders. What users see in the
> Zeppelin note menu is the actual folder structure of the notes. If they
> want to move notebooks to another folder, they can rename or move the
> folder just as they would in a file system.
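With real folders, moving a group of notebooks becomes a single directory rename instead of renaming every note file. A minimal Python sketch, assuming the proposed layout under a hypothetical `notebook_dir` root; it is an illustration, not Zeppelin code.

```python
import os


def move_folder(notebook_dir, old_folder, new_folder):
    """Move a note folder (and every .zpln file in it) with one rename.

    Folders are given relative to notebook_dir, e.g. "folder_1" or
    "archive/2018". The note files keep their own file names.
    """
    src = os.path.join(notebook_dir, old_folder)
    dst = os.path.join(notebook_dir, new_folder)
    # Create missing parent folders of the destination, e.g. "archive".
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    os.rename(src, dst)
```

On POSIX, `os.rename` fails if the destination is a non-empty directory, so the sketch never silently merges two folders.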
>
>
> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Mon, 13 Aug
> 2018 at 4:43 PM:
>
> Hi Jeff,
>
> I have some questions about this proposal (I can’t edit the design doc):
>
>
>
>    1. Do we need the note id in the file name at all? What’s wrong with
>    just note_name.zpln?
>    2. If the file content is json then why not use note_name.json instead
>    of .zpln? That would make it easier for editors to know how to
>    load/highlight the file contents.
>    3. Is there any reason for not using *real* folders or directories for
>    organising the notebooks rather than embedding the folder hierarchy in the
>    names of the notebooks?  If someone wants to ‘move’ the notebooks to
>    another folder they’d have to manually rename all the files/notebooks at
>    present.  That’s not very user-friendly.
>
>
>
> Thanks, Lucas.
>
> *From:* Jeff Zhang <zjf...@gmail.com>
> *Sent:* 13 August 2018 09:06
> *To:* users@zeppelin.apache.org
> *Cc:* dev <d...@zeppelin.apache.org>
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
> In that case, Zeppelin should fail to create the note.
>
>
>
> Felix Cheung <felixcheun...@hotmail.com> wrote on Mon, 13 Aug 2018 at 3:47 PM:
>
> Perhaps one concern is users having characters in note names that are
> invalid in a file name or file path?
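One way Zeppelin could decide to "fail to create the note" is to validate the name before touching the filesystem. This Python sketch checks each component of a proposed `{note_name}_{note_id}.zpln` path against the Linux rules (no NUL, no empty, `.` or `..` components) and against a 255-byte component limit, which is typical of ext4 and XFS but is an assumption here. It also flags the leading/trailing-space concern mentioned earlier in the thread. Characters reserved on Windows or SMB shares are deliberately not covered.

```python
# Per-component limit in bytes; 255 is typical for ext4/XFS (assumption).
NAME_MAX_BYTES = 255


def check_note_path(note_name, note_id):
    """List the reasons "{note_name}_{note_id}.zpln" could not be created
    as a file; an empty list means the name looks usable."""
    problems = []
    *folders, leaf = note_name.split("/")
    for part in folders + [leaf]:
        if part in ("", ".", ".."):
            problems.append(f"empty or reserved path component {part!r}")
        elif part != part.strip():
            problems.append(f"leading/trailing whitespace in {part!r}")
        if "\0" in part:
            problems.append("NUL character in note name")
    # Length is checked on what actually lands on disk, in UTF-8 bytes.
    for part in folders + [f"{leaf}_{note_id}.zpln"]:
        if len(part.encode("utf-8")) > NAME_MAX_BYTES:
            problems.append(f"path component longer than {NAME_MAX_BYTES} bytes")
    return problems
```

A non-empty result would be reported to the user instead of creating the note.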
>
>
>
>
> ------------------------------
>
> *From:* Mohit Jaggi <mohitja...@gmail.com>
> *Sent:* Sunday, August 12, 2018 6:02 PM
> *To:* users@zeppelin.apache.org
> *Cc:* dev
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> of [NOTEID]/note.json
>
>
>
> sounds like a good idea!
>
>
>
> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
> Motivation
>
>    The motivation of ZEPPELIN-2619 is to change the note storage
> structure. Currently we store each note as {noteId}/note.json; we’d like to
> change it to {note_name}_{note_id}.zpln. There are several reasons for
> this change.
>
>
>
>    1. {noteId}/note.json is not scalable. We put all notes in one root
>    folder in a flat structure, and when the Zeppelin server starts, we need
>    to read every note.json to get the note names and build the note folder
>    structure (because the note name, which is needed to build the notebook
>    menu, is stored in note.json). This would be a nightmare when you have
>    a large number of notes.
>    2. {noteId}/note.json is not maintainable. It is difficult for a
>    developer/administrator to find a note file based on the note name.
>    3. {noteId}/note.json has no folder structure. Currently Zeppelin has
>    to build the folder structure internally in memory from the note names,
>    which is a big overhead.
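For comparison, the in-memory step described in item 3 amounts to something like this Python sketch: given the names read from every note.json (here a plain dict of note id to note name), split each name on `/` and build a nested folder tree. The node layout is an illustrative assumption, not Zeppelin's actual data structure.

```python
def new_node():
    """A folder node: sub-folders by name, and notes by name -> note id."""
    return {"folders": {}, "notes": {}}


def build_folder_tree(names_by_id):
    """Build the folder tree from {note_id: "folder_1/sub/mynote"}.

    In the {noteId}/note.json layout the names only become known after
    every note.json has been read, so this runs after a full scan.
    """
    root = new_node()
    for note_id, name in names_by_id.items():
        *folders, leaf = name.split("/")
        node = root
        for folder in folders:
            node = node["folders"].setdefault(folder, new_node())
        node["notes"][leaf] = note_id
    return root
```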
>
>
> New Approach
>
>    As I mentioned above, I propose to change the note storage structure to
> {note_name}_{note_id}.zpln. note_name can contain folders, e.g.
> folder_1/mynote_abcd.zpln.
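A sketch of mapping between a note and its proposed file path, in Python. It assumes note ids never contain `_`, so the id can be recovered by splitting the file name on its last underscore, while folder and note names may contain underscores. The helper names are hypothetical.

```python
import posixpath

SUFFIX = ".zpln"


def note_path(note_name, note_id):
    """Proposed relative path, e.g. ("folder_1/mynote", "abcd") ->
    "folder_1/mynote_abcd.zpln"."""
    return f"{note_name}_{note_id}{SUFFIX}"


def parse_note_path(rel_path):
    """Inverse of note_path: "folder_1/mynote_abcd.zpln" ->
    ("folder_1/mynote", "abcd").

    Only the last path component is split, on its last underscore, so
    underscores in folder or note names are preserved.
    """
    if not rel_path.endswith(SUFFIX):
        raise ValueError(f"not a {SUFFIX} file: {rel_path!r}")
    folder, base = posixpath.split(rel_path[: -len(SUFFIX)])
    name, sep, note_id = base.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"no _<note_id> suffix in {rel_path!r}")
    return posixpath.join(folder, name), note_id
```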
>
> This kind of note storage structure could bring several benefits.
>
>    1. We don’t need to load all notes when Zeppelin starts. We just need
>    to list each folder to get the note names and note ids.
>    2. It is much more maintainable: it is easy to find the note file
>    based on the note name.
>    3. The folder structure already exists on disk and can be mapped
>    directly to the note folder structure.
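Benefit 1 can be sketched in Python as a directory walk that recovers every note's name and id from file names alone, without opening any note file. It assumes the proposed `{note_name}_{note_id}.zpln` layout and note ids without underscores; files that do not follow the pattern are skipped.

```python
import os

SUFFIX = ".zpln"


def list_notes(notebook_dir):
    """Return {note_id: note_name} from file names alone.

    No note file is opened: each "{note_name}_{note_id}.zpln" file name
    is split on its last underscore, and the folder path relative to
    notebook_dir becomes the folder part of the note name.
    """
    notes = {}
    for dirpath, _dirnames, filenames in os.walk(notebook_dir):
        folder = os.path.relpath(dirpath, notebook_dir).replace(os.sep, "/")
        for filename in filenames:
            if not filename.endswith(SUFFIX):
                continue
            name, sep, note_id = filename[: -len(SUFFIX)].rpartition("_")
            if not sep or not name or not note_id:
                continue  # does not follow the naming scheme; skip it
            notes[note_id] = name if folder == "." else f"{folder}/{name}"
    return notes
```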
>
>
> Side Effect
>
> This approach only works for file system storage, which means we have to
> drop support for MongoNotebookRepo. I think that is OK because I haven't
> seen any users talk about it in the community, so I assume no one is using it.
>
>
>
> This is the overall design; any comments and feedback are welcome. Thanks.
>
>
>
> Here's the Google Doc; you can also comment on it here.
>
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>
