Hey guys,

Great idea. FYI, I created a feature request for GitLab to render Zeppelin notebooks once this issue is finalized and you switch to .zpln: https://gitlab.com/gitlab-org/gitlab-ce/issues/50244
On Tue, Aug 14, 2018 at 10:29 Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote:
> I agree you’re inviting consistency issues if you maintain a separate note-id-to-note-name mapping file.
>
> But I’m still not comfortable with note ids in the name of the notebook itself. Those names would look ugly if you shared your notebooks on GitHub, for example. You don’t see Jupyter notebooks with names like that. If you have to keep the note ids with the notebooks, could you not simply put the note id at the top of the notebook, as Ruslan suggested? Then you’d only have to read the first line of each notebook.
>
> Presumably, if you copied the notebooks to another Zeppelin server, they would be restored with the same note ids there too? And hopefully there would be no id clash with notebooks already on that server…
>
> *From:* Jeff Zhang <zjf...@gmail.com>
> *Sent:* 14 August 2018 03:49
> *To:* users@zeppelin.apache.org
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>
> Thanks for the discussion.
>
> >>> I'm worried about non-Latin symbols in folder and note names. And what about hieroglyphs?
>
> AFAIK, Linux allows every character in a file name except `\0` and '/'. I can create a file name with Chinese characters on Linux, so I guess you can use Russian as well.
>
> >>> If I understand correctly, this is being done solely to speed up loading the list of notebooks? What if a list of notebook names, their ids, folder structure, etc. could be *cached* in a separate small JSON file? Or perhaps in a small embedded key-value store, like www.mapdb.org? Just thinking out loud. This would require a way to lazily re-sync the cache.
>
> This is not only to speed up loading but also to make the system architecture easier to maintain.
> For now we have to build the folder structure of notes in memory, and a lot of code in Zeppelin does this. (Personally, I don't think we need any code for this at all if we could get the folder structure from the note file storage system.) Using another storage to keep the mapping of note name to note id would bring another classic distributed-systems problem: consistency. How do we ensure consistency between the real note files and this mapping component? If we create/rename/remove a note, we have to update both the notebook repo and the mapping storage. In my experience, any bug in that code would lead to inconsistency.
>
> On Tue, Aug 14, 2018 at 3:58 AM Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
> Thanks for bringing this up for discussion. My 2 cents below.
>
> I am with Maksim and Felix on the concerns about special characters that are currently allowed in notebook names, and also the concerns about different charsets. Russian, for example, most commonly uses the iso-8859-5, koi8-r/u and windows-1251 charsets. This seems likely to bring a whole new set of localization issues.
>
> If I understand correctly, this is being done solely to speed up loading the list of notebooks? What if a list of notebook names, their ids, folder structure, etc. could be *cached* in a separate small JSON file? Or perhaps in a small embedded key-value store, like www.mapdb.org? Just thinking out loud. This would require a way to lazily re-sync the cache.
>
> Another way to speed up JSON reads is to somehow force the "name" attribute to be at the top of the JSON document that's written to disk, then re-implement the JSON file reader to read just the header of the file and do a partial JSON parse (or, lacking other options, grab the "name" attribute from the JSON file header with a regex, for example).
>
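Ruslan's header-only read, which also covers Lucas's "note id at the top of the notebook" variant, can be sketched in Python under stated assumptions: the writer always emits `name` (and `id`) before the large `paragraphs` array, and the reader pulls `name` out of the first few kilobytes with a regex and then lets the JSON parser decode that one string literal. The function names `write_note`/`read_note_name` and the 4096-byte header size are hypothetical illustrations, not Zeppelin's NotebookRepo code.

```python
import json
import re

# A JSON "name" key followed by a string value; the value may contain escapes.
_NAME_RE = re.compile(r'"name"\s*:\s*("(?:[^"\\]|\\.)*")')


def write_note(path, note):
    """Hypothetical writer: emit "name" and "id" before the (large) paragraphs."""
    ordered = {"name": note["name"], "id": note["id"]}
    ordered.update((k, v) for k, v in note.items() if k not in ordered)
    with open(path, "w", encoding="utf-8") as f:
        # Dicts keep insertion order, so "name" is the first key on disk.
        json.dump(ordered, f, ensure_ascii=False, indent=2)


def read_note_name(path, header_bytes=4096):
    """Read only the first header_bytes of the file and pull out "name".

    Returns None if no name is found in the header; a real reader would
    then fall back to parsing the whole file.
    """
    with open(path, "rb") as f:
        # errors="ignore" drops a multi-byte character cut off at the boundary.
        head = f.read(header_bytes).decode("utf-8", errors="ignore")
    match = _NAME_RE.search(head)
    if match is None:
        return None
    # Decode the captured JSON string literal (handles \" and \uXXXX escapes).
    return json.loads(match.group(1))
```

The regex is only safe because the writer guarantees key order; if another tool rewrote the file with `name` after the paragraphs, the first match could come from nested content, which is one reason the thread treats this as a fallback rather than a primary design.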
> Back to filenames and charsets: I think the issue may be more complicated if you store notebooks on a remote filesystem (NFS, Samba, etc.). What if the remote server and the local NFS client have different default filesystem charsets?
>
> Ideally all filesystems would use UTF-8, for example, but I am not certain that's a good assumption to make. Exposing notebook names can also bring other issues; for instance, I know some users occasionally add trailing/leading spaces.
>
> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <m.belou...@tinkoff.ru> wrote:
>
> The ability to use Russian and other non-English letters in note names is a big advantage of Zeppelin. I would not like to give up this functionality.
>
> I support the idea of the `zpln` file extension.
> The folder structure also sounds good.
>
> I'm worried about non-Latin symbols in folder and note names. And what about hieroglyphs?
>
> Apache Zeppelin may be the first application in our company to use Russian letters in the file system.
> I see a lot of risks in using non-Latin symbols and a lot of issues in making the new folder structure stable.
>
> ------------------------------
>
> *From:* Jeff Zhang <zjf...@gmail.com>
> *Sent:* 13 August 2018 12:50
> *To:* users@zeppelin.apache.org
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>
> >>> Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
>
> The reason I keep the note id is that we currently use noteId to identify a note, e.g. in both the WebSocket API and the REST API. It is almost impossible to remove noteId in the current architecture. If we put the note id into the file content of note_name.zpln, then we have to read the note file every time, and we run into the issues I mentioned above again.
>
> >>> If the file content is json then why not use note_name.json instead of .zpln?
> That would make it easier for editors to know how to load/highlight the file contents.
>
> I am not strongly attached to *.zpln. But I think one purpose is to help third parties identify Zeppelin notes properly, e.g. GitHub can identify Jupyter notebooks (*.ipynb) and render them properly.
>
> >>> Is there any reason for not using *real* folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks? If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present. That’s not very user-friendly.
>
> Actually, my proposal is to use real folders. What users see in the Zeppelin note menu is the actual folder structure of the notes. If they want to move notebooks to another folder, they can rename the folder, just as they would in a file system.
>
> On Mon, Aug 13, 2018 at 4:43 PM Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote:
>
> Hi Jeff,
>
> I have some questions about this proposal (I can’t edit the design doc):
>
> 1. Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
> 2. If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
> 3. Is there any reason for not using *real* folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks? If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present. That’s not very user-friendly.
>
> Thanks, Lucas.
>
> *From:* Jeff Zhang <zjf...@gmail.com>
> *Sent:* 13 August 2018 09:06
> *To:* users@zeppelin.apache.org
> *Cc:* dev <d...@zeppelin.apache.org>
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619.
> Save note in [Title].zpln instead of [NOTEID]/note.json
>
> In that case, Zeppelin should fail to create the note.
>
> On Mon, Aug 13, 2018 at 3:47 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> Perhaps one concern is users having characters in the note name that are invalid in a file name/file path?
>
> ------------------------------
>
> *From:* Mohit Jaggi <mohitja...@gmail.com>
> *Sent:* Sunday, August 12, 2018 6:02 PM
> *To:* users@zeppelin.apache.org
> *Cc:* dev
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>
> Sounds like a good idea!
>
> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
> Motivation
>
> The motivation of ZEPPELIN-2619 is to change the note storage structure. Previously we stored notes as {noteId}/note.json; we’d like to change this to {note_name}_{note_id}.zpln. There are several reasons for this change.
>
> 1. {noteId}/note.json is not scalable. We put all notes in one root folder in a flat structure, and when the Zeppelin server starts, we need to read every note.json to get the note name and build the note folder structure (because the note name, which we need to build the notebook menu, is stored inside note.json). This would be a nightmare when you have a large number of notes.
> 2. {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find a note file based on the note name.
> 3. {noteId}/note.json has no folder structure. Currently Zeppelin has to build the folder structure internally in memory from the note names, which is a big overhead.
>
> New Approach
>
> As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln. note_name could contain folders, e.g. folder_1/mynote_abcd.zpln
>
> This kind of note storage structure could bring several benefits.
>
> 1. We don’t need to load all notes when Zeppelin starts.
> We just need to list each folder to get the note names and note ids.
> 2. It is much more maintainable, since it is easy to find the note file based on the note name.
> 3. It already has a folder structure, which can be mapped directly to the note folder structure.
>
> Side Effect
>
> This approach only works for file system storage, which means we have to drop support for MongoNotebookRepo. I think that is OK because I haven’t seen any users talk about it in the community, so I assume no one is using it.
>
> This is the overall design; any comments and feedback are welcome. Thanks.
>
> Here's the Google Doc; you can also comment here.
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
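The proposed {note_name}_{note_id}.zpln layout can be sketched in Python to show why startup only needs directory listings: the note path and id are recovered from the file path alone, and moving a folder of notes is a single directory rename. The function names are hypothetical, and the sketch makes two labelled assumptions: note ids never contain `_` (so splitting on the *last* underscore keeps names like `my_note` intact), and, going slightly beyond the thread, leading/trailing spaces are rejected alongside the characters a POSIX file system cannot store, so note creation fails early as Jeff suggests.

```python
from pathlib import Path, PurePosixPath

NOTE_EXT = ".zpln"


def note_file_path(note_path, note_id):
    """Map ("folder_1/mynote", "abcd") to "folder_1/mynote_abcd.zpln".

    Raises ValueError for names the file system cannot store, so that note
    creation fails early. Rejecting leading/trailing spaces is an extra
    assumption of this sketch, not part of the proposal.
    """
    # Assumption: note ids never contain "_", so the last "_" separates name and id.
    if not note_id or any(c in note_id for c in "_/\0"):
        raise ValueError(f"invalid note id: {note_id!r}")
    parts = note_path.split("/")  # "/" in a note name denotes folders
    for part in parts:
        if not part or part in (".", "..") or "\0" in part or part != part.strip():
            raise ValueError(f"invalid note path: {note_path!r}")
    *folders, name = parts
    return "/".join(folders + [f"{name}_{note_id}{NOTE_EXT}"])


def parse_note_file(rel_path):
    """Inverse of note_file_path: return (note_path, note_id), or None if not a note."""
    p = PurePosixPath(rel_path)
    if p.suffix != NOTE_EXT:
        return None
    name, sep, note_id = p.stem.rpartition("_")
    if not sep or not name or not note_id:
        return None
    return str(p.parent / name), note_id


def list_notes(root):
    """Build {note_path: note_id} from directory listings alone; no note file is opened."""
    root = Path(root)
    notes = {}
    for f in sorted(root.rglob("*" + NOTE_EXT)):
        if not f.is_file():
            continue
        parsed = parse_note_file(f.relative_to(root).as_posix())
        if parsed is not None:
            notes[parsed[0]] = parsed[1]
    return notes
```

For example, `note_file_path("folder_1/mynote", "abcd")` yields `folder_1/mynote_abcd.zpln`, matching the example in the proposal. The sketch does not address Ruslan's charset concern: it assumes the file system stores names as UTF-8.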