Another reason for keeping noteId is uniqueness in multi-user
environments. There, users have separate Zeppelin workspaces, which is
something we use in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false in the
docs [1]. Users might be very confused if they cannot create a notebook
with a name that already exists but that they most likely can't see (yet).
So I like the proposal {note_name}_{note_id}.zpln, where note_name could
contain folders, e.g. folder_1/mynote_abcd.zpln. That said, I like
{note_name}.{note_id}.zpln (a dot between note_name and note_id) even better
:-)
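A minimal Python sketch of how the two proposed naming schemes compose a storage path. `note_file_path` is a hypothetical helper (not Zeppelin code) and the ids `abcd`/`efgh` are made up. It illustrates why the id suffix lets two users in separate workspaces keep a note with the same name.

```python
# Hypothetical helper (not Zeppelin code): builds the storage path for a note
# under the {note_name}{sep}{note_id}.zpln scheme. sep="_" is the original
# proposal, sep="." is the alternative preferred above.
def note_file_path(note_name: str, note_id: str, sep: str = "_") -> str:
    """note_name may include folders, e.g. 'folder_1/mynote'."""
    if sep not in ("_", "."):
        raise ValueError(f"unsupported separator: {sep!r}")
    return f"{note_name}{sep}{note_id}.zpln"


# Two users in separate (private) workspaces both create "folder_1/mynote";
# the id suffix keeps their files distinct in the shared notebook directory.
alice_file = note_file_path("folder_1/mynote", "abcd")
bob_file = note_file_path("folder_1/mynote", "efgh")
print(alice_file)  # folder_1/mynote_abcd.zpln
print(bob_file)    # folder_1/mynote_efgh.zpln
print(note_file_path("folder_1/mynote", "abcd", sep="."))  # folder_1/mynote.abcd.zpln
```

With either separator, a reader would recover the id by splitting on the last separator, which only works if ids never contain that separator.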
Regards
Andreas
[1]
http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
On 2018/08/18 08:42:44, Jeff Zhang <[email protected]> wrote:
> BTW, I would also prefer to use the note name as the identifier of a note,
> if the issue I mentioned before is acceptable to most users.
>
>
>
> On Sat, Aug 18, 2018 at 4:40 PM, Jeff Zhang <[email protected]> wrote:
>
> >
> > I am afraid we cannot remove noteId: noteId is the unique, immutable
> > identifier of a note and is used in a lot of places, such as paragraph
> > sharing and the REST API.
> > If we used the note name as the note id, renaming a note could break
> > users' apps.
> >
> >
> > On Sat, Aug 18, 2018 at 2:33 PM, Jongyoul Lee <[email protected]> wrote:
> >
> >> Hi, thanks for this kind of discussion.
> >>
> >> About noteId: how about changing the note id to the note name? AFAIK, the
> >> note id is just an identifier and we can set any value to it.
> >>
> >> There are two potential problems. First, we would have to handle note ids
> >> more carefully, as they could contain a wide variety of characters.
> >> Second, when someone renames a note, others who are viewing and updating
> >> the same note would no longer be able to access it. We could handle that
> >> via websockets.
> >>
> >> WDYT?
> >>
> >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <[email protected]> wrote:
> >>
> >>> >>> But I’m still not comfortable with note ids in the name of the
> >>> notebook itself. Those names would look ugly if you shared your notebooks
> >>> on github for example. You don’t see Jupyter notebooks with names like
> >>> that. If you have to keep the note ids with the notebooks could you not
> >>> simply put the note id at the top of the notebook as Ruslan suggested?
> >>> Then
> >>> you’d only have to read the first line of each notebook.
> >>>
> >>> I know putting note_id in the note file name is not very elegant, but it
> >>> is the compromise we have to make to keep compatibility, since we
> >>> currently use noteId to uniquely identify a note. And I don't think
> >>> putting noteId on the first line of the note would help much: we would
> >>> still have to read every note file, which takes much more time than just
> >>> reading the file names from the file system.
> >>>
> >>> Regarding the readability of note file names, I don't think it matters
> >>> much. E.g. for a notebook file named *My Project/My Spark Tutorial
> >>> Note_2A94M5J1Z.zpln*, what the user sees in the notebook menu is still
> >>> *My Project/My Spark Tutorial Note*, which is no different from what we
> >>> see now.
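A minimal Python sketch of how the menu path and note id could be recovered from a file name like the one above without opening the file. `parse_note_file` is a hypothetical name, and the sketch assumes note ids never contain `_` (names and folders may).

```python
from typing import Tuple

NOTE_SUFFIX = ".zpln"


def parse_note_file(rel_path: str) -> Tuple[str, str]:
    """Split a relative note file path into (menu path, note id).

    Assumes the {note_name}_{note_id}.zpln layout and that note ids never
    contain "_". Only the file name is inspected; the file is never opened.
    """
    if not rel_path.endswith(NOTE_SUFFIX):
        raise ValueError(f"not a note file: {rel_path!r}")
    stem = rel_path[: -len(NOTE_SUFFIX)]
    # Separate the folder part first so underscores in folder names are ignored.
    folder, slash, base = stem.rpartition("/")
    # Split on the LAST underscore: the note name itself may contain underscores.
    name, sep, note_id = base.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"no note id in file name: {rel_path!r}")
    return folder + slash + name, note_id


print(parse_note_file("My Project/My Spark Tutorial Note_2A94M5J1Z.zpln"))
# ('My Project/My Spark Tutorial Note', '2A94M5J1Z')
```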
> >>>
> >>> And thanks again for the feedback and comments; I am so glad to see so
> >>> much discussion in the community.
> >>>
> >>>
> >>>
> >>> On Tue, Aug 14, 2018 at 4:29 PM, Partridge, Lucas (GE Aviation) <
> >>> [email protected]> wrote:
> >>>
> >>>> I agree you’re inviting consistency issues if you maintain a separate
> >>>> note id-to-note name mapping file.
> >>>>
> >>>>
> >>>>
> >>>> But I’m still not comfortable with note ids in the name of the notebook
> >>>> itself. Those names would look ugly if you shared your notebooks on
> >>>> GitHub, for example. You don’t see Jupyter notebooks with names like
> >>>> that. If you have to keep the note ids with the notebooks, could you not
> >>>> simply put the note id at the top of the notebook as Ruslan suggested?
> >>>> Then you’d only have to read the first line of each notebook.
> >>>>
> >>>>
> >>>>
> >>>> Presumably if you copied the notebooks to another Zeppelin server they
> >>>> would be restored with the same note ids there too? And hopefully there
> >>>> would be no id clash with notebooks already on that server…
> >>>>
> >>>>
> >>>>
> >>>> *From:* Jeff Zhang <[email protected]>
> >>>> *Sent:* 14 August 2018 03:49
> >>>> *To:* [email protected]
> >>>>
> >>>>
> >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> >>>> instead of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Thanks for the discussion.
> >>>>
> >>>> >>> I'm afraid about non-latin symbols in folder and note name. And
> >>>> what about hieroglyphs?
> >>>>
> >>>> AFAIK, Linux allows all characters in a file name except `\0` and
> >>>> `/`. I can create file names with Chinese characters on Linux, so I
> >>>> guess you can use Russian as well.
> >>>>
> >>>>
> >>>>
> >>>> >>> If I understand correctly, this is being done solely to speed up
> >>>> loading list of notebooks? What if a list of notebook names, their ids,
> >>>> folder structure, etc can be *cached* in a separate small json file? Or
> >>>> perhaps in a small embedded key-value store, like www.mapdb.org would
> >>>> do? Just thinking out loud. This would require a way to lazily re-sync
> >>>> the
> >>>> cache.
> >>>>
> >>>>
> >>>>
> >>>> This is not only to speed up loading but also to make the system
> >>>> architecture easier to maintain. Currently we have to build the folder
> >>>> structure of notes in memory, and a lot of code in Zeppelin does this
> >>>> (personally, I don't think we would need any code for this if we could
> >>>> get the folder structure from the note storage system itself). Using
> >>>> another store to keep the mapping between note name and note id would
> >>>> bring in another classic distributed-systems problem: consistency. How
> >>>> do we keep the real note files and this mapping component consistent?
> >>>> Whenever we create, rename, or remove a note, we have to update both the
> >>>> notebook repo and the mapping store. In my experience, any bug in that
> >>>> code would lead to inconsistencies.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Aug 14, 2018 at 3:58 AM, Ruslan Dautkhanov <[email protected]> wrote:
> >>>>
> >>>> Thanks for bringing this up for discussion. My 2 cents below.
> >>>>
> >>>>
> >>>>
> >>>> I share Maksim's and Felix's concerns about the special characters now
> >>>> allowed in notebook names, and also about different charsets. Russian,
> >>>> for example, is commonly encoded in ISO-8859-5, KOI8-R/KOI8-U,
> >>>> Windows-1251, etc. This seems likely to bring a whole new set of
> >>>> localization issues.
> >>>>
> >>>>
> >>>>
> >>>> If I understand correctly, this is being done solely to speed up
> >>>> loading the list of notebooks? What if the list of notebook names, their
> >>>> ids, the folder structure, etc. were *cached* in a separate small JSON
> >>>> file? Or perhaps a small embedded key-value store like www.mapdb.org
> >>>> would do? Just thinking out loud. This would require a way to lazily
> >>>> re-sync the cache.
> >>>>
> >>>>
> >>>>
> >>>> Another way to speed up JSON reads is to force the "name" attribute to
> >>>> be at the top of the JSON document written to disk, and then
> >>>> re-implement the JSON file reader to read just the head of the file and
> >>>> do a partial JSON parse (or, lacking other options, grab the "name"
> >>>> attribute from the file header with a regex, for example).
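A Python sketch of the "read only the head of the file" idea, under stated assumptions: the writer always emits the top-level "name" attribute before any nested object that also has a "name" key, and `HEAD_BYTES` is an arbitrary, hypothetical limit. The regex captures the JSON string literal and `json.loads` decodes its escapes.

```python
import json
import re
from typing import Optional

# Hypothetical limit: how much of each file to read. Must be large enough to
# contain the "name" attribute when the writer puts it first.
HEAD_BYTES = 4096

# "name": followed by a JSON string literal (escaped quotes allowed).
_NAME_RE = re.compile(r'"name"\s*:\s*("(?:[^"\\]|\\.)*")')


def read_note_name(path: str) -> Optional[str]:
    """Return the note name found in the head of a note file, or None."""
    with open(path, "rb") as f:
        head = f.read(HEAD_BYTES)
    # The cut may split a multi-byte UTF-8 character; drop undecodable bytes.
    text = head.decode("utf-8", errors="ignore")
    match = _NAME_RE.search(text)
    if match is None:
        return None  # a real reader would fall back to a full json.load()
    # Decode the captured literal so escapes such as \" and \u0416 are handled.
    return json.loads(match.group(1))
```

The cost per note becomes one small read instead of a full parse, but the approach is only as safe as the writer's guarantee about attribute order.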
> >>>>
> >>>>
> >>>>
> >>>> Back to file names and charsets: I think the issue may be more
> >>>> complicated if you store notebooks on a remote file system (NFS, Samba,
> >>>> etc.). What if the remote server and the local NFS client use different
> >>>> default file-system charsets?
> >>>>
> >>>>
> >>>>
> >>>> Ideally all file systems would use UTF-8, but I am not certain that's a
> >>>> good assumption to make. Exposing notebook names as file names can also
> >>>> bring other issues; for example, I know some users occasionally add
> >>>> leading/trailing spaces.
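A Python sketch of the two problems raised here. The canonicalization policy (trim surrounding whitespace, store NFC-normalized names) is an assumption for illustration, not Zeppelin behavior; `canonical_note_name` and `on_disk_bytes` are hypothetical names. The second part shows that one Cyrillic name yields different on-disk bytes under UTF-8 and Windows-1251.

```python
import unicodedata


# Assumed policy for this sketch (not Zeppelin behavior): trim surrounding
# whitespace and store names as NFC-normalized Unicode.
def canonical_note_name(name: str) -> str:
    return unicodedata.normalize("NFC", name.strip())


def on_disk_bytes(name: str, charset: str) -> bytes:
    """Bytes a client using `charset` would write for this note name."""
    return canonical_note_name(name).encode(charset)


# "e" + COMBINING ACUTE ACCENT becomes the single code point "é",
# and the stray spaces are removed.
print(canonical_note_name("  Cafe\u0301 ") == "Caf\u00e9")  # True

# The same Cyrillic name yields different bytes under UTF-8 and Windows-1251,
# so a server and client with different default charsets disagree on the file name.
utf8 = on_disk_bytes("Отчёт", "utf-8")
cp1251 = on_disk_bytes("Отчёт", "cp1251")
print(len(utf8), len(cp1251))  # 10 5
```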
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> >>>> [email protected]> wrote:
> >>>>
> >>>> The use of Russian and other language-specific letters in note names is
> >>>> a big advantage of Zeppelin. I would not like to give up this functionality.
> >>>>
> >>>>
> >>>>
> >>>> I support the idea of a `zpln` file extension.
> >>>>
> >>>> The folder structure also sounds good.
> >>>>
> >>>>
> >>>>
> >>>> I'm worried about non-Latin characters in folder and note names. And
> >>>> what about hieroglyphs (e.g. Chinese characters)?
> >>>>
> >>>>
> >>>>
> >>>> Apache Zeppelin may be the first application in our company to use
> >>>> Russian letters in the file system.
> >>>>
> >>>> I see a lot of risk in using non-Latin characters, and a lot of work
> >>>> needed to make the new folder structure stable.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> *From:* Jeff Zhang <[email protected]>
> >>>> *Sent:* 13 August 2018 12:50
> >>>> *To:* [email protected]
> >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> >>>> of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>> >>> Do we need the note id in the file name at all? What’s wrong with
> >>>> just note_name.zpln?
> >>>>
> >>>> The reason I keep the note id is that we currently use noteId to
> >>>> identify a note, e.g. in both the websocket API and the REST API. It is
> >>>> almost impossible to remove noteId from the current architecture. If we
> >>>> put the note id into the file content of note_name.zpln, then we have
> >>>> to read every note file again, and we run into the issues I mentioned
> >>>> above.
> >>>>
> >>>>
> >>>>
> >>>> >>> If the file content is json then why not use note_name.json instead
> >>>> of .zpln? That would make it easier for editors to know how to
> >>>> load/highlight the file contents.
> >>>>
> >>>> I am not strongly attached to *.zpln. But I think one purpose is to
> >>>> help third parties identify Zeppelin notes properly, e.g. GitHub can
> >>>> identify Jupyter notebooks (*.ipynb) and render them properly.
> >>>>
> >>>>
> >>>>
> >>>> >>> Is there any reason for not using *real* folders or directories
> >>>> for organising the notebooks rather than embedding the folder hierarchy
> >>>> in
> >>>> the names of the notebooks? If someone wants to ‘move’ the notebooks to
> >>>> another folder they’d have to manually rename all the files/notebooks at
> >>>> present. That’s not very user-friendly.
> >>>>
> >>>>
> >>>>
> >>>> Actually, my proposal is to use real folders. What users see in the
> >>>> Zeppelin note menu is the actual folder structure of the notes. If they
> >>>> want to move notebooks to another folder, they can rename folders just
> >>>> as they would in the file system.
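A small Python sketch of what "move by renaming the folder" means under the proposed layout: a single directory rename moves every note inside, and the ids survive because they live in the file names. `move_note_folder` and the folder names are hypothetical.

```python
from pathlib import Path


def move_note_folder(notebook_dir: Path, old: str, new: str) -> None:
    """Move a note folder, e.g. 'My Project' -> 'Archive/My Project'.

    Every note inside keeps its {note_name}_{note_id}.zpln file name, so the
    note ids are unchanged by the move.
    """
    target = notebook_dir / new
    target.parent.mkdir(parents=True, exist_ok=True)
    (notebook_dir / old).rename(target)
```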
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 13, 2018 at 4:43 PM, Partridge, Lucas (GE Aviation) <
> >>>> [email protected]> wrote:
> >>>>
> >>>> Hi Jeff,
> >>>>
> >>>> I have some questions about this proposal (I can’t edit the design doc):
> >>>>
> >>>>
> >>>>
> >>>> 1. Do we need the note id in the file name at all? What’s wrong
> >>>> with just note_name.zpln?
> >>>> 2. If the file content is json then why not use note_name.json
> >>>> instead of .zpln? That would make it easier for editors to know how to
> >>>> load/highlight the file contents.
> >>>> 3. Is there any reason for not using *real* folders or directories
> >>>> for organising the notebooks rather than embedding the folder
> >>>> hierarchy in the names of the notebooks? If someone wants to ‘move’
> >>>> the notebooks to another folder, they’d have to manually rename all
> >>>> the files/notebooks at present. That’s not very user-friendly.
> >>>>
> >>>>
> >>>>
> >>>> Thanks, Lucas.
> >>>>
> >>>> *From:* Jeff Zhang <[email protected]>
> >>>> *Sent:* 13 August 2018 09:06
> >>>> *To:* [email protected]
> >>>> *Cc:* dev <[email protected]>
> >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> >>>> instead of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>> In that case, Zeppelin should fail to create the note.
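A Python sketch of what "fail to create the note" could check. It relies on the fact, noted elsewhere in this thread, that Linux forbids only NUL and `/` in a file name; since `/` separates folders here, each segment is checked on its own. Rejecting empty, `.` and `..` segments is an extra assumption of this sketch, and `validate_note_name` is a hypothetical name. Other operating systems forbid more characters, which this does not cover.

```python
def validate_note_name(note_name: str) -> None:
    """Raise ValueError if note_name cannot be stored as a relative file path.

    '/' separates folders, so each segment is checked on its own.
    """
    if "\0" in note_name:
        # NUL is one of the two characters Linux never allows in a file name.
        raise ValueError("note name must not contain NUL")
    for segment in note_name.split("/"):
        # Assumed extra rules: no empty, "." or ".." segments.
        if segment in ("", ".", ".."):
            raise ValueError(f"invalid path segment {segment!r} in {note_name!r}")


validate_note_name("Проект/Отчёт")  # accepted: non-Latin names are fine
```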
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 13, 2018 at 3:47 PM, Felix Cheung <[email protected]> wrote:
> >>>>
> >>>> Perhaps one concern is users having characters in note names that are
> >>>> invalid in file names/file paths?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> *From:* Mohit Jaggi <[email protected]>
> >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> >>>> *To:* [email protected]
> >>>> *Cc:* dev
> >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> >>>> instead of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>> sounds like a good idea!
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <[email protected]> wrote:
> >>>>
> >>>> Motivation
> >>>>
> >>>> The motivation of ZEPPELIN-2619 is to change the note storage
> >>>> structure. Currently we store notes as {noteId}/note.json; we’d like to
> >>>> change this to {note_name}_{note_id}.zpln. There are several reasons for
> >>>> this change.
> >>>>
> >>>>
> >>>>
> >>>> 1. {noteId}/note.json is not scalable. We put all notes in one root
> >>>> folder in a flat structure, and when the Zeppelin server starts, we
> >>>> need to read every note.json to get the note names and build the note
> >>>> folder structure (because the note name, which we need to build the
> >>>> notebook menu, is stored in note.json). This becomes a nightmare when
> >>>> you have a large number of notes.
> >>>> 2. {noteId}/note.json is not maintainable. It is difficult for a
> >>>> developer/administrator to find a note file based on the note name.
> >>>> 3. {noteId}/note.json has no folder structure. Currently Zeppelin
> >>>> has to build the folder structure in memory from the note names,
> >>>> which is a big overhead.
> >>>>
> >>>>
> >>>> New Approach
> >>>>
> >>>> As I mentioned above, I propose to change the note storage structure
> >>>> to {note_name}_{note_id}.zpln, where note_name could contain folders,
> >>>> e.g. folder_1/mynote_abcd.zpln.
> >>>>
> >>>> This kind of note storage structure could bring several benefits.
> >>>>
> >>>> 1. We don’t need to load all notes when Zeppelin starts. We just
> >>>> need to list each folder to get the note names and note ids.
> >>>> 2. It is much more maintainable: it is easy to find the note file
> >>>> based on the note name.
> >>>> 3. It has the folder structure already, which can be mapped directly
> >>>> to the note folder structure.
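A Python sketch of benefit 1: building the note index from directory listings alone, without opening any note file. `list_notes` is a hypothetical name; it assumes the {note_name}_{note_id}.zpln layout and ids that contain no `_`, and it silently skips files that do not match (a real repo might log them).

```python
import os
from typing import Dict

NOTE_SUFFIX = ".zpln"


def list_notes(notebook_dir: str) -> Dict[str, str]:
    """Map note_id -> menu path ('/'-separated, without id or suffix).

    Only directory listings are read; no note file is opened.
    """
    notes: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(notebook_dir):
        rel_dir = os.path.relpath(dirpath, notebook_dir)
        for filename in filenames:
            if not filename.endswith(NOTE_SUFFIX):
                continue
            # Split on the last "_": note names may contain underscores, ids may not.
            name, sep, note_id = filename[: -len(NOTE_SUFFIX)].rpartition("_")
            if not sep or not name or not note_id:
                continue  # not in the expected layout; a real repo might log it
            path = name if rel_dir == "." else os.path.join(rel_dir, name)
            notes[note_id] = path.replace(os.sep, "/")
    return notes
```

Startup cost then scales with the number of directory entries rather than with the total size of all note files.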
> >>>>
> >>>>
> >>>> Side Effect
> >>>>
> >>>> This approach only works for file-system storage, which means we have
> >>>> to drop support for MongoNotebookRepo. I think that is OK: I haven't
> >>>> seen any users talk about it in the community, so I assume no one is
> >>>> using it.
> >>>>
> >>>>
> >>>>
> >>>> This is the overall design; any comments and feedback are welcome. Thanks.
> >>>>
> >>>>
> >>>>
> >>>> Here's the Google Doc; you can also comment on it directly:
> >>>>
> >>>>
> >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >> 이종열, Jongyoul Lee, 李宗烈
> >> http://madeng.net
> >>
> >
>