Hey guys, I've been thinking that I'd like to have a topic (like this one) where I'm allowed to post anything related to brainstorming on my project, which is currently a mix of neo4j and berkeleydb java edition. That is, I'd like to start from scratch, explain and explore ideas, and have anyone step in and say what's on their mind, especially with notes on how something would be better with neo4j rather than berkeleydb.
But I'd like to know whether this is a good idea to do here, and whether the neo4j people would allow it. It would probably mean you'd receive lots of emails with this subject (and "Re:" this subject), which you may not want. In that case I'd suggest a filter that ignores such emails (easily done in gmail, for example), but be careful not to filter on the sender, which is always user@lists.neo4j.org for every topic, not just this one. That way anyone could ignore the emails I send here, should they be annoyed by them or find them too many, too soon.

Still, I won't do this unless most (if not all) of you (mainly the neo4j devs, I'd say) agree to let me post here. I'd post replies only to this topic... well, you get the idea :) You'd keep the right to say stop at any time if you don't want me posting anymore (e.g. posts too frequent, content too dumb, content that seems like noise and doesn't help anyone), that is, in case you allow me to post in the first place :) So if it's allowed, please reply and say so; if there are no replies either way, I'll default to `not allowed` and won't try to post anymore :) Be kind lol.

If I know Peter (and I don't lol), he'd be happy with some brainstorming, I think, right? :) Then what about the others? Btw, if saying I'm "allowed" feels like too much responsibility, or like deciding for others, then maybe just say you wouldn't mind if I posted, or that it makes no difference to you. Though the neo4j guys & girls (i.e. the devs) would probably know best whether `me posting on this topic would be a good idea, for them and for the users of this mailing list`.

If you're wondering why I'd do this: most importantly, it helps me to type my thoughts rather than just think them in my head. If I don't type them, I get easily distracted by other things and they end up postponed/abandoned.
Expressing my thoughts by typing them seems to bridge the physical and the mental in a way that both are happy with, heh. This might also be helpful to others reading it, unless they get annoyed by my way of writing (in which case both of us are at fault, or rather, the cause of their annoyance), or they get annoyed for other reasons that are still triggered by reading what I write. I'm not good at writing, or at programming for that matter, and I'm aware of this, but I believe that expressing myself in writing might help (at least) me (hopefully not at the expense of others, e.g. as spam) and will likely, along the way, trigger some progress in me, which, if you ask me, is in everyone's interest: the more people "evolve", the better for everyone, no? Yes, good :)

No one is required (or expected by me) to read what I write, btw; but you should know that my subconscious, for some reason, likes knowing that someone did read it and got something beneficial out of it, i.e. something positive rather than negative. (Though any change is progress, unless you build a system on top of that which says, e.g., that a `counter` must increase for things to count as progressing. Then, while any change is still progress at the lower level, even if the counter decreases, at this higher level a decreasing counter no longer counts as progress. And at an even higher level, the counter could over time increase by 10 and then decrease by 10, so that it seems to oscillate; that would count as no progress, rather as constant, unless the oscillation amplitude changed, e.g. the counter started increasing by 15 and then decreasing by 15, which would be progress as considered at that level.)
So while I'm sort of waiting for a "good enough" reason to hack my own subconscious and change it (assuming that's possible; neuroplasticity would say so, heh), such that I wouldn't need to express my thoughts in writing or feel empowered knowing that others read them, I'm going along with what seems to be the next feel-empowered step... (kinda forgot what I wanted to say here xD) Also, the subconscious (not just mine, I'd say) likes to know that it did something; it's like having a foundation for allowing itself to feel empowered: having done something in the physical world that it's proud of can serve as a permission slip to feel happy about it, or rather empowered. So in this respect, writing my thoughts here helps with that too :) There's also an inherent desire to share, so by writing here I'm sharing my thoughts, me... this also works well as a permission slip for my subconscious :P I mean, hi :)

Thing is, I'm not yet sure which would suit me better, neo4j or berkeleydb. I like neo4j's features, especially the transactions, though I don't understand/know its limitations, and I'm not sure I need all of its features; some of them seem like overkill (e.g. properties: I'd need just one, e.g. key="name", and only for some (few) nodes). On the other hand, I like the simplicity of berkeleydb, on which I've built part of the project so far (btw the project is here: https://github.com/13th-floor/neo4john ). But the brainstorming would be a bit more general, I think: I'd state and follow my thoughts about what I'd like done and how I'd do it, eventually narrowing them down to Java and the layer I want to build on (i.e. neo4j or bdb).

I've been trying a bottom-up approach with my project: basically, I want the lowest accessible level to be a Symbol, or rather a Node as you'd say, such that I'd know whether it's connected to anything or not.
That's unlike how RAM "works", i.e. like an array. Typically I want to be able to know whether any part of the system is connected to any other part, with the purpose of exploring the system and understanding how it works. Ok, too generic, let's take an example (probably a bad one): say you're in your browser now, as I am here, and I see the buttons Send, Save Now and Discard on the page, and I decide to investigate how they work: where is the text inside them, and if that text changes, will the button enlarge automatically, or will the text be cut off because the button stays the same size? These are the kinds of questions I should be able to answer when using a system built on the `system that I want to design` (= my project). Also questions like: how do I add another button? By checking how the other buttons are added; likely they're part of a list which currently has those 3 buttons as children, but those 3 children are in fact Nodes which identify the buttons, and any other details related to the buttons are deeper, either as children of those nodes, or simply as children of a parent which refers to the identifying node and to a node that identifies the details for that button node... ok, this is way too vague, so ignore it.

Sure, you'd say there's some Firefox plugin which can show me info about those buttons and stuff, but that took time to make. I need that accessibility now, not after I make the plugin; lucky that someone made it, right? :) But how many things do you already want to access in a manner that both you and your computer understand, which are only accessible to the computer, and even then only in a way that lets the computer execute them but not necessarily understand them (if computers had intelligence)? But seriously xD, my intent is to have accessibility.
I'd probably need just a simple 3D tool to parse nodes and keep track of them, temporarily, while I'm studying some system... in fact, I could be studying the system that is the program of this very 3D tool, so that I could customize it on the fly. (Sure, by mistakenly removing some node I could cut off my own access, e.g. removing the keyboard/mouse from the inputs would leave the 3D tool idle and I couldn't undo that action, but safeguards for that could always be implemented: some features could be non-modifiable, or a backup 3D tool could take over if it was set as a safeguard while working on the real one, ...)

==================
some brainstorming or something:

What I've done so far, with berkeleydb:

Typically I wanted an easily identifiable entity, a Symbol, or we can simply call it a Node. This Node is uniquely identifiable in the system, e.g. by its long (i.e. Java Long) identifier; this is how it's identified inside berkeleydb (neo4j does this too, a long is used for an ID, as far as I can tell). So at this lowest level there can be 0 or more nodes, and no two nodes can be duplicates: the same ID always means you're referring to the same node. Say this is Level 0.

------------------
At the next level, Level 1, any two nodes can be grouped together such that one of them is first and the other is second (or last); in other words, this is like an ordered list of 2 elements (Nodes), which you neo4j folks call a relationship.
Node A --> Node B
differs from
Node B --> Node A
that is, they're two different relationships, and each clearly indicates which node is first. Neo4j here makes a new element (previously Node was the only one) called Relationship, assigns it a long aka identifier, so that the relationship is also uniquely identifiable, and associates the two nodes with this relationship.
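For comparison with what I did in berkeleydb, here's a small in-memory Java sketch of the model I'm guessing neo4j uses, where a relationship is its own entity with a long ID and each node remembers its relationship IDs. Everything in it (RelEntityGraph, Rel, createNode, createRelationship, ...) is a made-up name for illustration; it's not the neo4j API and says nothing about neo4j's real storage format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy, in-memory model (hypothetical names): a relationship is an entity with its own
// long ID, and each node keeps the IDs of its outgoing and incoming relationships.
// This only mirrors a guess about neo4j; it is not neo4j's API or storage format.
final class RelEntityGraph {
    static final class Rel {
        final long id, start, end;
        Rel(long id, long start, long end) { this.id = id; this.start = start; this.end = end; }
    }

    private final AtomicLong nodeSeq = new AtomicLong();
    private final AtomicLong relSeq = new AtomicLong();
    private final Map<Long, Rel> rels = new HashMap<>();
    private final Map<Long, List<Long>> outgoing = new HashMap<>();
    private final Map<Long, List<Long>> incoming = new HashMap<>();

    // Level 0: each new node gets the next unused long ID.
    long createNode() {
        long id = nodeSeq.incrementAndGet();
        outgoing.put(id, new ArrayList<>());
        incoming.put(id, new ArrayList<>());
        return id;
    }

    // Level 1: an ordered pair start -> end, stored as a relationship with its own ID.
    long createRelationship(long start, long end) {
        if (!outgoing.containsKey(start) || !outgoing.containsKey(end)) {
            throw new IllegalArgumentException("unknown node");
        }
        long id = relSeq.incrementAndGet();
        rels.put(id, new Rel(id, start, end));
        outgoing.get(start).add(id);
        incoming.get(end).add(id);
        return id;
    }

    Rel relationship(long relId) { return rels.get(relId); }

    List<Long> outgoingRelIds(long node) { return List.copyOf(outgoing.getOrDefault(node, List.of())); }

    List<Long> incomingRelIds(long node) { return List.copyOf(incoming.getOrDefault(node, List.of())); }

    // Direction matters: hasRelationship(a, b) says nothing about b -> a.
    boolean hasRelationship(long start, long end) {
        for (long relId : outgoing.getOrDefault(start, List.of())) {
            if (rels.get(relId).end == end) return true;
        }
        return false;
    }
}
```

Note that A->B and B->A get two different relationship IDs here, which is the directedness described above.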
Btw, I don't know this for sure, but I'm guessing neo4j really does give relationships their own IDs, i.e. the relationship ID is part of its storage and not just a construct made for the Java methods to use.

Anyway, what I did with berkeleydb: I have two primary databases, each of which is just key->value, where you can look up data by key and get back the value associated with it. Since you can't look up by value, I had to create that second primary database. (Not a secondary database, because that implies one value per key, so A->B and A->C couldn't both exist with a primary database plus a secondary database in berkeleydb.)

forwardDB: firstNode -> secondNode
backwardDB: secondNode -> firstNode

That's the format of the databases' contents. A key can exist multiple times, but it's seen as one key, e.g.
A->B
 ->D
 ->C
X->F
 ->B
Y->C
So basically, a key with multiple values can be seen as a set of those values. The values are in no (user-)defined order, though berkeleydb (with the right settings) stores them internally sorted for faster lookups: you can actually ask "is A->C in the database?" and it'll return true if it is. The values can be iterated with a Cursor (bdb cursor), though if I remember well, you can't jump to the last value of a specific key (you can jump to the last value in the whole database); I wasn't too happy about this missing functionality, but I lived :)

So this was my way of storing relationships, and I always use both the start and end node to identify a relationship. I'm not yet sure why I decided this is better than having a Relationship entity uniquely identifiable by its long, with its ID associated with each of the nodes. I guess I never needed to refer to a relationship more than once, and a RelationshipID in the middle seemed like extraneous info.
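Here's roughly what the forwardDB/backwardDB pair boils down to, as an in-memory Java stand-in (PairStore and its methods are hypothetical names; in bdb the two maps would be real Databases, presumably opened with DatabaseConfig.setSortedDuplicates(true)). A relationship is just the (first, second) pair with no ID of its own, and "who points to B?" is answered from the backward map, because you can't look up by value.

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in (hypothetical) for the two bdb databases with sorted duplicates:
// forward maps firstNode -> secondNode, backward maps secondNode -> firstNode.
// A relationship is identified only by its (first, second) pair; it has no ID.
final class PairStore {
    private final TreeMap<Long, TreeSet<Long>> forward = new TreeMap<>();
    private final TreeMap<Long, TreeSet<Long>> backward = new TreeMap<>();

    // Returns false if the pair already existed: a key's values behave like a set.
    boolean link(long first, long second) {
        boolean added = forward.computeIfAbsent(first, k -> new TreeSet<>()).add(second);
        if (added) backward.computeIfAbsent(second, k -> new TreeSet<>()).add(first);
        return added;
    }

    // Removes the pair from both maps, keeping them in sync.
    boolean unlink(long first, long second) {
        boolean removed = removeValue(forward, first, second);
        if (removed) removeValue(backward, second, first);
        return removed;
    }

    // Exact-pair lookup, like asking "is A->C in the database?".
    boolean isLinked(long first, long second) {
        TreeSet<Long> values = forward.get(first);
        return values != null && values.contains(second);
    }

    // Snapshot of all second nodes for this first node, in sorted (not user-defined) order.
    NavigableSet<Long> children(long first) {
        return new TreeSet<>(forward.getOrDefault(first, new TreeSet<>()));
    }

    // Snapshot of all first nodes pointing at this node, answered from the backward map.
    NavigableSet<Long> parents(long second) {
        return new TreeSet<>(backward.getOrDefault(second, new TreeSet<>()));
    }

    private static boolean removeValue(TreeMap<Long, TreeSet<Long>> db, long key, long value) {
        TreeSet<Long> values = db.get(key);
        if (values == null || !values.remove(value)) return false;
        if (values.isEmpty()) db.remove(key); // drop keys that no longer have any value
        return true;
    }
}
```

One difference from the real thing: a TreeSet makes the last value of a key trivial to reach (last()), which, as said above, I couldn't do with a bdb cursor, so this sketch says nothing about bdb's cursor API.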
Not to mention it wouldn't feel like the Node is the element anymore; it'd be both Node and Relationship. Still, if it's really needed I'll consider it, but for now I don't believe it is.

------------------
There's actually another side level, parallel to Level 0: from Java, we need to refer to the same Node across program restarts. E.g. on the first run we create a unique node, but we don't know what its ID is going to be, since maybe other nodes were already in the database; so on the next application restart, how would we know to get the same node we created on the last run? One case where this wouldn't be needed at first is when you know the database is empty and you specify the ID of the node to create, e.g. create or get node with ID = 1038. This might work at first, but it's a very bad idea to use raw IDs like that; rather, just let bdb give me the next available ID, e.g. by using a Sequence.

So another layer was needed, associating a String1 with a NodeID. This is similar to adding a neo4j index entry with key="name" and value=String1. String1 is then considered the name of the node; it has no purpose other than allowing the in-Java program to access the same node across application restarts, and no data is supposed to be stored in it. So here, in berkeleydb, I made `a primary and a secondary` database such that I can ask:
1) what is the name of the node with this NodeID?
2) what is the NodeID of the node with this given name?
This is a database that acts like a HashMap (formed of a primary and a secondary database), one that doesn't accept nulls. You can call this the NamingLevel :) just in case I or you need to refer to it later with one word rather than a phrase.
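A minimal sketch of the NamingLevel as a two-way map, assuming an AtomicLong in place of a persistent bdb Sequence and plain HashMaps in place of the primary/secondary databases (so, unlike the real thing, it won't survive a restart). NamingLevel, getOrCreate, nameOf and idOf are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory NamingLevel: a one-to-one map between a String name and a
// node ID, queryable in both directions, rejecting nulls. The AtomicLong stands in
// for a persistent bdb Sequence; the two HashMaps stand in for primary + secondary DBs.
final class NamingLevel {
    private final AtomicLong nodeSequence = new AtomicLong();
    private final Map<String, Long> idByName = new HashMap<>();
    private final Map<Long, String> nameById = new HashMap<>();

    // Returns the node already bound to this name, or creates and binds a fresh one,
    // so Java code can find "the same node" again by name (given real persistence).
    long getOrCreate(String name) {
        Objects.requireNonNull(name, "name");
        Long existing = idByName.get(name);
        if (existing != null) return existing;
        long id = nodeSequence.incrementAndGet(); // next available ID, never a hard-coded one
        idByName.put(name, id);
        nameById.put(id, name);
        return id;
    }

    // Question 1: what is the name of the node with this NodeID?
    Optional<String> nameOf(long nodeId) {
        return Optional.ofNullable(nameById.get(nodeId));
    }

    // Question 2: what is the NodeID of the node with this name?
    Optional<Long> idOf(String name) {
        return Optional.ofNullable(idByName.get(Objects.requireNonNull(name, "name")));
    }
}
```

The point of getOrCreate is that the Java code only ever hard-codes the name, never a raw ID like 1038.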
--------------
Level 2 is of course based on Level 1. Since we have that kind of grouping in Level 1, we can treat a Node as a set of 0 or more elements (which are Nodes): we can think of a node as the parent of a bunch of nodes, simply because it's the first node (i.e. key) in a bunch of relationships, and we can think of that same node as a child in one or more relationships, or as being pointed to by those nodes, because it's the end node (second node) of those relationships. So here we can define some Java entities like Pointer and Set, and make sure that in Java they're limited to what they do, e.g. a Pointer should be able to have 0 or 1 children, not more, and the user shouldn't be allowed to add more. But of course this doesn't stop a user from using the underlayingNode directly and adding children with the Level 1 methods on it (I'm using this naming from graph-collections by Niels; it seems easier to understand). Basically the Pointer class sits on top of a Node, the underlayingNode. The problem is that although we've defined a new level/layer here, the limitations it enforces can be bypassed by going directly to the level below it, Level 1, which easily invalidates it as a Pointer.

So I'm not yet sure how I'd deal with this. Would I check whether the Pointer is still valid whenever any of its methods is used? And if it's invalid (e.g. it now has 3 children instead of 0 or 1), then what do I do? Throw an exception, I guess? And what about knowing when, and by which code, it was invalidated? E.g. put a hook on that Node so that when the user tries to attach more than 1 child (just one example of invalidating the Pointer), the hook throws at that point, so that the offending code shows up in the stack trace.
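To make the "check on every call" option concrete, here's a hypothetical TreatAs-style Pointer over a tiny Level 1 store (Level1, TreatAsPointer and their methods are names made up for this sketch, not code from my project). Each method first re-checks that underlayingNode has 0 or 1 children and throws IllegalStateException otherwise.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Minimal hypothetical Level 1 store: node -> set of children (direction matters).
final class Level1 {
    private final Map<Long, Set<Long>> children = new HashMap<>();

    void link(long parent, long child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    void unlink(long parent, long child) {
        Set<Long> c = children.get(parent);
        if (c != null) c.remove(child);
    }

    Set<Long> childrenOf(long parent) {
        return Set.copyOf(children.getOrDefault(parent, Set.of()));
    }
}

// Hypothetical "TreatAs" Pointer: a Java-only view over underlayingNode allowing 0 or 1
// children. Nothing stops code from calling Level1 directly, so every method re-checks
// the invariant and fails early if it was broken underneath.
final class TreatAsPointer {
    private final Level1 level1;
    private final long underlayingNode;

    TreatAsPointer(Level1 level1, long underlayingNode) {
        this.level1 = level1;
        this.underlayingNode = underlayingNode;
        checkValid();
    }

    Optional<Long> pointee() {
        checkValid();
        return level1.childrenOf(underlayingNode).stream().findFirst();
    }

    // Replaces the current pointee (if any), so the node never gets a second child this way.
    void pointTo(long target) {
        pointee().ifPresent(old -> level1.unlink(underlayingNode, old));
        level1.link(underlayingNode, target);
    }

    // Makes the pointer "null", i.e. removes its only child.
    void clear() {
        pointee().ifPresent(old -> level1.unlink(underlayingNode, old));
    }

    private void checkValid() {
        int n = level1.childrenOf(underlayingNode).size();
        if (n > 1) {
            throw new IllegalStateException("node " + underlayingNode
                + " is no longer a valid Pointer: it has " + n + " children");
        }
    }
}
```

This only catches the invalidation on the next call to the Pointer, not at the place where the extra child was added; pinpointing the offending code would still need something like the hook idea.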
The funny thing is that to build such a hook system (which is definitely a must at levels higher than Level 2), one must make use of levels like the one I'm trying to define here, namely Pointers and Sets and even ordered lists. So it seems I need to implement these levels first, without hooks; then, when done with them, build the hook system on top of them; and then, when the hook system is complete, rebuild Pointer, Set and ordered list again as even higher levels, which this time use the hook system and can detect code that tries to invalidate a Pointer by working at lower levels directly (unless, of course, the lowest level is below the hook system, heh; then, hmm, can't think... need to get there first :/ )

A Set, again, same thing: a node is used to identify the set, that node is the underlayingNode, but this Set thingy is just a Java wrapper on top of a Node. Basically it just lets the Java coder/user treat a node as a set from within Java, but that node still acts as a simple start node, i.e. a node having multiple outgoing relationships:
A->B
A->C
A->E
A->D
(the order of children is undefined btw; it's a set, not an ordered list, not at this level anyway). So A is a Set, and B, C, E, D are its elements. That's how it can be seen from Java, or it can be seen as it's stored, just as groups of start and end nodes. So this Level 2 thingy is more a level in Java than a real level like Level 1 and Level 0.

Also on this level, a Pointer to a Domain and a DomainSet can be defined. DomainPointer: that is, a pointer that's allowed to `point-to / have` only a child which is a child of a certain Set D, which is the domain, e.g.:
P->A
D->{A,C,B,X}
P is the pointer, A is the pointee, and D is the domain; notice that A is a child of D. So a DomainPointer would enforce having a pointee only from that domain D. There are also some variants, e.g. for Pointer and DomainPointer: are they allowed to be null, i.e. have no children / point to nothing?

DomainSet is a Set whose children must be children of a domain D:
DS->{A,X,C}
D->{A,C,B,X}

Also note that a set cannot contain duplicates (almost forgot to say that). If duplicates are needed, an intermediary node and a special format would be expected, e.g. S->h->A, where h is an intermediary node, if such a set is ever defined. This would probably only make sense for an ordered list; that is, duplicate elements make sense in a list, ordered or not (an unordered list, or a list where you don't care about the order, can be based on an ordered list).

An Ordered List would be defined differently, as a doubly linked list with head/tail and element capsules (aka entries as seen in graph-collections by Niels, which is here btw: https://github.com/peterneubauer/graph-collections ), each of which has next/prev and a pointer to the real element; this makes duplicates easy to support. But for an ordered list I should also be able to specify whether it can contain null elements (i.e. entry.element has no children), or whether it can contain only elements from a specific domain D.

Back to the set with intermediary nodes: I don't see the need for it, but if needed, there are 3 variants:
1. the set always uses intermediary nodes, even if the nodes aren't dups
2. the set only uses intermediary nodes for dup nodes (never tried)
3. the set never uses intermediary nodes, which means no dups are supported
There may be other, more complex ways to define these, e.g.
not using intermediary nodes but storing a counter for duplicate nodes somewhere else (neo4j could easily add a counter property on the relationship itself: for S->A, if A is in the set twice, that very relationship would hold counter=2; all other rels would either have no counter, which means 1, or have counter set to 1).

In case 1, an intermediary node is always expected, so it's easy to know. In case 2, you don't really know whether the node h in S->h->A is an intermediary node or the element itself, e.g.
S->C
S->h->A
S->B
So you can tag h with another node called AllIntermediariesForSetsWithDups, e.g.
AllIntermediariesForSetsWithDups->h
AllIntermediariesForSetsWithDups->x
S->C
S->h->A
S->B
B->x->A
and make sure that all intermediary nodes are unique, i.e. created on the fly when S.add(A) is executed, and simply never make a set that uses another set's intermediary nodes as its elements, e.g. C->h when S->h->A, 'cause that way h in C->h could be taken for an intermediary node. To avoid this, you can also have
AllSetsWithDups->S
AllSetsWithDups->B
AllSetsWithDups->C
so that when adding an element, C.add(h), you'd solve this system of 2 equations:
{ AllSetsWithDups->(X*)->h
{ AllIntermediariesForSetsWithDups->h
and if you find an (X*), e.g. S in our case, then you'll know that h is already used by another set as its intermediary node, and thus avoid using it, i.e. throw. Limitations are the way to define systems.

So since I don't yet intend to use these (I think), I'll skip further brainstorming on sets with intermediary nodes; but later, intermediary nodes will be used in some places.

This Level 2 only makes sense if defining Pointer, Set, DomainPointer, DomainSet and their null-allowing (or not) variants requires them to store metadata about themselves in the "graph", such that a Pointer could store a link/relation from a parent to it, e.g.:
AllPointers->P1
This way you know that P1 is a pointer; otherwise only the Java code would know it, by checking that its underlayingNode == P1. But this in itself can be seen as a Set: AllPointers is a set whose elements identify pointers, as defined in this Level 2 as Pointer. So I'd happily use Set, which is also defined here; but if you think about it, adding this kind of metadata sort of requires that metadata-free Pointer/Set etc. be defined first, and that these be based on them. That is, if DomainSet, e.g., uses a Pointer to point to its Domain, then that fact should be known to anyone trying to identify what "type" the node (the underlayingNode of the DomainSet) is, without knowing anything about it, just by checking its parents...

Say two Java programs use the same environment (which I'm not yet sure how to implement, due to the need for isolation/serializability... we'll see; though bdb does let two or more Java programs open the same environment/databases at the same time (if I read the JE docs right, only one of them can open it for writing, and the shared cache is per JVM), and it's an embedded db), and one of them is just exploring: it should be able to tell what a node is treated as, e.g. DomainSet, DomainPointer, Pointer... though this (adding such metadata, that is) will happen at higher levels. Btw, this is a graphviz picture of what a DomainSet would look like with in-graph metadata: https://github.com/13th-floor/neo4john/raw/master/diagrams/level1%20potential%20domainset.png

So it's not a bad idea to first have TreatAs_X classes in Java (where X == Pointer, for example), which are just wrappers on top of Level 1, without any metadata stored in-graph, and then use these to define Pointer, Set etc. with in-graph metadata as above. And they could just check whether they're still valid on each method call, to avoid, or warn early, when they detect they're no longer valid (due to e.g.
the user doing Level 1 changes directly). Though I'm not very happy about the user's ability to use Level 1 directly and change/invalidate constructs defined at higher levels, even though at about Level 6 we could define a hook/event layer which could prevent this and directly pinpoint the Java code blocks trying to make these kinds of invalid changes. Where was I? Forgot xD So far in my project, these TreatAs wrappers for Pointer, Set etc. don't do any checks to see whether they're still valid, so for example I could add 10 children to a Pointer and it wouldn't complain that it has to have 0 or 1 (unless I added some asserts lately; I should recheck).

-----------
Anyway, somewhere around Level 3 an ordered list would be defined. Here I'm wondering whether I need one without in-graph metadata first, or not at all... so far I've assumed I don't, and thus allowed the ordered list to be defined fully in-graph, but the code for this isn't part of my project yet (it's part of the old project, which I was trying to copy from, but without all the extraneous checks and generic mumbo-jumbo). This is a graphviz picture of what an ordered list would look like with in-graph metadata: https://github.com/13th-floor/neo4john/raw/master/diagrams/level4%20ordered%20list.png

There's an extension to this, such that it would also store a set along with it for fast finds, i.e. to check whether a certain element exists in the ordered list without parsing the entire list, it would check whether it's in the set first; checking Set->X, which, you can imagine, is just two nodes, is lightning fast, because bdb does this internally with something like getSearchBoth() (I think).
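A rough in-memory sketch of the ordered list plus its companion set, using plain Java objects instead of in-graph nodes (OrderedList, Capsule and the methods are hypothetical names, and node IDs are just longs). Since the list allows duplicates, the "set" here keeps a count per element, so removing one copy doesn't make contains() wrong about the others; that counting is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory ordered list: a doubly linked list of element capsules
// (graph-collections calls them entries), each with prev/next and a pointer to the
// real element, plus a companion "set" for fast membership checks.
final class OrderedList {
    static final class Capsule {
        Capsule prev, next;
        final long element; // node ID of the real element; duplicates are allowed
        Capsule(long element) { this.element = element; }
    }

    private Capsule head, tail;
    // Companion set: element -> number of capsules pointing at it.
    private final Map<Long, Integer> members = new HashMap<>();

    Capsule append(long element) {
        Capsule c = new Capsule(element);
        if (tail == null) {
            head = tail = c;
        } else {
            c.prev = tail;
            tail.next = c;
            tail = c;
        }
        members.merge(element, 1, Integer::sum);
        return c;
    }

    // Unlinks a capsule that belongs to this list (not checked, to keep the sketch short).
    void remove(Capsule c) {
        if (c.prev != null) c.prev.next = c.next; else head = c.next;
        if (c.next != null) c.next.prev = c.prev; else tail = c.prev;
        c.prev = c.next = null;
        // Returning null from the remapping function removes the entry entirely.
        members.computeIfPresent(c.element, (k, n) -> n == 1 ? null : n - 1);
    }

    // Fast find: answered from the companion set, without walking the list.
    boolean contains(long element) {
        return members.containsKey(element);
    }

    // Walks head -> tail, mostly useful for checking the order.
    List<Long> toList() {
        List<Long> out = new ArrayList<>();
        for (Capsule c = head; c != null; c = c.next) out.add(c.element);
        return out;
    }
}
```

Finding which capsule holds a given element is a separate problem that this sketch doesn't solve.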
Now, since I'm here, I was thinking: how do I quickly find the ElementCapsule (aka `entry`, if you understand it better in graph-collections terms), given that I've just used Set->X and X is indeed part of the ordered list, without having to parse the list again? If I remember right, in the old project's code I wouldn't actually parse the list, but would instead do it the right way, that is:
a = count parents of X
b = (count children of X)*3 or something, where this would yield how many elements are really in the list
If a is bigger, e.g. 1 million, and b is like 200, then it might be wise to iterate the list; or not, considering bdb can find this in 0 ms for 1 million rels anyway.

So let's consider, then, the parents of X... we basically need to parse bottom-up from the element X, which is any node (not an element capsule), up to the node identifying the list. So something like solving this:
ourList->randomECnode3719->randomElementIdentifyingNode189->X
AllElementCapsules->randomECnode3719
AllElements of ElementCapsules->randomElementIdentifyingNode189->X
The unknowns in these equations are randomECnode3719 and randomElementIdentifyingNode189; that's how they can be found. So in the picture, to get symbol2's ElementCapsule without parsing the entire list, first find unique50, by solving this:
AllElements of ElementCapsules->(x)->symbol2
and (x) would be unique50, because unique50 is a node that's used only in this list as an `AllElements of ElementCapsules` and no other list uses it for this purpose. And if some other layer needs to add a comment to it, e.g. a phrase, it would point to it, i.e.
be its parent, such as:
AllComments->F
F->unique50
F->phrase1
AllComments'Phrases->phrase1
This way, unique50 has the phrase1 node associated with it, which could point to other nodes identifying words, which eventually identify letters and numbers; all of this could be interpreted by some program code and displayed on some screen, or the 3D tool could use them and show that phrase above unique50 when it's in the viewport (shown on screen).

Where was I? I forgot my name :) That will probably do for now, I guess; it's to give you an idea of the garbage I could talk about in my brainstorming sessions, if allowed to keep posting here.

Cheerios,
John

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user