[Neo4j] Brainstorming on my project: neo4john

John cyuczieekc Sun, 31 Jul 2011 06:09:53 -0700

Hey guys,

I've been thinking that I would like to have a topic (like this current one)
where I would be allowed to post anything related to brainstorming on my
project which is currently a mix of neo4j and berkeleydb java edition. That
is, I would like to start from scratch and explain and explore ideas, where
anyone could step in and say what's on their mind, especially with notes on
how that would be better with neo4j rather than berkeleydb.


But I'd like to know if this is a good idea to do here, and if any neo4j
people would allow me to do this here. This would probably mean you'd
receive lots of emails with this subject and Re: this subject, which you may
not want to receive, in which case I would suggest a filter to ignore such
emails (easily done within gmail for example) - but be sure not to ignore
the sender which is always user@lists.neo4j.org for any topic/subject not
just this one. So, anyone could potentially ignore my emails that I send
here, should they be annoyed or they be too many too soon.
  Still, I would not do this unless most (if not all) of you (mainly neo4j
devs I'd say) agree to allow me to post here. I would post replies only to
this topic... well you get the idea :)
Though you should reserve the right at any time to say stop if you don't
want me to post anymore (due to ie. too frequent posts, too dumb content,
content seems like noise and doesn't help anyone) - that is, in the case you
allow me to post :) - so if allowed, please reply and say so, otherwise if
no replies with allowed or not, will default to `not allowed`, so I won't
try to post anymore :) - be kind lol
  If I know Peter, and I don't lol, he'd be happy with some brainstorming I
think, right? :) then, what about the others?
Btw, if you feel like saying that I'm "allowed" would be too much of a
responsibility or taking it from others, then maybe say that you wouldn't
mind if I posted or not, or would make no difference to you. Though the
neo4j guys&girls (ie. devs) would probably know if `me posting on this topic
would be a good idea, for them and the users using this mailing list`.

If you're wondering why would I do this, most importantly because it helps
me by typing my thoughts rather than just thinking them in my head, if I
don't type them I get easily distracted by other things and they end up
being postponed/abandoned. Expressing my thoughts by typing them seems to be
bridging both the physical and the mental in a way that they're both happy
to do this heh.
  And also this might be helpful to others reading this, unless they get
annoyed by my way of writing (which means both of us are at fault, or
rather the cause of their annoyance) or they get annoyed for other reasons
but still triggered by them reading what I write.
 I am not good at writing or at programming for that matter, and I'm aware
of this, but I believe that expressing myself in this written form might
help (at least) me (and I hope not at the expense of others ie. like spam)
and will likely, along the way, trigger some progress in me, which if you ask
me, is in everyone's interest: the more people "evolve" the better it is for
everyone, no? yes, good :)

  No one is required (or expected of me) to read what I write, btw; but you
should know that my subconscious, for some reason, likes knowing that
someone did read and got beneficial results from it, ie. got something
positive rather than negative (though any change is progress, except ie. if
you make a system on top of that saying that ie. `counter` must increase for
it to be considered progressing, so then while any change is progress at the
lower level even if counter decreases, at this higher level, counter
decreasing is not considered progress anymore; but then again at an even
higher level, over time counter could be increasing by 10 then decreasing by
10 such that it would seem to be oscillating, and this would be considered
no progress, rather it would be considered constant, unless the oscillation
amplitude would change or increase ie. counter would increase by 15 and then
decrease by 15 over time, this would be progress as considered at this
level).
  So while I am sort of waiting for a "good enough" reason to hack my own
subconscious and change it (assuming that it's possible, hey neuroplasticity
would say so heh) such that I wouldn't require expressing my thoughts in
writing or feeling empowered knowing that others are reading that, (while
that) I am going along with what seems to be the next feel-empowered
step...(kinda forgot what I wanted to say here xD)
  Also the subconscious(not just mine I'd say) likes to know that it did
something, sort of like has a foundation for allowing itself to feel
empowered, by having something done in the physical world that it is proud
of, can be used by it as a permission slip to allow itself to feel happy
about it or rather empowered; so in this respect this me writing my thoughts
here stuff also helps with that. :)
  There's also some inherent desire to share, so by writing here I am
sharing my thoughts, me... this also works well as a permission slip for my
subconscious :P I mean, hi :)

Thing is, I am not yet sure which would be better suited for me to use,
neo4j or berkeleydb. I like neo4j's features, especially the transactions,
though I do not understand/know its limitations, and I'm not sure if I need
all its features; some of them seem like overkill (ie. properties, I'd need
one though ie. key="name" but only for some (few) nodes). On the other hand
I like the simplicity of berkeleydb, upon which I've built part of the
project so far (btw project is here: https://github.com/13th-floor/neo4john ).

But the brainstorming would be a bit more general, I think, such as I'd be
stating and following my thoughts about what I'd like to have done and how
I'd do it. But eventually limiting them to java and the layers I want to
build them upon (ie. neo4j or bdb).

I've been trying a bottom-up approach, with my project, basically I want the
lowest accessible level to be a Symbol or rather a Node as you'd say, such
that I would know if it's connected to anything or not. Rather than how RAM
is "working", ie. like an array.

Typically I want to be able to know if any part of the system is connected
to any other parts with the purpose of exploring the system and
understanding how it works. Ok too generic, let's take an example:
probably a bad example but say you are now in your browser as I am here, and
I see these buttons in the page Send, Save Now, Discard and I decide to
investigate how they work, where is the text that is inside them, and if
that text is changed will the button enlarge automatically or is the text
going to be cut off due to button remaining the same size,
these kinds of questions I should be able to answer when using a system built
on the `system that I want to design`(= my project).
  And also questions like, how do I add another button, by checking how
other buttons are added, but likely they would be part of a list which
currently has 3 children those buttons, but those 3 children are in fact
Nodes which identify those buttons, any other details related to the buttons
is deeper, either children of those nodes, or simply children of a parent
which refers to the identifying node and to a node that identifies the
details for that button node... ok this is way too vague so ignore it;
  Sure you'd say there is some plugin in firefox which can show me info
about those buttons and stuff, but that took time to make, I need that
accessibility now, not after I make the plugin, lucky that someone made it
right? :) but how many things you already want to be able to access in a
manner that both you and your computer understand but they are only
accessible to the computer, and that even in the way that allows the
computer to execute them but not necessarily understand them (if them
computers had intelligence)

But seriously xD, my intent is to have accessibility. I'd probably need just
a simple 3D tool to parse nodes and keep track of them, temporarily, while
I'm studying some system.., in fact I could be studying the system that is
the programming for this 3D tool that I use to study it, such that I could
customize it on the fly (sure by mistakenly removing some node I could cut
my access, ie. removing the keyboard/mouse from inputs would render the 3D
tool idle and I couldn't undo that action, but safeguards for that could
always be implemented and even some features could be non-modifiable, or a
backup 3D tool could take over if it was set as a safeguard while working on
the real one, ...)

================== some brainstorming or something:
What I have done so far, with berkeleydb:
Typically I wanted an easily identifiable entity, Symbol or we can call it
simply a Node
This Node is uniquely identifiable in the system, ie. by its long (ie. java
Long) identifier; this is how it is identified inside berkeleydb (neo4j
does this too, a long is used for an ID, as I deduced)
So at this lowest level, there can be 0 or many nodes, and no two nodes can
be duplicates: using the same ID always means you're referring to the same
node
This would be say Level 0.
------------------
At the next level, Level 1, two random nodes could be grouped together such
that, one of them would be first and the other would be second (or last),
in other words this seems like an ordered list of 2 elements (Nodes), which
you neo4j people call a relationship.
Node A --> Node B
this differs from
Node B --> Node A
that is, they are two different relationships, clearly indicating which node
is first

  Neo4j here makes a new element (previously Node was the only one) called
Relationship, and assigns it a long aka identifier, such that this
relationship is also uniquely identifiable, and associates the two nodes
with this relationship. btw, I don't know this for sure, but I am guessing
this is what neo4j does, that is the relationship ID is part of its storage,
and not just a construct made only for java methods to use

   Anyway, what I did with berkeleydb: I've got two primary databases, which
are always just key->value, where you can look up data by key and get back
the value associated with it.
Since you can't look up by value, I had to create that second primary
database (not a secondary database, because that implies a symmetric
key-value mapping, that is, for any key only one value would be associated
with it, so A->B and A->C could not both exist with a primary database plus
a secondary database in berkeleydb)

forwardDB:
firstNode -> secondNode

backwardDB:
secondNode -> firstNode

that's the format of the databases' contents

a key can exist multiple times, but it will be seen as one ie.
A->B
  ->D
  ->C
X->F
 ->B
Y->C

so basically, a key with multiple values can be seen as a set with those
values
  the values are in no (user)defined order (though they are internally
stored by berkeleydb+settings as sorted for faster lookups: you can actually
ask `is A->C in the database?` and it will return true if yes)
  The values can be iterated with a Cursor (bdb cursor) - though if I
remember well, you can't jump to the last value of that specific key (though
you can jump to the last value in the database); I wasn't too happy about
this lack of functionality but I lived :)

so this was my way of storing relationships, and I would always use both
start and end node in order to identify a relationship
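
To make the forwardDB/backwardDB layout above a bit more concrete, here is a
minimal in-memory sketch in Java. It deliberately does not use the berkeleydb
API: `Level1Store` and its method names are made up for illustration, and
sorted sets merely stand in for bdb's sorted values under one key.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the forwardDB/backwardDB pair. */
final class Level1Store {
    // key -> sorted set of values, i.e. one key seen as a set of values
    private final Map<Long, NavigableSet<Long>> forward = new TreeMap<>();
    private final Map<Long, NavigableSet<Long>> backward = new TreeMap<>();

    /** Stores first->second in both databases; false if it already existed. */
    boolean link(long first, long second) {
        boolean added = forward.computeIfAbsent(first, k -> new TreeSet<>()).add(second);
        backward.computeIfAbsent(second, k -> new TreeSet<>()).add(first);
        return added;
    }

    /** Answers the question "is first->second in the database?". */
    boolean isLinked(long first, long second) {
        return forward.getOrDefault(first, Collections.emptyNavigableSet()).contains(second);
    }

    /** All second nodes for a given first node (lookup in forwardDB). */
    NavigableSet<Long> childrenOf(long first) {
        return Collections.unmodifiableNavigableSet(
                forward.getOrDefault(first, Collections.emptyNavigableSet()));
    }

    /** All first nodes for a given second node (lookup in backwardDB). */
    NavigableSet<Long> parentsOf(long second) {
        return Collections.unmodifiableNavigableSet(
                backward.getOrDefault(second, Collections.emptyNavigableSet()));
    }
}
```

In this shape a relationship has no ID of its own: the (first, second) pair is
the only way to name it, and A->B and B->A are two different entries.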

I am not yet sure why I decided this way is better than having a
Relationship entity uniquely identifiable by its long and have its ID
associated with each of the nodes. I guess I never needed to refer to that
relationship more than once, and seemed like extraneous info to have
RelationshipID as a middle.
Not to mention that it wouldn't feel like the Node is the element anymore,
it would be both Node and Relationship; still, if it's really needed I will
consider it. But for now, I didn't believe it would be needed.

------------------
there is actually another side level, or parallel level with Level 0 here,
such that from java, we need to refer to the same Node across program
restarts
ie. first run, creating unique node, we don't know what its ID is going to
be, since ie. maybe other nodes were present in the database
so on the next application restart how would we know to get the same node
that we created on the last run ?

one particular case where this would not be needed at first, would be when
you know the database is empty and you specify the node ID to be created,
ie. create or get node with ID = 1038
this might work at first, but it's a very bad idea to use raw IDs like that,
rather just let bdb give me the next available id ie. by using a Sequence

So then, another layer was needed, associating a String1 with a NodeID; this
is similar to adding a neo4j index with key="name" and value=String1
this String1 would then be considered like the name of the node, this has no
other purpose than allowing the in-java program to access the same node
across application restarts, there is no data supposed to be stored in it.
Thus, here, in berkeleydb, I made `a primary and a secondary` database pair
such that I can ask the questions:
1) what is the name of the node with this NodeID ?
2) what is the NodeID for the node with this given name ?
this is a database acting like a HashMap (formed of a primary and a
secondary database)
so it acts like a HashMap that doesn't accept nulls

you can call this the NamingLevel :) that is, just in case I or you need to
refer to this later using just one word rather than a phrase.
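
A rough sketch of the two questions the NamingLevel answers, assuming plain
in-memory maps instead of the primary/secondary bdb databases. The class and
method names are hypothetical, the counter only imitates a bdb Sequence, and
nothing here survives a restart, so it shows the lookup shape only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical two-way name <-> NodeID map that rejects null names. */
final class NamingLevel {
    private final Map<String, Long> idByName = new HashMap<>();
    private final Map<Long, String> nameById = new HashMap<>();
    private long nextId = 1; // stand-in for "let bdb give me the next available id"

    /** Returns the id already bound to the name, or binds the next free id. */
    long getOrCreate(String name) {
        Objects.requireNonNull(name, "name");
        Long existing = idByName.get(name);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idByName.put(name, id);
        nameById.put(id, name);
        return id;
    }

    /** Question 1: the name of the node with this NodeID, or null if unnamed. */
    String nameOf(long id) {
        return nameById.get(id);
    }

    /** Question 2: the NodeID for this name, or null if the name is unknown. */
    Long idOf(String name) {
        return idByName.get(Objects.requireNonNull(name, "name"));
    }
}
```

The caller never picks a raw ID; it only asks for a node by name and gets the
same ID back on every call.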

--------------
Level 2 here would be of course based on Level 1
so since we have that kind of grouping in Level 1, we can then treat a Node
as a set of 0 or more elements (which are Nodes)
we can think of a node as being a parent for a bunch of nodes, simply
because it is the first node (ie. key) in a bunch of relationships
and we can think of that same node as being a child in one or more
relationships, or being pointed to by those nodes, because it is the end
node (second node) of the relationship.

So here we can pretty much define some java entities like Pointer and Set,
and make sure that in java they are limited to what they do,
ie. Pointer should be able to have 0 or 1 children, not more, and user
shouldn't be allowed to add more
but of course this doesn't stop a user from using the underlayingNode directly
and adding children by using the Level 1 methods directly on the
underlayingNode (using this naming from graph-collections by Niels, seems
easier to understand)
  Basically the Pointer class would be on top of a Node, the
underlayingNode;
the problem here is, that although we've defined a new level/layer here, the
limitations that it enforces can be bypassed by going directly to the level
below it, Level 1 that is, and easily invalidate this as a Pointer.
  So I am not yet sure here, how would I deal with this issue, would I do a
check if the Pointer is still valid whenever any of its methods are used ?
and if it's invalid (ie. it now has 3 children instead of the 0 or 1
limitation) then what do I do? throw an exception I guess?
 What about knowing when and what code did the invalidation ? ie. like put a
hook on that Node such that when the user tries to attach more than 1 child
(just an example of invalidating the Pointer) then this hook would throw at
that point such that in stack trace the offending code would be detected.
 The funny thing is, that to build such a hook system (which is definitely a
must at levels higher than the current Level 2) one must make use of
levels like the one I am trying to define here namely Pointer and Sets and
ordered lists even, so it would seem as if, I need to implement these levels
first, without hooks, and then when done with them, build on top of them the
hook system which makes use of them, and then when this hook system is
complete rebuild the Pointer and sets and ordered lists again as even higher
levels but which this time make use of the hook system and are able to
detect such code that tries to invalidate the Pointer by working at lower
levels directly (of course unless the lowest level is below the hook system
heh then, hmm, can't think... need to get there first :/ )
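
Here is one way the "check on every method call" variant of Pointer could look,
as a sketch only. `Level1Graph` is a made-up in-memory stand-in for Level 1,
and `TreatAsPointer` is a hypothetical wrapper name; throwing
IllegalStateException is just one possible answer to "then what do I do?".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical Level 1: parent -> set of children, no limits enforced. */
final class Level1Graph {
    private final Map<Long, Set<Long>> children = new HashMap<>();

    boolean link(long parent, long child) {
        return children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    boolean unlink(long parent, long child) {
        Set<Long> c = children.get(parent);
        return c != null && c.remove(child);
    }

    Set<Long> childrenOf(long parent) {
        return Collections.unmodifiableSet(
                children.getOrDefault(parent, Collections.emptySet()));
    }
}

/** Wrapper that treats a node as a Pointer: 0 or 1 children, never more. */
final class TreatAsPointer {
    private final Level1Graph graph;
    private final long underlayingNode;

    TreatAsPointer(Level1Graph graph, long underlayingNode) {
        this.graph = graph;
        this.underlayingNode = underlayingNode;
        checkValid();
    }

    /** Detects invalidation done directly at Level 1, but only after the fact. */
    private void checkValid() {
        int n = graph.childrenOf(underlayingNode).size();
        if (n > 1) {
            throw new IllegalStateException("node " + underlayingNode + " has " + n
                    + " children, a Pointer allows 0 or 1");
        }
    }

    /** Returns the pointee, or null when pointing to nothing. */
    Long getPointee() {
        checkValid();
        Set<Long> c = graph.childrenOf(underlayingNode);
        return c.isEmpty() ? null : c.iterator().next();
    }

    /** Replaces the current pointee, if any, with the new one. */
    void pointTo(long pointee) {
        Long old = getPointee(); // also runs the validity check
        if (old != null) {
            graph.unlink(underlayingNode, old);
        }
        graph.link(underlayingNode, pointee);
    }
}
```

Note the weakness: the exception shows up in whoever calls the Pointer next,
not in the code that added the extra child.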

a Set again, same thing, a node would be used to identify this set, that node
would be underlayingNode, but this Set thingy is just a wrapper in java on
top of a Node, basically it will just allow the java coder/user to treat a
node as a set from within java, but that node would still act as if it's
just a simple start node that is, a node having multiple outgoing
relationships:
A->B
A->C
A->E
A->D
(order of children is undefined btw, it is a set not an ordered list, not at
this level anyway)
so A would be a Set, and B,C,E,D are its elements
this is how it can be seen from java, or it can be seen as it is stored too,
just as groups of start and end nodes

so this Level 2 thingy is more of a level in java rather than a real level
as Level 1 and Level 0 are

Also, on this level Pointer to Domain and DomainSet can be defined,
DomainPointer: that is, a pointer would be allowed to `point-to / have` a
child which is a child of a certain Set D which is the domain
ie. P->A
D->{A,C,B,X}
P is the pointer, A is the pointee, and D is the domain, and notice A is
child of D
so DomainPointer would enforce having a pointee only from that domain D

There are also some variants, like for Pointer and DomainPointer, are they
allowed to be null ie. have no children / point to nothing ?

DomainSet is a Set whose children must be children of a domain D
DS->{A,X,C}
D->{A,C,B,X}
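
The DomainPointer and DomainSet rules above could be sketched roughly like
this. Everything here is hypothetical naming: `DomainGraph` is a tiny in-memory
Level 1, nodes are named with Strings only to mirror the P/A/D letters, and the
`allowNull` flag is one guess at how the null/non-null variants might be told
apart.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory Level 1 with String node names for readability. */
final class DomainGraph {
    private final Map<String, Set<String>> children = new HashMap<>();

    boolean link(String parent, String child) {
        return children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    void unlinkAll(String parent) {
        children.remove(parent);
    }

    boolean isLinked(String parent, String child) {
        return childrenOf(parent).contains(child);
    }

    Set<String> childrenOf(String parent) {
        return children.getOrDefault(parent, Collections.emptySet());
    }
}

/** P -> pointee, where the pointee must be a child of the domain D. */
final class DomainPointer {
    private final DomainGraph graph;
    private final String self;
    private final String domain;
    private final boolean allowNull; // variant: may it point to nothing?

    DomainPointer(DomainGraph graph, String self, String domain,
                  boolean allowNull, String pointee) {
        this.graph = graph;
        this.self = self;
        this.domain = domain;
        this.allowNull = allowNull;
        pointTo(pointee);
    }

    /** Re-points; null means "point to nothing" and needs allowNull. */
    void pointTo(String pointee) {
        if (pointee == null) {
            if (!allowNull) {
                throw new IllegalArgumentException(self + " must not be null");
            }
        } else if (!graph.isLinked(domain, pointee)) {
            throw new IllegalArgumentException(pointee + " is not a child of " + domain);
        }
        graph.unlinkAll(self);
        if (pointee != null) {
            graph.link(self, pointee);
        }
    }

    String getPointee() {
        Set<String> c = graph.childrenOf(self);
        return c.isEmpty() ? null : c.iterator().next();
    }
}

/** DS -> {..}, where every element must be a child of the domain D. */
final class DomainSet {
    private final DomainGraph graph;
    private final String self;
    private final String domain;

    DomainSet(DomainGraph graph, String self, String domain) {
        this.graph = graph;
        this.self = self;
        this.domain = domain;
    }

    /** Returns false when the element was already in the set (no duplicates). */
    boolean add(String element) {
        if (!graph.isLinked(domain, element)) {
            throw new IllegalArgumentException(element + " is not a child of " + domain);
        }
        return graph.link(self, element);
    }

    boolean contains(String element) {
        return graph.isLinked(self, element);
    }
}
```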

also note, a set cannot contain duplicates, almost forgot to say that;
if duplicates are needed, an intermediary node and special format would be
expected from that node
ie.
S->h->A
where h is an intermediary node
if such a set is ever to be defined, this would probably only make sense
when having an ordered list - that is, having duplicate elements would make
sense in a list, ordered or not (unordered list or a list where you don't
care about the order can be based on an ordered list)
Ordered List would be defined differently, as a doubly linked list, with
head/tail, elementcapsule (aka entry as seen in graph-collections by Niels
which is here btw: https://github.com/peterneubauer/graph-collections )
which has next/prev and pointer to the real element; this makes it easy to
support duplicates

But also in an ordered list, I should be able to specify if the list can
contain null elements, ie entry.element has no children; or whether it can
contain elements only from a specific domain D.

Back to the set with intermediary nodes, I don't see the need for it, but if
needed, then there are 3 variants:
1. set will always use intermediary nodes even if nodes are not dups
2. set will only use intermediary nodes for dup nodes (never tried)
3. set will never use intermediary nodes, which means no dups are supported
there may be other ways to define these, but more complex, ie. not using
intermediary nodes but storing a counter for duplicate nodes, somewhere else
(neo4j could easily add a counter property on the relationship itself  S->A
if A is twice in the set then that very relationship would contain a
counter=2; all other rels would either have no counter which means it's 1 or
have counter set to 1)
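
The counter alternative can be sketched without any intermediary nodes. This
is plain Java with a made-up class name, not the neo4j property API; the map
entry plays the role of the counter that would sit on the S->A relationship.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical set-with-duplicates stored as element -> counter. */
final class CountingSet {
    private final Map<Long, Integer> counters = new HashMap<>();

    /** Adds one occurrence and returns the new counter for that element. */
    int add(long element) {
        return counters.merge(element, 1, Integer::sum);
    }

    /** Removes one occurrence; false if the element was not in the set. */
    boolean remove(long element) {
        if (!counters.containsKey(element)) {
            return false;
        }
        // dropping to zero removes the entry, like deleting the relationship
        counters.computeIfPresent(element, (k, c) -> c > 1 ? c - 1 : null);
        return true;
    }

    /** 0 means "not in the set"; 1 is what a missing counter would mean. */
    int count(long element) {
        return counters.getOrDefault(element, 0);
    }
}
```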

in case 1. there will always be expected to have an intermediary node, so
it's easy to know
in case 2. you don't really know if the node h ie. S->h->A is intermediary
node or it's the element itself
ie.
S->C
S->h->A
S->B
so, you can tag h with another node called AllIntermediariesForSetsWithDups
ie.
AllIntermediariesForSetsWithDups->h
AllIntermediariesForSetsWithDups->x
S->C
S->h->A
S->B
B->x->A
and make sure that all intermediary nodes are unique, ie. created on the fly
when S.add(A) is executed, and simply don't make a set which uses
intermediary nodes from another set as its elements. ie. C->h  when S->h->A  ,
'cause this way, h in C->h can be considered an intermediary node
so to avoid this, you can also
AllSetsWithDups->S
AllSetsWithDups->B
AllSetsWithDups->C
such that, when adding an element C.add(h) you would solve this system of 2
equations:
{ AllSetsWithDups->(X*)->h
{ AllIntermediariesForSetsWithDups -> h
such that if you find an (X*), ie. S in our case, then you'll know that
"h" is part of another set as its intermediary node, and thus avoid using it
ie. throw
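
The "system of 2 equations" above can be read as a small graph query. A
minimal sketch, assuming an in-memory graph with both parent and child
lookups; `DupSetGraph`, its method names and the generated "h0", "h1", ...
intermediary names are all invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical graph that tags sets-with-dups and their intermediaries. */
final class DupSetGraph {
    static final String ALL_SETS = "AllSetsWithDups";
    static final String ALL_INTERMEDIARIES = "AllIntermediariesForSetsWithDups";

    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();
    private int nextIntermediary = 0;

    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    boolean isLinked(String parent, String child) {
        return children.getOrDefault(parent, Collections.emptySet()).contains(child);
    }

    /**
     * Solves AllSetsWithDups->(X)->h together with
     * AllIntermediariesForSetsWithDups->h. Returns X, or null if none exists.
     */
    String ownerOfIntermediary(String h) {
        if (!isLinked(ALL_INTERMEDIARIES, h)) {
            return null;
        }
        for (String x : parents.getOrDefault(h, Collections.emptySet())) {
            if (isLinked(ALL_SETS, x)) {
                return x;
            }
        }
        return null;
    }

    /** set.add(element): always goes through a fresh intermediary node. */
    String addWithDups(String set, String element) {
        String owner = ownerOfIntermediary(element);
        if (owner != null) {
            throw new IllegalArgumentException(
                    element + " is already an intermediary node of " + owner);
        }
        String h = "h" + (nextIntermediary++); // assumed not to clash with other names
        link(ALL_SETS, set);
        link(ALL_INTERMEDIARIES, h);
        link(set, h);
        link(h, element);
        return h;
    }
}
```

Adding the same element twice just creates two intermediaries, which is the
whole point of case 1 (always use intermediary nodes).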

limitations are the way to define systems

So since I don't yet intend to use these (I think), I will skip further
brainstorming on sets with intermediary nodes; but later intermediary nodes
will be used in some places.

this Level 2 only makes sense, if defining these: Pointer,Set,
DomainPointer, DomainSet and their variants with allowing null  or not
requires them to store metadata about them in the "graph", such that
a Pointer could store a link/relation from a parent to it: ie.
AllPointers -> P1
so this way, you know that P1 is a pointer, otherwise only the java code
would know that P1 is a pointer, by checking its underlayingNode == P1
but this in itself can be seen as a Set, that is AllPointers is a set having
its elements identify pointers as defined in this Level 2 as Pointer
so this would mean I'd happily make use of Set which is also defined here,
but then if you realize, adding this kind of metadata sort of requires that
non-metadata Pointer/Set etc. be defined and based upon those
That is, eventually DomainSet uses a Pointer to point to the Domain, and
make sure that data is known to all that are trying to identify what "type"
the node (underlayingNode of the DomainSet) is, without knowing anything
about it, by checking its parents...
 Say, if two java programs are using the same environment (which I'm not yet
sure how to implement due to the need for isolation/serializability ...
we'll see; tho bdb supports opening the same environment/databases from two
or more java programs at the same time with common caching even, and it is
embedded db), and one of them is just exploring, it should be able to tell
what this node is treated as: ie. DomainSet, DomainPointer, Pointer...
though this will happen at higher levels, such metadata being added that is...

btw, this is a graphviz picture of what a DomainSet would look like if it had
in-graph metadata:
https://github.com/13th-floor/neo4john/raw/master/diagrams/level1%20potential%20domainset.png

so it's not a bad idea to first have a TreatAs_X  where X==Pointer  for
example, classes in java, such that they be just wrappers on top of Level 1,
without any metadata stored in-graph, and then use these to define Pointer
and Set etc. with in-graph metadata as above. And they could just check
whether they are still valid on each method call, so as to avoid, or warn
early, when they detect they are no longer valid (due to ie. user doing
Level 1 changes directly)

Though I am not very happy about the user's ability to use Level 1 directly
and change/invalidate constructs defined at higher levels; even though at
about Level 6 we could define some hook/event layer which could prevent and
directly pinpoint java code blocks trying to do these kinds of invalid
changes.
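
A very small sketch of what such a hook layer might look like if it sat right
on top of Level 1. `LinkHook`, `HookedLevel1` and `pointerGuard` are
hypothetical names; the real layer would presumably be built in-graph at a
higher level, which this does not attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Called before parent->child is stored; it may throw to veto the change. */
interface LinkHook {
    void beforeLink(long parent, long child, Set<Long> currentChildren);
}

/** Hypothetical Level 1 where every link attempt passes through hooks. */
final class HookedLevel1 {
    private final Map<Long, Set<Long>> children = new HashMap<>();
    private final Map<Long, List<LinkHook>> hooks = new HashMap<>();

    void addHook(long parent, LinkHook hook) {
        hooks.computeIfAbsent(parent, k -> new ArrayList<>()).add(hook);
    }

    /** Runs the hooks first, so a veto leaves the stored data untouched. */
    boolean link(long parent, long child) {
        Set<Long> current = childrenOf(parent);
        for (LinkHook hook : hooks.getOrDefault(parent, Collections.emptyList())) {
            hook.beforeLink(parent, child, current);
        }
        return children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    Set<Long> childrenOf(long parent) {
        return Collections.unmodifiableSet(
                children.getOrDefault(parent, Collections.emptySet()));
    }

    /** Example hook: keeps a node valid as a Pointer (0 or 1 children). */
    static LinkHook pointerGuard() {
        return (parent, child, current) -> {
            if (!current.isEmpty() && !current.contains(child)) {
                throw new IllegalStateException(
                        "node " + parent + " is a Pointer and already has a child");
            }
        };
    }
}
```

Because the hook throws inside link(), the stack trace contains the caller
that tried the invalid change, instead of whoever touches the Pointer later.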

where was I? forgot xD

so far, in my project, these TreatAs wrappers for Pointer and Set etc. are
not doing any checks to see if they're still valid or not, so for example I
could add 10 children for a Pointer and it would not complain that it has to
have 0 or 1 (unless I added some asserts lately, I should recheck)

-----------
anyway somewhere on Level 3, an ordered list would be defined
here I am thinking if I need one without metadata in-graph first , or I
don't need one,... so far I assumed I don't, and thus allowed the ordered
list to be defined fully in-graph, but the code for this is not yet part of
my project (it is part of the old project which I was trying to copy from
but without all the extraneous checks and generic mumbo-jumbo)

this is a graphviz picture of how an ordered list would look with in-graph
metadata:
https://github.com/13th-floor/neo4john/raw/master/diagrams/level4%20ordered%20list.png

there is an extension to this, such that it would also store a set along
with it, for fast finds, that is, to check if a certain element exists in the
ordered list, without parsing it entirely, it would check if it's in the set
first, since checking Set->X  which are two nodes you imagine, is lightning
fast because bdb does this internally with something like a searchFindBoth()
method (unsure).
  Now since I am here, I was thinking then, how do I fast find the
ElementCapsule (aka `entry` if you understand it better in graph-collections
terms), considering I've just used Set->X and X is indeed part of the ordered
list,
without having to parse the list again;
if I remember right, in the old project's code, I wouldn't actually parse
the list, but instead I would do it the right way, that is:
a=count parents of X
b=(count children of X)*3 or something where this would yield how many
elements are in list really
if a is bigger ie. 1 million and b is like 200 then it might be wise to
iterate the list, or not, considering bdb can find this in 0ms for 1
million rels anyway
so let us consider then, parents of X... we need to basically parse bottom
up from the element X which is any node (not an elementcapsule) upward to
the node identifying the list
So something like solve this:
ourList->randomECnode3719->randomElementIdentifyingNode189->X
AllElementCapsules->randomECnode3719
AllElements of ElementCapsules -> randomElementIdentifyingNode189->X

the unknowns in those equations are:
randomECnode3719 and randomElementIdentifyingNode189
that is how they can be found.
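
The bottom-up search for the two unknowns could be sketched like this, again
on a made-up in-memory graph (`CapsuleGraph` and `findCapsules` are
illustrative names only). It walks from X to its parents instead of iterating
the list, and returns a list because X may appear in the list more than once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical graph with parent lookups, for the ordered list layout. */
final class CapsuleGraph {
    static final String ALL_CAPSULES = "AllElementCapsules";
    static final String ALL_ELEMENTS = "AllElements of ElementCapsules";

    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();

    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    boolean isLinked(String parent, String child) {
        return children.getOrDefault(parent, Collections.emptySet()).contains(child);
    }

    Set<String> parentsOf(String child) {
        return parents.getOrDefault(child, Collections.emptySet());
    }

    /**
     * Finds every capsule EC such that list->EC->EI->x, with
     * AllElementCapsules->EC and AllElements of ElementCapsules->EI.
     */
    List<String> findCapsules(String list, String x) {
        List<String> result = new ArrayList<>();
        for (String ei : parentsOf(x)) {
            if (!isLinked(ALL_ELEMENTS, ei)) {
                continue; // some other kind of parent, e.g. a comment node
            }
            for (String ec : parentsOf(ei)) {
                if (isLinked(ALL_CAPSULES, ec) && isLinked(list, ec)) {
                    result.add(ec);
                }
            }
        }
        return result;
    }
}
```

The cost depends on how many parents X has, not on the length of the list,
which matches the a-versus-b comparison above.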

So in the picture, to get symbol2's ElementCapsule without parsing the
entire list,
first find unique50, which is done by solving this:
AllElements of ElementCapsules -> (x) -> symbol2
and (x) would be unique50, because that unique50 is a node that is only used
in this list as an `AllElements of ElementCapsules` and no other list uses
it for this purpose. And if some other layer needs to add a comment to it, ie
a phrase, it would point to it, ie. be its parent, such as:
AllComments->F
F->unique50
F->phrase1
AllComments'Phrases->phrase1
this way, unique50 has phrase1 node associated with it, which could point to
other nodes identifying words which eventually identify letters and numbers
and all this could be interpreted by some programcode and be displayed in
some screen in a way, or the 3D tool can use them and show that phrase above
unique50 when it's on the viewport (shown on screen)

where was I? I forgot my name :)

that would probably do for now I guess, that's to give you an idea about
what garbage I could talk about in my brainstorming sessions, if allowed to
keep posting here

Cheerios,
John
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
