Re: [Neo4j] Lucene/Neo Indexing Question

Craig Taverner Mon, 02 May 2011 12:08:04 -0700

I see your point. My current code in amanzi-index definitely expects the
total number and types of the properties to be configured in advance. Two
tags of the same vocabulary are no problem, but the number and names of the
tags included in the index are configured up-front, because the tree is a
kind of multi-dimensional quad-tree with a pre-configured number of
dimensions and a pre-configured mapper from value->key for each dimension.
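
As a rough illustration of what "pre-configured" means here, this is a
minimal Python sketch with made-up names (DIMENSIONS, index_key, add, query).
It is not the actual amanzi-index code, and the mappers are only assumptions
for the example. Each dimension has a fixed name and a value->key mapper, and
the index key for a node is the tuple of mapped values:

```python
from collections import defaultdict

# Hypothetical dimensions and mappers, fixed when the index is created.
DIMENSIONS = {
    "color": lambda v: v.lower(),       # one key per color name
    "year": lambda v: (v // 10) * 10,   # bucket years by decade
}

def index_key(properties):
    """Map a node's properties to one key per pre-configured dimension."""
    return tuple(mapper(properties[name]) for name, mapper in DIMENSIONS.items())

index = defaultdict(list)  # index key -> list of node ids

def add(node_id, properties):
    """Index a node; raises KeyError if a configured dimension is missing."""
    index[index_key(properties)].append(node_id)

def query(**wanted):
    """Match the given dimensions; unspecified dimensions match any key."""
    names = list(DIMENSIONS)
    wanted_keys = {n: DIMENSIONS[n](v) for n, v in wanted.items()}
    hits = []
    for key, nodes in index.items():
        if all(key[names.index(n)] == k for n, k in wanted_keys.items()):
            hits.extend(nodes)
    return sorted(hits)

add(1, {"color": "Red", "year": 2004})
add(2, {"color": "red", "year": 2011})
add(3, {"color": "blue", "year": 2011})
```

Calling query(color="red") scans every key whose color component is "red",
whatever the year. A node that lacks one of the configured properties cannot
be given a key at all, which is the limitation described above.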


The only way to use it for your case would be to pre-configure it for a set
of known common tags.

Otherwise I guess you fall back on the round-robin idea, possibly using a
custom mapper in amanzi-index, or just with your own code entirely. If you
have only one dimension, the gain from using amanzi-index is probably
not that high anymore.
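
For the round-robin fallback, something along these lines might do. This is
again only a Python sketch with invented names (RoundRobinMapper, write_key,
read_keys) and an arbitrary fan-out of 4, not amanzi-index or Neo4j code. It
is single-threaded, so it shows the key layout rather than the locking. Each
value maps to one of several sub-keys on write, and a read for that value has
to visit all of them:

```python
from collections import defaultdict
from itertools import count

FANOUT = 4  # assumed number of sub-nodes per value; would need tuning

class RoundRobinMapper:
    """Spread writes for one value over `fanout` sub-keys in turn."""

    def __init__(self, fanout=FANOUT):
        self.fanout = fanout
        self._counters = defaultdict(count)  # one write counter per value

    def write_key(self, value):
        # Next slot for this value, cycling 0, 1, ..., fanout-1, 0, ...
        slot = next(self._counters[value]) % self.fanout
        return (value, slot)

    def read_keys(self, value):
        # A query for the value must look under every slot.
        return [(value, slot) for slot in range(self.fanout)]

mapper = RoundRobinMapper()
index = defaultdict(list)  # (value, slot) -> list of car ids

for car_id in range(10):
    index[mapper.write_key("red")].append(car_id)

# The distinct terms are still recoverable from the keys themselves.
distinct_colors = sorted({value for value, _slot in index})

red_cars = sorted(c for k in mapper.read_keys("red") for c in index[k])
```

The trade-off is the one Rick describes: a larger fan-out spreads the writes
over more sub-nodes, but every read for "red" has to traverse all of them.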

On Mon, May 2, 2011 at 5:54 PM, Rick Bullotta
<rick.bullo...@thingworx.com> wrote:

> Ah, if only it were so...
>
> The number of indexable properties (tags) is completely variable on a "per
> car" basis (e.g. I can add a "driverMood" tag for just a subset of cars) -
> meaning that the domain objects themselves can have a variable number of
> "tags" and can indeed even be tagged with two values from the same
> vocabulary (e.g. a car can have two-color paint, red and blue).
>
> The round-robin idea has some merit, but of course, identifying/determining
> the sub-tree width (# of randomly assigned index subnodes) is somewhat
> subjective in terms of determining what would help address the concurrency
> issues at the possible expense of traversal performance.  Also, the
> "hotspot" or "supernode" issue exists a number of other places in our
> application wherever we are constantly adding (or removing) content related
> to an entity in the system.  It seems that a lot of the current users of Neo
> are doing "bulk loads" and using it for analysis as opposed to using it like
> an OLTP data store (like we are), so I'm guessing the hotspot issue is
> unique to our domain.
>
> I'm still leaning towards Lucene, but will experiment with a few approaches
> to see what works best in different scenarios, and will try implementing
> something along the lines of what you describe.
>
>
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On Behalf Of Craig Taverner
> Sent: Monday, May 02, 2011 11:29 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
>
> Thinking back to your original domain description, cars with colors, surely
> you have more properties than just colors to index?
>
> If you have two or more properties, then you use combinations of properties
> for the first level of the index tree, which provides your logical
> partitioning of supernodes in a domain specific way. For example,
> consider having the four properties color, manufacturer, model, and year. The
> first level of index nodes would be the set of unique combinations of all
> possible property values (all existing combinations, actually). This set is
> much larger than the set of colors, so red will occur many times. As a result
> you dramatically reduce node contention, and the number of relationships per
> node is much lower. Then if you want to perform the query for all red cars,
> actually your traverser needs to be only slightly more complex, basically
> 'find all cars with color red and any value of the other properties'.
>
> This is the design of the 'amanzi-index' I started on github in December
> (but did not complete). It was focusing on doing queries on multiple
> properties at the same time, but does effectively cover your case of
> reducing node contention, if you can add more properties to the index. It
> also has the concept of a mapper from the domain specific property to the
> index key, which was designed to reduce the number of index nodes, but in
> your case you could also use it to increase the number of index nodes, using
> some of the ideas by Jim and Michael. Jim suggested that instead of 'red'
> always mapping to the same node, it could map to a set of different nodes
> (randomly selected, or round robin). Michael discussed a distributed
> hash-code, which I do not fully understand, but it does sound relevant :-)
>
> So, in short, using the design of the amanzi-index you could help this
> problem in two ways:
>
>   - index together with other properties to get a domain-specific
>   partitioning of the 'supernodes'
>   - Add a mapper between the color and the index key to get partitioning of
>   the supernodes
>
>
> On Mon, May 2, 2011 at 1:09 PM, Rick Bullotta
> <rick.bullo...@thingworx.com> wrote:
>
> > Hi, Michael.
> >
> > The nature of the domain model really doesn't lend itself to any logical
> > partitioning of "supernodes", so it would indeed have to be something very
> > arbitrary/random.
> >
> > For now, I think we will have to either deal with the performance issues or
> > switch to using Lucene for the indexing, but we can't do that until we
> > have the ability to query the list of terms for a given key (which is a
> > necessary function in our domain model).  We could perhaps keep a list of
> > "terms" as nodes *and* index them, but that seems redundant.
> >
> > Ultimately, I think the solution is to hide the complexity via the indexing
> > framework and to offer a variety of in-graph indexing models that address
> > specific types of domain requirements.
> >
> > Rick
> >
> > ________________________________________
> > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
> > Behalf Of Michael Hunger [michael.hun...@neotechnology.com]
> > Sent: Monday, May 02, 2011 3:49 AM
> > To: Neo4j user discussions
> > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> >
> > Perhaps then it is sensible to introduce a second layer of nodes, so that
> > you split down your "supernodes" and distribute the write contention?
> >
> > It would be interesting to see whether putting a round robin on that second
> > level of color nodes would be enough to spread lock contention.
> >
> > This is what Peter talks about in his activity stream update scenario.
> >
> > And in general perhaps a step to a more performant in-graph index.
> >
> > When thinking about in-graph indexes, I thought it might be interesting to
> > re-use the HashMap approach: declare x (2^n) bucket nodes, then connect the
> > index-root node to the bucket nodes using the (re-distributed)
> > hashcode & (x-1) as the relationship type, and below each bucket node have
> > relationships, with the concrete value as a relationship attribute, to the
> > concrete nodes.
> >
> > I think this will be addressed even better with Craig's indexes or the
> > Collection abstractions that Andreas Kollegger is working on.
> >
> > Cheers
> >
> > Michael
> >
> > Am 02.05.2011 um 12:16 schrieb Rick Bullotta:
> >
> > > Hi, Niels.
> > >
> > > That's what we're doing now, but it has performance issues with large #'s
> > > of relationships when "cars" are constantly being added, since the "color"
> > > nodes become synchronization bottlenecks for updates.
> > >
> > > Rick
> > >
> > > ________________________________________
> > > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
> > > Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> > > Sent: Sunday, May 01, 2011 9:41 AM
> > > To: user@lists.neo4j.org
> > > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > >
> > > One option would be to create a unique value node for each distinct color
> > > and create a relationship from car to that value node. The value nodes can
> > > be grouped together with relationships to some reference node.
> > >
> > > This gives the opportunity of finding all distinct colors, and it allows
> > > you to find all cars with that particular color.
> > >> Date: Sun, 1 May 2011 14:41:40 +0200
> > >> From: matt...@neotechnology.com
> > >> To: user@lists.neo4j.org
> > >> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > >>
> > >> 2011/4/26 Rick Bullotta <rick.bullo...@thingworx.com>:
> > >>> Hi, Mattias.
> > >>>
> > >>> Here's a use case:
> > >>>
> > >>> I have a million nodes representing cars, and those nodes are all
> > >>> "tagged" with some value, let's say a color name, as a property.  I have
> > >>> indexed those nodes on the color property value.  Now I'd like to present a
> > >>> list of the distinct color values with which nodes (cars) have been tagged.
> > >>> At present, I'd need to iterate through all million, read the property, and
> > >>> maintain a "distinct" HashSet as I iterate through them.
> > >>>
> > >>> I've tried using relationships from the "car" node(s) to a set of
> > >>> "color" node(s), but had scalability/performance issues when there are lots
> > >>> of car nodes being added/deleted (the "color" node quickly becomes a hot
> > >>> spot/synchronization choke point).
> > >>
> > >> All right, yeah, such nodes can become bottlenecks, so I see your
> > >> problem for sure.
> > >>>
> > >>> Rick
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> > >>> On Behalf Of Mattias Persson
> > >>> Sent: Tuesday, April 26, 2011 2:17 PM
> > >>> To: Neo4j user discussions
> > >>> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > >>>
> > >>> Hi Rick,
> > >>>
> > >>> No, not really. What's the use case for having such a method?
> > >>>
> > >>> 2011/4/26 Rick Bullotta <rick.bullo...@thingworx.com>:
> > >>>> Hi, all.
> > >>>>
> > >>>> Is there a method or suggested approach for obtaining a list of all of
> > >>>> the distinct key values in a given index?  I don't care about the indexed
> > >>>> nodes or relationships themselves, just the value(s) of the key.
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Rick
> > >>>>
> > >>>> _______________________________________________
> > >>>> Neo4j mailing list
> > >>>> User@lists.neo4j.org
> > >>>> https://lists.neo4j.org/mailman/listinfo/user
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Mattias Persson, [matt...@neotechnology.com]
> > >>> Hacker, Neo Technology
> > >>> www.neotechnology.com
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Mattias Persson, [matt...@neotechnology.com]
> > >> Hacker, Neo Technology
> > >> www.neotechnology.com
