Craig, how do you map _all_ properties to the integer (or rather numeric/long?) space?
Does the mapping then also rely on the alphabetic (or comparison) order of the properties?

Interesting approach. Is your space then as n-dimensional as the number of properties you have?

Cheers
Michael

On 02.02.2011 at 00:59, Craig Taverner wrote:

> Here is a crazy idea - how about taking the properties you care about and
> dropping them into a combined lucene index? Then all results for nodes with
> the same properties would be 'ambiguous'. Moving this forward to degrees of
> ambiguity might be possible by creating the combined 'value' using a reduced
> resolution of the properties (to increase similarity so the index will see
> them as identical).
>
> Another option is the 'still very much in progress' composite index I
> started in December. Since all properties are mapped into normal integer
> space, the euclidean distance of the first level index nodes from each other
> is a discrete measure of similarity. A distance of zero means that the nodes
> attach to the same index node, and are very similar or identical. Higher
> values mean greater dissimilarity. This index theoretically supports any
> number of properties of any type (including strings) and allows you to plug
> in your own value->index mappers, which means you can control what you mean
> by 'similar'.
>
> On Tue, Feb 1, 2011 at 9:52 PM, Ben Sand <b...@bensand.com> wrote:
>
>> I was working on a project that used matching algorithms a while back.
>>
>> What you have is an n-dimensional matching problem. I can't remember
>> specifically what the last project were using, but this and the linked
>> algos may be what you're looking for:
>> http://en.wikipedia.org/wiki/Mahalanobis_distance
>>
>> On 2 February 2011 07:34, Tim McNamara <paperl...@timmcnamara.co.nz> wrote:
>>
>>> Say I have two nodes,
>>>
>>> { "type": "person", "name": "Neo" }
>>> { "type": "person", "name": "Neo" }
>>>
>>> Over time, I learn their locations. They both live in the same city. This
>>> increases the chances that they're the same person. However, over time it
>>> turns out that their ages differ, therefore it's far less likely that they
>>> are the same Neo.
>>>
>>> Is there anything inside of Neo4j that attempts to determine how close two
>>> nodes are? E.g. to what extent their subtrees and properties match?
>>> Additionally, can anyone suggest literature for algorithms for
>>> disambiguating the two entities?
>>>
>>> If I wanted to implement something that searches for similarities, that
>>> returns a probability of a match, can I do this within the database or
>>> should I implement it within the application?
>>>
>>> --
>>> Tim McNamara
>>> @timClicks
>>> http://timmcnamara.co.nz
>>>
>>> _______________________________________________
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
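
To make the idea in the thread concrete, here is a minimal sketch of mapping properties into integer space and comparing nodes by Euclidean distance. It is not Craig's composite index and not a Neo4j API: the helpers `numeric_mapper`, `string_prefix_mapper`, `index_key` and `distance`, the `age` property, and the choice of sorting property names to fix the dimension order are all assumptions made for illustration. Under these assumptions the space has one dimension per mapped property, and the bucket width of each mapper sets the "reduced resolution".

```python
import math


def numeric_mapper(step):
    """Hypothetical mapper: put a number into an integer cell of width `step`."""
    return lambda value: math.floor(value / step)


def string_prefix_mapper(length=2):
    """Hypothetical mapper: encode the first `length` letters as a base-27 integer."""
    def mapper(value):
        cell = 0
        # Pad short strings with spaces so every key has the same number of digits.
        for ch in value.lower()[:length].ljust(length):
            digit = ord(ch) - ord("a") + 1 if "a" <= ch <= "z" else 0
            cell = cell * 27 + digit
        return cell
    return mapper


def index_key(properties, mappers):
    """One integer per mapped property; sorted names give a fixed dimension order."""
    return tuple(mappers[name](properties[name]) for name in sorted(mappers))


def distance(key_a, key_b):
    """Euclidean distance between two index keys; 0 means the same index cell."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(key_a, key_b)))


# Assumed mappers: ages in 5-year buckets, names by their first two letters.
mappers = {"age": numeric_mapper(5), "name": string_prefix_mapper(2)}

neo_a = {"type": "person", "name": "Neo", "age": 31}
neo_b = {"type": "person", "name": "Neo", "age": 34}
neo_c = {"type": "person", "name": "Neo", "age": 52}

key_a = index_key(neo_a, mappers)
key_b = index_key(neo_b, mappers)
key_c = index_key(neo_c, mappers)

print(distance(key_a, key_b))  # same cell: the two nodes are 'ambiguous'
print(distance(key_a, key_c))  # different age bucket: less likely the same Neo
```

Note that the numeric distance between string cells in this sketch only reflects prefix ordering, so a real mapper for names would need to encode whatever 'similar' should mean for that property.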