Hi Marc,

2011/4/19 Marc Seeger <m...@marc-seeger.de>:
> Hey,
> I'm currently thinking about how my current data (in mysql + solr)
> would fit into Neo4j.
>
> In one of my "documents", there are the 3 types of data I have:
>
> 1. Properties that have high cardinality: e.g. the domain name
> ("www.example.org", unique), the subdomain name ("www."), the
> host-name ("example")
> 2. A bunch of numbers (the website latency (1244ms), the amount of
> incoming links (e.g. 2321))
> 3. A number of 'tags' that have a relatively low cardinality (<100).
> Things like the webserver ("apache"), the country ("germany")
>
> As for the model, I think it would be something like this:
> - Every domain gets a node
> - #1 would be modeled as a property on the domain node
> - #2 would probably be put into a lucene index so I can sort on it later on
> - #3 could be modeled using relations. E.g. a node that has two
> properties: type:webserver and name:apache. All of the "domain"-nodes
> can have a relation called "runs on the webserver"
>
> Does this make sense?
> I am used to Document DBs, relational DBs and Column Stores, but Graph
> DBs are still pretty new to me and I don't think I got the model 100%
> :)
>
> Using this model, would I be able to filter subsets of the data such
> as "All Domains that run on apache and are in Germany and have more
> than 200 incoming links sorted by the amount of links"?

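To pin down what that last query has to return, here is a minimal sketch in plain Java, with ordinary collections standing in for the graph. The names DomainRecord and DomainQuery, the field names, and the exclusive threshold parameter are made up for illustration and are not part of Neo4j's API. The sketch also assumes the result should be sorted with the most-linked domain first, which the question does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a "domain" node plus its webserver and country tags.
final class DomainRecord {
    final String name;
    final String webserver;
    final String country;
    final int links;

    DomainRecord(String name, String webserver, String country, int links) {
        this.name = name;
        this.webserver = webserver;
        this.country = country;
        this.links = links;
    }
}

// Hypothetical helper: filter by both tags and a link threshold, then sort.
final class DomainQuery {
    private DomainQuery() {
    }

    static List<DomainRecord> find(List<DomainRecord> all, String webserver,
                                   String country, int minLinksExclusive) {
        List<DomainRecord> hits = new ArrayList<DomainRecord>();
        for (DomainRecord d : all) {
            // Keep only domains that match both tags and have MORE than the threshold.
            if (d.webserver.equals(webserver)
                    && d.country.equals(country)
                    && d.links > minLinksExclusive) {
                hits.add(d);
            }
        }
        // Assumption: descending order, i.e. the most-linked domain comes first.
        Collections.sort(hits, new Comparator<DomainRecord>() {
            @Override
            public int compare(DomainRecord a, DomainRecord b) {
                return Integer.compare(b.links, a.links);
            }
        });
        return hits;
    }
}
```

Calling DomainQuery.find(records, "apache", "germany", 200) returns the matching records ordered by link count. Whatever graph traversal or index lookup is used, it should produce this same result set.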
Even every subdomain and tag could be a node:

  ("www") <--SUBDOMAIN_OF-- ("example.org") --RUNS_ON--> ("apache")
                                  \
                                   ---RUNS_IN--> ("germany")

You could then start from the apache or germany node:

  Node apache = ...
  Node germany = ...
  List<Node> hits = new ArrayList<Node>();
  for ( Relationship runsIn : germany.getRelationships( RUNS_IN, INCOMING ) )
  {
      Node domain = runsIn.getStartNode();
      // getSingleRelationship returns a Relationship (or null), so compare
      // its end node with the apache node, not the relationship itself.
      Relationship runsOn = domain.getSingleRelationship( RUNS_ON, OUTGOING );
      if ( runsOn != null && apache.equals( runsOn.getEndNode() ) )
      {
          int incomingLinks = (Integer) domain.getProperty( "links" );
          if ( incomingLinks > 200 )
          {
              // This is a hit, store it in the list
              hits.add( domain );
          }
      }
  }
  // sort the result list

Or the other way around (start from the number of links, via a sorted
Lucene lookup). Sorry for the quite verbose Lucene query code:

  Node apache = ...
  Node germany = ...
  // More than 200 links: exclusive lower bound, no upper bound.
  // Assumes "links" was indexed as a numeric (int) value.
  Query rangeQuery = NumericRangeQuery.newIntRange( "links", 200, null,
          false, true );
  QueryContext query = new QueryContext( rangeQuery ).sort(
          new Sort( new SortField( "links", SortField.INT ) ) );
  for ( Node domain : domainIndex.query( query ) )
  {
      Relationship runsOn = domain.getSingleRelationship( RUNS_ON, OUTGOING );
      Relationship runsIn = domain.getSingleRelationship( RUNS_IN, OUTGOING );
      if ( runsOn != null && apache.equals( runsOn.getEndNode() )
              && runsIn != null && germany.equals( runsIn.getEndNode() ) )
      {
          // This is a hit
      }
  }

If performance becomes a problem then I'd guess you'll have to index
more fields (links, webserver, country) into the same index so that
compound queries can be asked against it.

> I played a bit around with the neography gem in Ruby and I could do stuff
> like:
>
> germany_nginx = germany_nodel.shortest_path_to(websrv_nginx).depth(2).nodes
>
> But I couldn't figure out how to "expand" this "query"
>
> Looking forward to the feedback!
> Marc
>
> --
> Pessimists, we're told, look at a glass containing 50% air and 50%
> water and see it as half empty. Optimists, in contrast, see it as half
> full. Engineers, of course, understand the glass is twice as big as it
> needs to be. (Bob Lewis)
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com