I'd like to comment on the question of indexing event logs. We are also dealing with event logs and we query them based on time windows as well as other properties. In our case they are also located on a map, and categorized using event types. So we have three types of indexes in use:
- time index (using long values, similar to the TimelineIndex Johan mentioned, but we coded one ourselves) - spatial index (using two double values, based on the same mechanism for the time index, but in 2D) - category index (we just create a list of category nodes and link the events to the relevant category) All of these indexes are simply nodes that the event stream nodes link to. For numerical indexes like the time and spatial indexes we use tree structures (not B-tree, but usually multiple children per-parent, suitable for uneven density data). The first level of the tree (closest to the data) is chosen with a resolution close to common queries (in our case the events occur many times per second, but queries are usually at second resolution, so the first index is of second resolution). You had a very important question about combined indexes, for example querying on timestamp and category in the same query with high performance. Currently we do not have need for that in our system, but we have brainstormed a nice solution to this, so I thought I'd mention it here in case it is useful. There are two options: - If one of the criteria is very limiting all the time, for example querying on time-window always returns a small set, then query that first and do a slow search for the linked categories. This adds no additional complexity to the database, but makes assumptions about the queries, and only performs well if these assumptions are true. - Otherwise you can build an combined index by connecting the tree nodes of one index to the tree nodes of another. In the case of time and category indices, each of the nodes in the B-tree or multi-tree time index would be connected to all the categories for which its underlying data nodes belong. Then when traversing the time tree, you can test for both the time-window constraints and the category constraints, and exit the search if either fail. We have considered the possibility of building these structures on demand, based on actual queries, so the first query that works with any two constraints would search on one, and then build the combined index for both. This allows subsequent searches to run very fast, without needing to build all possible combinations of combined index (assuming many single property indices exist). - One aspect of our application will store nodes that can be > considered similar to event logs. There may be many thousands of these > nodes per "event stream". We would like to be able to traverse the entries > in chronological order, very quickly. We were considering the following > design possibilities: > > o Simply create a node for each "stream" and a node for each entry, with > a > relationship between the stream and the entry, then implement our own sort > routine > Our approach is to create a node for each entry, and index using time and spatial indices. The first level of index is another stream of data, ordered by the relevant property, and traversable in that order (eg. time order). o Similar to the above, but create a node for each "day", and manage > relationships to allow traversal by stream and/or day > In our approach, each level in the index tree represents a higher level of granularity. We go up in fixed steps (multiples). A B-tree steps 2X. We tend to step 10X, because that gives isosceles pyramid trees. But you might prefer to step in known temporal quantities, seconds, minutes, hours, days, weeks, months, etc. That will improve search performance if your common queries are exact multiples of the different index levels. > o Create a node for each stream, a node for each entry and treat the > entries as a forward-only linked list using relationships between the > entries (and of course a relationship between the stream and the "first" > entry) > We tend to create relationships for all common query or traversal paths, with different relationship types in all cases. So traversing the original data would use 'next'. Traversing down the index would be 'child', or perhaps 'index-child' if there is ambiguity. etc. - Anyone used any kind of intermediate index or other approach to > bridge multiple Neo instances? > Hmm... I think it was this question that got me started on the combined index discussion above, but now that I re-read it, I see it has nothing to do with combined indices. I've thought a bit about bridging indices, but have nothing really useful to offer here. Sorry. Hope the long discussion above still has some value :-( _______________________________________________ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user