A legacy application that simply uses a new data source. It can be quite hard
to get users away from their trusty old UI. In the case of pagination,
'legacy' might only mean a few years, but it is still legacy :-).

@1-2) With the rise of mobile applications and mobile sites, a pagination
system might be more relevant than bulk-loading everything and displaying it.
Defining smart filters might be problematic in such a use case as well.

Parallelism within an application could also be an interesting aspect: each
worker retrieves a different page of the graph, and the user does not have to
care at all about splitting the graph up after downloading it. This would only
be interesting, though, if the relations between the pages are not important.
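
A minimal sketch of what I mean, assuming a hypothetical paginated REST
endpoint that accepts "page" and "pageSize" query parameters (not something
the current REST API offers):

   import java.io.BufferedReader;
   import java.io.InputStreamReader;
   import java.net.URL;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.Callable;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;

   public class ParallelPageFetch
   {
       public static void main( String[] args ) throws Exception
       {
           final String base = "http://localhost:7474/db/data/some-traversal";
           final int pageSize = 1000;
           ExecutorService pool = Executors.newFixedThreadPool( 4 );
           List<Future<String>> pages = new ArrayList<Future<String>>();
           for ( int p = 0; p < 8; p++ )
           {
               final int page = p;
               // Each worker fetches its page independently; this is fine
               // as long as the relations between pages do not matter.
               pages.add( pool.submit( new Callable<String>()
               {
                   public String call() throws Exception
                   {
                       URL url = new URL( base + "?page=" + page
                               + "&pageSize=" + pageSize );
                       BufferedReader in = new BufferedReader(
                               new InputStreamReader( url.openStream() ) );
                       StringBuilder body = new StringBuilder();
                       for ( String line; ( line = in.readLine() ) != null; )
                       {
                           body.append( line );
                       }
                       in.close();
                       return body.toString();
                   }
               } ) );
           }
           for ( Future<String> page : pages )
           {
               System.out.println( page.get().length() + " bytes" );
           }
           pool.shutdown();
       }
   }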

Georg

On 21 April 2011 14:59, Rick Bullotta <rick.bullo...@thingworx.com> wrote:

> Fwiw, I think paging is an outdated "crutch", for a few reasons:
>
> 1) bandwidth and browser processing/parsing are largely non-issues,
> although they used to be
>
> 2) human users rarely have the patience (and usability sucks) to go beyond
> 2-4 pages of information.  It is far better to allow incrementally refined
> filters and searches to get to a workable subset of data.
>
> 3) machine users couldn't care less about paging
>
> 4) when doing visualization of a large dataset, you generally want the
> whole dataset, not a page of it, so that's another "non use case"
>
> Discuss and debate please!
>
> Rick
>
>
>
> ----- Reply message -----
> From: "Craig Taverner" <cr...@amanzi.com>
> Date: Thu, Apr 21, 2011 8:52 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" <user@lists.neo4j.org>
>
> >
> > I assume this:
> >    Traverser x = Traversal.description().traverse( someNode );
> >    x.nodes();
> >    x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree
> > that this is a valid approach, and a good, efficient alternative when
> > sorting is not relevant. Glancing at the code in TraverserImpl, though,
> > it really looks like the call to .nodes will re-run the traversal, and I
> > thought that would mean the two calls can yield results in a different
> > order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducible as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database as well, is it not?
>
> So, obviously, pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request for
> page 1.
>
> But my assumptions could be incorrect too...
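>
> As a minimal sketch of what I mean, each page request could simply re-run
> the traversal and skip ahead (assuming the order is reproducible while the
> graph is unchanged; the method name is just illustrative):
>
>    import java.util.ArrayList;
>    import java.util.Iterator;
>    import java.util.List;
>    import org.neo4j.graphdb.Node;
>    import org.neo4j.kernel.Traversal;
>
>    public static List<Node> fetchPage( Node start, int pageNumber,
>            int pageSize )
>    {
>        Iterator<Node> nodes =
>                Traversal.description().traverse( start ).nodes().iterator();
>        // Skip everything before the requested page. The traversal is
>        // re-run from scratch on every request, so no state is kept on
>        // the server between calls.
>        for ( int i = 0; i < pageNumber * pageSize && nodes.hasNext(); i++ )
>        {
>            nodes.next();
>        }
>        List<Node> page = new ArrayList<Node>();
>        for ( int i = 0; i < pageSize && nodes.hasNext(); i++ )
>        {
>            page.add( nodes.next() );
>        }
>        return page;
>    }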
>
> > I understand, and completely agree. My problem with the approach is
> > that I think it's harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in this
> have not chimed in on this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us
> ;-)
>
> > This is what makes me push for the sorted approach - relational
> > databases are doing this. I don't know how they do it, but they are, and
> > we should be at least as good.
> >
>
> Absolutely. We should be as good. Relational databases manage to serve a
> page deep down the list quite fast. I must believe that if they had to
> complete the traversal, sort the results and extract the page on every
> single page request, they could not be so fast. I think my ideas for the
> traversal are 'supposed' to be performance enhancements, and that is why I
> like them ;-)
>
> > I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases
> > treat it as well. If you want sorting to be fast, you have to tell them
> > to index the field you will be sorting on. The only difference versus
> > having the user put the sorting index in the graph is that relational
> > databases will handle the indexing for you, saving you a *ton* of work,
> > and I think we should too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think
> (and hope I am right) that once we move to automatic indexes, it will be
> possible to put external indexes (à la Lucene) and graph indexes (like the
> ones I favour) behind the same API. In that case, perhaps the database will
> more easily be able to make the right optimized decisions, and use the index
> to provide sorted results fast and with a low memory footprint where
> possible, based on the existence or non-existence of the necessary indices.
> Then all the developer needs to do to make things really fast is put in the
> right index. For some data that would be Lucene, and for others it would be
> a graph index. If we get to this point, I think we will have closed a key
> usability gap with relational databases.
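>
> For example, with today's manual Lucene indexes, something along these lines
> should already give index-backed sorted results (a rough sketch; the
> 'people' index and the 'name' key are just illustrative):
>
>    import java.util.ArrayList;
>    import java.util.List;
>    import org.neo4j.graphdb.GraphDatabaseService;
>    import org.neo4j.graphdb.Node;
>    import org.neo4j.graphdb.index.Index;
>    import org.neo4j.graphdb.index.IndexHits;
>    import org.neo4j.index.lucene.QueryContext;
>
>    public static List<Node> sortedPage( GraphDatabaseService graphDb,
>            int pageNumber, int pageSize )
>    {
>        Index<Node> people = graphDb.index().forNodes( "people" );
>        // Ask Lucene to return the hits already sorted by the indexed
>        // 'name' key, so we never load and sort the whole result set.
>        IndexHits<Node> hits = people.query(
>                new QueryContext( "name:*" ).sort( "name" ) );
>        List<Node> page = new ArrayList<Node>();
>        int skip = pageNumber * pageSize;
>        for ( Node person : hits )
>        {
>            if ( skip-- > 0 )
>            {
>                continue;
>            }
>            page.add( person );
>            if ( page.size() == pageSize )
>            {
>                break; // stop early; the remaining hits are never loaded
>            }
>        }
>        hits.close();
>        return page;
>    }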
>
> > There are cases where you need to add this sort of meta data to your
> > domain model, where the sorting logic is too complex, and you see that in
> > relational dbs as well, where people create lookup tables for various
> > things. There are for sure valid uses for that too, but the generic
> > approach I believe covers the *vast* majority of the common use cases.
> >
>
> Perhaps. But I'm not sure the two extremes are as lop-sided as you think. I
> think large data users are very interested in Neo4j.
>
> > I agree, this is important. I'd like to change "the need for pagination
> > on very large result sets" to "the ability to return very large result
> > sets over the wire". That opens up the debate to solutions like http
> > streaming, which do not have the problems that come with keeping state on
> > the server between calls.
> >
>
> I think there are two separate, but related, problems to solve. One is the
> transfer of large result sets over the wire for the people who need that.
> The other is efficiently serving a small page of results from a large
> dataset. Most of our discussion has so far focused on the latter.
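>
> To illustrate the streaming option, a rough sketch using the plain Servlet
> API (the traversal and the JSON shape are just placeholders) could look
> like this:
>
>    import java.io.IOException;
>    import java.io.PrintWriter;
>    import javax.servlet.http.HttpServlet;
>    import javax.servlet.http.HttpServletRequest;
>    import javax.servlet.http.HttpServletResponse;
>    import org.neo4j.graphdb.GraphDatabaseService;
>    import org.neo4j.graphdb.Node;
>    import org.neo4j.kernel.Traversal;
>
>    public class StreamingResultsServlet extends HttpServlet
>    {
>        private GraphDatabaseService graphDb; // wired up elsewhere
>
>        protected void doGet( HttpServletRequest req,
>                HttpServletResponse resp ) throws IOException
>        {
>            resp.setContentType( "application/json" );
>            PrintWriter out = resp.getWriter();
>            Node start = graphDb.getReferenceNode();
>            // Write one record per line and flush as we go, so the client
>            // can consume results incrementally and no paging state is
>            // ever kept on the server between requests.
>            for ( Node node : Traversal.description().traverse( start ).nodes() )
>            {
>                out.println( "{\"id\":" + node.getId() + "}" );
>                out.flush();
>            }
>        }
>    }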
>
> For the former, I did a bit of experimenting last year and was able to
> compact my JSON severalfold by moving all meta-data into a header section.
> This works very well for data that has a repeating structure, for example a
> large number of records with a similar schema. I know schema is a nasty word
> in the nosql world, but it is certainly common for data to have a repeating
> pattern, especially when dealing with very large numbers of records. Then
> you find that something like CSV is actually an efficient format, since the
> bulk of the text is only the data. We did this in JSON by simply specifying
> a meta-data element (with the headers) and then a contents section with a
> long array of values. It worked very well indeed: even though we had a
> half-dozen different 'schemas' in the document, it was still much more
> efficient than specifying the meaning of every field, as is usually done in
> JSON or XML.
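>
> The shape was roughly like the following (the field names here are just
> illustrative, not the exact format we used):
>
>    {
>        "meta": {
>            "person": [ "id", "name", "age" ]
>        },
>        "data": {
>            "person": [
>                [ 1, "Alice", 34 ],
>                [ 2, "Bob", 41 ]
>            ]
>        }
>    }
>
> The header names each field once, and every record after that is just a
> bare array of values, much like a CSV row.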
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
